Age  Commit message  (Author)
2012-08-23 xen/p2m: Add logic to revector a P2M tree to use __va leafs. (Konrad Rzeszutek Wilk)
During bootup Xen supplies us with a P2M array. It sticks it right after the ramdisk, as can be seen with a 128GB PV guest (certain parts removed for clarity):

    xc_dom_build_image: called
    xc_dom_alloc_segment: kernel : 0xffffffff81000000 -> 0xffffffff81e43000 (pfn 0x1000 + 0xe43 pages)
    xc_dom_pfn_to_ptr: domU mapping: pfn 0x1000+0xe43 at 0x7f097d8bf000
    xc_dom_alloc_segment: ramdisk : 0xffffffff81e43000 -> 0xffffffff925c7000 (pfn 0x1e43 + 0x10784 pages)
    xc_dom_pfn_to_ptr: domU mapping: pfn 0x1e43+0x10784 at 0x7f0952dd2000
    xc_dom_alloc_segment: phys2mach : 0xffffffff925c7000 -> 0xffffffffa25c7000 (pfn 0x125c7 + 0x10000 pages)
    xc_dom_pfn_to_ptr: domU mapping: pfn 0x125c7+0x10000 at 0x7f0942dd2000
    xc_dom_alloc_page : start info : 0xffffffffa25c7000 (pfn 0x225c7)
    xc_dom_alloc_page : xenstore : 0xffffffffa25c8000 (pfn 0x225c8)
    xc_dom_alloc_page : console : 0xffffffffa25c9000 (pfn 0x225c9)
    nr_page_tables: 0x0000ffffffffffff/48: 0xffff000000000000 -> 0xffffffffffffffff, 1 table(s)
    nr_page_tables: 0x0000007fffffffff/39: 0xffffff8000000000 -> 0xffffffffffffffff, 1 table(s)
    nr_page_tables: 0x000000003fffffff/30: 0xffffffff80000000 -> 0xffffffffbfffffff, 1 table(s)
    nr_page_tables: 0x00000000001fffff/21: 0xffffffff80000000 -> 0xffffffffa27fffff, 276 table(s)
    xc_dom_alloc_segment: page tables : 0xffffffffa25ca000 -> 0xffffffffa26e1000 (pfn 0x225ca + 0x117 pages)
    xc_dom_pfn_to_ptr: domU mapping: pfn 0x225ca+0x117 at 0x7f097d7a8000
    xc_dom_alloc_page : boot stack : 0xffffffffa26e1000 (pfn 0x226e1)
    xc_dom_build_image : virt_alloc_end : 0xffffffffa26e2000
    xc_dom_build_image : virt_pgtab_end : 0xffffffffa2800000

So the physical memory and virtual (using __START_KERNEL_map addresses) layout looks like so:

      phys                                 __ka
    /------------\                     /-------------------\
    | 0          | empty               | 0xffffffff80000000|
    | ..         |                     | ..                |
    | 16MB       | <= kernel starts    | 0xffffffff81000000|
    | ..         |                     |                   |
    | 30MB       | <= kernel ends =>   | 0xffffffff81e43000|
    | ..         |  & ramdisk starts   | ..                |
    | 293MB      | <= ramdisk ends=>   | 0xffffffff925c7000|
    | ..         |  & P2M starts       | ..                |
    | ..         |                     | ..                |
    | 549MB      | <= P2M ends =>      | 0xffffffffa25c7000|
    | ..         | start_info          | 0xffffffffa25c7000|
    | ..         | xenstore            | 0xffffffffa25c8000|
    | ..         | console             | 0xffffffffa25c9000|
    | 549MB      | <= page tables =>   | 0xffffffffa25ca000|
    | ..         |                     |                   |
    | 550MB      | <= PGT end =>       | 0xffffffffa26e1000|
    | ..         | boot stack          |                   |
    \------------/                     \-------------------/

As can be seen, the ramdisk, P2M and pagetables are taking up a bit of the __ka address space. This is a problem since MODULES_VADDR starts at 0xffffffffa0000000 - and the P2M sits right in there! During bootup this results in the inability to load modules, with this error:

    ------------[ cut here ]------------
    WARNING: at /home/konrad/ssd/linux/mm/vmalloc.c:106 vmap_page_range_noflush+0x2d9/0x370()
    Call Trace:
    [<ffffffff810719fa>] warn_slowpath_common+0x7a/0xb0
    [<ffffffff81030279>] ? __raw_callee_save_xen_pmd_val+0x11/0x1e
    [<ffffffff81071a45>] warn_slowpath_null+0x15/0x20
    [<ffffffff81130b89>] vmap_page_range_noflush+0x2d9/0x370
    [<ffffffff81130c4d>] map_vm_area+0x2d/0x50
    [<ffffffff811326d0>] __vmalloc_node_range+0x160/0x250
    [<ffffffff810c5369>] ? module_alloc_update_bounds+0x19/0x80
    [<ffffffff810c6186>] ? load_module+0x66/0x19c0
    [<ffffffff8105cadc>] module_alloc+0x5c/0x60
    [<ffffffff810c5369>] ? module_alloc_update_bounds+0x19/0x80
    [<ffffffff810c5369>] module_alloc_update_bounds+0x19/0x80
    [<ffffffff810c70c3>] load_module+0xfa3/0x19c0
    [<ffffffff812491f6>] ? security_file_permission+0x86/0x90
    [<ffffffff810c7b3a>] sys_init_module+0x5a/0x220
    [<ffffffff815ce339>] system_call_fastpath+0x16/0x1b
    ---[ end trace fd8f7704fdea0291 ]---
    vmalloc: allocation failure, allocated 16384 of 20480 bytes
    modprobe: page allocation failure: order:0, mode:0xd2

Since the __va and __ka are 1:1 up to MODULES_VADDR and cleanup_highmap rids __ka of the ramdisk mapping, what we want to do is similar - get rid of the P2M in the __ka address space.

There are two ways of fixing this:

 1) All P2M lookups, instead of using the __ka address, would use the __va address. This means we can safely erase from __ka space the PMD pointers that point to the PFNs for the P2M array and be OK.

 2) Allocate a new array, copy the existing P2M into it, revector the P2M tree to use that, and return the old P2M to the memory allocator. This has the advantage that it sets the stage for using the XEN_ELF_NOTE_INIT_P2M feature. That feature allows us to set the exact virtual address space we want for the P2M - and allows us to boot as initial domain on large machines.

So we pick option 2).

This patch only lays the groundwork in the P2M code. The patch that modifies the MMU is called "xen/mmu: Copy and revector the P2M tree."

Signed-off-by: Konrad Rzeszutek Wilk <konrad.wilk@oracle.com>
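The numbers in this commit message can be checked with a little arithmetic. The following is a minimal sketch, not kernel code: it assumes 4 KiB pages and one 8-byte entry per guest page (both assumptions, chosen because they reproduce the 0x10000 pages shown in the log), and the helper names are hypothetical.

```python
PAGE_SIZE = 4096                      # assumed x86 page size in bytes
P2M_ENTRY_SIZE = 8                    # assumed: one 64-bit entry per guest page
MODULES_VADDR = 0xffffffffa0000000    # value quoted in the commit message

# phys2mach segment boundaries taken from the xc_dom_alloc_segment log line
P2M_START = 0xffffffff925c7000
P2M_END = 0xffffffffa25c7000


def p2m_array_bytes(guest_bytes):
    """Size of a flat P2M array with one entry per guest page."""
    return (guest_bytes // PAGE_SIZE) * P2M_ENTRY_SIZE


def bytes_at_or_above(start, end, boundary):
    """Number of bytes of [start, end) that lie at or above boundary."""
    return max(0, end - max(start, boundary))


size = p2m_array_bytes(128 << 30)     # the 128GB PV guest from the log
overlap = bytes_at_or_above(P2M_START, P2M_END, MODULES_VADDR)
print(hex(size // PAGE_SIZE), hex(overlap))
```

Under these assumptions the array spans exactly the logged segment, and its tail reaches past MODULES_VADDR, which is the collision the commit describes.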
2012-08-23 xen/mmu: Recycle the Xen provided L4, L3, and L2 pages (Konrad Rzeszutek Wilk)
As we are not using them. We end up only using the L1 pagetables and grafting those to our page-tables. [v1: Per Stefano's suggestion squashed two commits] [v2: Per Stefano's suggestion simplified loop] [v3: Fix smatch warnings] [v4: Add more comments] Acked-by: Stefano Stabellini <stefano.stabellini@eu.citrix.com> Signed-off-by: Konrad Rzeszutek Wilk <konrad.wilk@oracle.com>
2012-08-23 xen/mmu: For 64-bit do not call xen_map_identity_early (Konrad Rzeszutek Wilk)
Because we do not need it. During startup Xen provides us with all the initial memory mapped that we need to function. The initial mapped memory extends up to the bootstack, which means we can reference using __ka up to 4.f) (from xen/interface/xen.h):

    4. This the order of bootstrap elements in the initial virtual region:
        a. relocated kernel image
        b. initial ram disk [mod_start, mod_len]
        c. list of allocated page frames [mfn_list, nr_pages]
        d. start_info_t structure [register ESI (x86)]
        e. bootstrap page tables [pt_base, CR3 (x86)]
        f. bootstrap stack [register ESP (x86)]
       (initial ram disk may be omitted).

[v1: More comments in git commit]
Acked-by: Stefano Stabellini <stefano.stabellini@eu.citrix.com>
Signed-off-by: Konrad Rzeszutek Wilk <konrad.wilk@oracle.com>
2012-08-23 xen/mmu: use copy_page instead of memcpy. (Konrad Rzeszutek Wilk)
After all, this is what it is there for. Acked-by: Jan Beulich <jbeulich@suse.com> Signed-off-by: Konrad Rzeszutek Wilk <konrad.wilk@oracle.com>
2012-08-23 xen/mmu: Provide comments describing the _ka and _va aliasing issue (Konrad Rzeszutek Wilk)
Which is that the level2_kernel_pgt (__ka virtual addresses) and level2_ident_pgt (__va virtual address) contain the same PMD entries. So if you modify a PTE in __ka, it will be reflected in __va (and vice-versa). Signed-off-by: Konrad Rzeszutek Wilk <konrad.wilk@oracle.com>
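The aliasing can be pictured with a toy model. This is only a sketch in Python, not the kernel's data structures: each L2 table is a list, and a shared dict stands in for the page of PTEs that both tables point at. All names other than level2_kernel_pgt and level2_ident_pgt are hypothetical.

```python
def build_aliased_tables():
    """Return two toy L2 tables whose first entry points at the same PTE page."""
    pte_page = {"pte0": "read-write"}    # one shared page of PTEs
    level2_kernel_pgt = [pte_page]       # reached through __ka addresses
    level2_ident_pgt = [pte_page]        # reached through __va addresses
    return level2_kernel_pgt, level2_ident_pgt


def set_pte(l2_table, pmd_index, pte_name, value):
    """Modify a PTE by walking through the given L2 table."""
    l2_table[pmd_index][pte_name] = value


ka, va = build_aliased_tables()
set_pte(ka, 0, "pte0", "read-only")      # change made via the __ka side
print(va[0]["pte0"])                     # the __va side sees the same change
```

In this model, clearing the L2 slot itself in one table (for example `ka[0] = None`) leaves the other table's pointer intact; only changes inside the shared PTE page are visible through both.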
2012-08-23 xen/mmu: The xen_setup_kernel_pagetable doesn't need to return anything. (Konrad Rzeszutek Wilk)
We don't need to return the new PGD - as we do not use it. Acked-by: Stefano Stabellini <stefano.stabellini@eu.citrix.com> Signed-off-by: Konrad Rzeszutek Wilk <konrad.wilk@oracle.com>
2012-08-23 Revert "xen/x86: Workaround 64-bit hypervisor and 32-bit initial domain." and "xen/x86: Use memblock_reserve for sensitive areas." (Konrad Rzeszutek Wilk)
This reverts commit 806c312e50f122c47913145cf884f53dd09d9199 and commit 59b294403e9814e7c1154043567f0d71bac7a511.

It also documents setup.c and why we want to do it that way, which is that we tried to make the memblock_reserve more selective so that it would be clear what region is reserved. Sadly we ran into the problem wherein on a 64-bit hypervisor with a 32-bit initial domain, the pt_base has the cr3 value which is not necessarily where the pagetable starts! As Jan put it:

" Actually, the adjustment turns out to be correct: The page tables for a 32-on-64 dom0 get allocated in the order "first L1", "first L2", "first L3", so the offset to the page table base is indeed 2. When reading xen/include/public/xen.h's comment very strictly, this is not a violation (since there nothing is said that the first thing in the page table space is pointed to by pt_base; I admit that this seems to be implied though, namely do I think that it is implied that the page table space is the range [pt_base, pt_base + nt_pt_frames), whereas that range here indeed is [pt_base - 2, pt_base - 2 + nt_pt_frames), which - without a priori knowledge - the kernel would have difficulty to figure out)."

So let's just fall back to the easy way and reserve the whole region.

Signed-off-by: Konrad Rzeszutek Wilk <konrad.wilk@oracle.com>
2012-08-21 xen/x86: Workaround 64-bit hypervisor and 32-bit initial domain. (Konrad Rzeszutek Wilk)
If a 64-bit hypervisor is booted with a 32-bit initial domain, the hypervisor deals with the initial domain as "compat" and does some extra adjustments (like pagetables are 4 bytes instead of 8). It also adjusts the xen_start_info->pt_base incorrectly.

When booted with a 32-bit hypervisor (32-bit initial domain):

    ..
    (XEN) Start info: cf831000->cf83147c
    (XEN) Page tables: cf832000->cf8b5000
    ..
    [ 0.000000] PT: cf832000 (f832000)
    [ 0.000000] Reserving PT: f832000->f8b5000

And with a 64-bit hypervisor:

    (XEN) Start info: 00000000cf831000->00000000cf8314b4
    (XEN) Page tables: 00000000cf832000->00000000cf8b6000
    [ 0.000000] PT: cf834000 (f834000)
    [ 0.000000] Reserving PT: f834000->f8b8000

To deal with this, we keep track of the highest physical address we have reserved via memblock_reserve. If that address does not overlap with pt_base, we have a gap which we reserve.

Signed-off-by: Konrad Rzeszutek Wilk <konrad.wilk@oracle.com>
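The gap check described in the last paragraph can be sketched as follows. This is an illustrative Python model, not the kernel's memblock code; the function name is hypothetical, and the example addresses assume that reservations are page-granular and that the physical address is the logged virtual address minus 0xc0000000 (which matches the "PT: cf834000 (f834000)" pairing in the log).

```python
PAGE_SIZE = 4096  # assumed page size in bytes


def gap_before_pt_base(highest_reserved_end, pt_base_phys):
    """Return the unreserved (start, end) range below pt_base, or None.

    highest_reserved_end is the end of the highest region reserved so far.
    """
    if pt_base_phys > highest_reserved_end:
        return (highest_reserved_end, pt_base_phys)
    return None


# 64-bit hypervisor case: the start_info page is assumed to end at 0xf832000,
# while pt_base is reported as 0xf834000.
gap_64 = gap_before_pt_base(0xf832000, 0xf834000)

# 32-bit hypervisor case: pt_base follows the start_info page directly.
gap_32 = gap_before_pt_base(0xf832000, 0xf832000)
```

With these assumed inputs the 64-bit case yields a two-page hole, which lines up with the offset of 2 mentioned in the later revert of this commit.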
2012-08-21 xen/x86: Use memblock_reserve for sensitive areas. (Konrad Rzeszutek Wilk)
instead of a big memblock_reserve. This way we can be more selective in freeing regions (and it also makes it easier to understand where is what). [v1: Move the auto_translate_physmap to proper line] [v2: Per Stefano suggestion add more comments] Signed-off-by: Konrad Rzeszutek Wilk <konrad.wilk@oracle.com>
2012-08-21 xen/p2m: Fix the comment describing the P2M tree. (Konrad Rzeszutek Wilk)
It mixed up the p2m_mid_missing with p2m_missing. Also remove some extra spaces. Acked-by: Stefano Stabellini <stefano.stabellini@eu.citrix.com> Signed-off-by: Konrad Rzeszutek Wilk <konrad.wilk@oracle.com>
2012-07-21 Linux 3.5 [v3.5] (Linus Torvalds)
2012-07-21 Remove SYSTEM_SUSPEND_DISK system state (Rafael J. Wysocki)
The SYSTEM_SUSPEND_DISK system state is never used, so drop it. Signed-off-by: Rafael J. Wysocki <rjw@sisk.pl> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2012-07-21 Merge branch 'anton-kgdb' (kgdb dmesg fixups) (Linus Torvalds)
Merge emailed kgdb dmesg fixups patches from Anton Vorontsov:

"The dmesg command appears to be broken after the printk rework. The old logic in the kdb code makes no sense in terms of current printk/logging storage format, and KDB simply hangs forever upon entering 'dmesg' command.

The first patch revives the command by switching to kmsg_dumper iterator. As a side-effect, the code is now much simpler.

A few changes were needed in the printk.c: we needed unlocked variant of the kmsg_dumper iterator, but these can surely wait for 3.6.

It's probably too late even for the first patch to go to 3.5, but I'll try to convince otherwise. :-) Here we go:

 - The current code is broken for sure, and has no hope to work at all. It is a regression
 - The new code works for me, and probably works for everyone else;
 - If it compiles (and I urge everyone to compile-test it on your setup), it hardly can make things worse."

* Merge emailed patches from Anton Vorontsov: (4 commits)
  kdb: Switch to nolock variants of kmsg_dump functions
  printk: Implement some unlocked kmsg_dump functions
  printk: Remove kdb_syslog_data
  kdb: Revive dmesg command
2012-07-21 kdb: Switch to nolock variants of kmsg_dump functions (Anton Vorontsov)
The locked variants are prone to deadlocks (suppose we got to the debugger w/ the logbuf lock held), so let's switch to nolock variants. Signed-off-by: Anton Vorontsov <anton.vorontsov@linaro.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2012-07-21 printk: Implement some unlocked kmsg_dump functions (Anton Vorontsov)
If used from KDB, the locked variants are prone to deadlocks (suppose we got to the debugger w/ the logbuf lock held). So, we have to implement a few routines that grab no logbuf lock. Yet we don't need these functions in modules, so we don't export them. Signed-off-by: Anton Vorontsov <anton.vorontsov@linaro.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2012-07-21 printk: Remove kdb_syslog_data (Anton Vorontsov)
The function is no longer needed, so remove it. Signed-off-by: Anton Vorontsov <anton.vorontsov@linaro.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2012-07-21 kdb: Revive dmesg command (Anton Vorontsov)
The kgdb dmesg command is broken after the printk rework. The old logic in kdb code makes no sense in terms of current printk/logging storage format, and KDB simply hangs forever. This patch revives the command by switching to kmsg_dumper iterator. The code is now much simpler and shorter.

Signed-off-by: Anton Vorontsov <anton.vorontsov@linaro.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2012-07-20 Merge branch 'upstream' of git://git.linux-mips.org/pub/scm/ralf/upstream-linus (Linus Torvalds)
Pull late MIPS fixes from Ralf Baechle:

"This fixes a number of loose ends in the MIPS code and various bug fixes. Aside from dropping some patch that should not be in this pull request, everything has sat in -next for quite a while and there are no known issues.

The biggest patch in this patch set moves the allocation of an array that is aliased to a function (for runtime generated code) to assembler code. This avoids an issue with certain toolchains when building for microMIPS."

* 'upstream' of git://git.linux-mips.org/pub/scm/ralf/upstream-linus: (35 commits)
  MIPS: PCI: Move fixups from __init to __devinit.
  MIPS: Fix bug.h MIPS build regression
  MIPS: sync-r4k: remove redundant irq operation
  MIPS: smp: Warn on too early irq enable
  MIPS: call set_cpu_online() on cpu being brought up with irq disabled
  MIPS: call ->smp_finish() a little late
  MIPS: Yosemite: delay irq enable to ->smp_finish()
  MIPS: SMTC: delay irq enable to ->smp_finish()
  MIPS: BMIPS: delay irq enable to ->smp_finish()
  MIPS: Octeon: delay enable irq to ->smp_finish()
  MIPS: Oprofile: Fix build as a module.
  MIPS: BCM63XX: Fix BCM6368 IPSec clock bit
  MIPS: perf: Fix build error caused by unused counters_per_cpu_to_total()
  MIPS: Fix Magic SysRq L kernel crash.
  MIPS: BMIPS: Fix duplicate header inclusion.
  mips: mark const init data with __initconst instead of __initdata
  MIPS: cmpxchg.h: Add missing include
  MIPS: Malta may also be equipped with MIPS64 R2 processors.
  MIPS: Fix typo multipy -> multiply
  MIPS: Cavium: Fix duplicate ARCH_SPARSEMEM_ENABLE in kconfig.
  ...
2012-07-20 Merge tag 'dm-3.5-fixes-2' of git://git.kernel.org/pub/scm/linux/kernel/git/agk/linux-dm (Linus Torvalds)
Pull device-mapper discard fixes from Alasdair G Kergon:

 - avoid a crash in dm-raid1 when discards coincide with mirror recovery;
 - avoid discarding shared data that's still needed in dm-thin;
 - don't guarantee that discarded blocks will be wiped in dm-raid1.

* tag 'dm-3.5-fixes-2' of git://git.kernel.org/pub/scm/linux/kernel/git/agk/linux-dm:
  dm raid1: set discard_zeroes_data_unsupported
  dm thin: do not send discards to shared blocks
  dm raid1: fix crash with mirror recovery and discard
2012-07-20 Merge branch 'for-linus' of git://git.open-osd.org/linux-open-osd (Linus Torvalds)
Pull pnfs/ore fixes from Boaz Harrosh:

"These are catastrophic fixes to the pnfs objects-layout that were just discovered. They are also destined for @stable.

I have found these and worked on them at around RC1 time but unfortunately went to the hospital for kidney stones and had a very slow recovery. I refrained from sending them as is, before proper testing, and surely I have found a bug just yesterday.

So now they are all well tested, and have my sign-off. Other than fixing the problem at hand, and assuming there are no bugs in the new code, there is low risk to any surrounding code. And in any case they affect only these paths that are now broken. That is RAID5 in pnfs objects-layout code. It does also affect exofs (which was not broken) but I have tested exofs and it is lower priority than objects-layout because no one is using exofs, but objects-layout has lots of users."

* 'for-linus' of git://git.open-osd.org/linux-open-osd:
  pnfs-obj: Fix __r4w_get_page when offset is beyond i_size
  pnfs-obj: don't leak objio_state if ore_write/read fails
  ore: Unlock r4w pages in exact reverse order of locking
  ore: Remove support of partial IO request (NFS crash)
  ore: Fix NFS crash by supporting any unaligned RAID IO
2012-07-20 Merge tag 'upstream-3.5-rc8' of git://git.infradead.org/linux-ubifs (Linus Torvalds)
Pull UBIFS free space fix-up bugfix from Artem Bityutskiy:

"It's been reported already twice recently:

  http://lists.infradead.org/pipermail/linux-mtd/2012-May/041408.html
  http://lists.infradead.org/pipermail/linux-mtd/2012-June/042422.html

and we finally have the fix. I am quite confident the fix is correct because I could reproduce the problem with nandsim and verify the fix. It was also verified by Iwo (the reporter).

I am also confident that this is OK to merge the fix so late because this patch affects only the fixup functionality, which is not used by most users."

* tag 'upstream-3.5-rc8' of git://git.infradead.org/linux-ubifs:
  UBIFS: fix a bug in empty space fix-up
2012-07-20 dm raid1: set discard_zeroes_data_unsupported (Mikulas Patocka)
We can't guarantee that REQ_DISCARD on dm-mirror zeroes the data even if the underlying disks support zero on discard. So this patch sets ti->discard_zeroes_data_unsupported. For example, if the mirror is in the process of resynchronizing, it may happen that kcopyd reads a piece of data, then discard is sent on the same area and then kcopyd writes the piece of data to another leg. Consequently, the data is not zeroed. The flag was made available by commit 983c7db347db8ce2d8453fd1d89b7a4bb6920d56 (dm crypt: always disable discard_zeroes_data). Signed-off-by: Mikulas Patocka <mpatocka@redhat.com> Cc: stable@kernel.org Signed-off-by: Alasdair G Kergon <agk@redhat.com>
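The resynchronization example in this message can be replayed as a small simulation. The sketch below is a Python model with hypothetical names, not dm-mirror code; it assumes, for the sake of the argument, that the underlying disks do zero data on discard.

```python
def resync_with_racing_discard():
    """Replay: kcopyd reads, a discard hits both legs, kcopyd writes."""
    primary = bytearray(b"DATA")     # leg being copied from
    secondary = bytearray(b"????")   # leg being resynchronized

    copy_buffer = bytes(primary)     # 1. kcopyd reads a piece of data

    # 2. A discard arrives for the same area; both legs zero the range
    #    (assumption: the underlying disks zero on discard).
    primary[:] = bytes(len(primary))
    secondary[:] = bytes(len(secondary))

    secondary[:] = copy_buffer       # 3. kcopyd writes the stale copy
    return bytes(primary), bytes(secondary)


primary_after, secondary_after = resync_with_racing_discard()
print(primary_after, secondary_after)
```

One leg ends up zeroed while the other holds the old data, so the mirror as a whole cannot promise that a discarded range reads back as zeroes.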
2012-07-20 dm thin: do not send discards to shared blocks (Mikulas Patocka)
When process_discard receives a partial discard that doesn't cover a full block, it sends this discard down to that block. Unfortunately, the block can be shared and the discard would corrupt the other snapshots sharing this block. This patch detects block sharing and ends the discard with success when sending it to the shared block. The above change means that if the device supports discard it can't be guaranteed that a discard request zeroes data. Therefore, we set ti->discard_zeroes_data_unsupported. Thin target discard support with this bug arrived in commit 104655fd4dcebd50068ef30253a001da72e3a081 (dm thin: support discards). Signed-off-by: Mikulas Patocka <mpatocka@redhat.com> Cc: stable@kernel.org Signed-off-by: Mike Snitzer <snitzer@redhat.com> Signed-off-by: Alasdair G Kergon <agk@redhat.com>
2012-07-20 dm raid1: fix crash with mirror recovery and discard (Mikulas Patocka)
This patch fixes a crash when a discard request is sent during mirror recovery.

Firstly, some background. Generally, the following sequence happens during mirror synchronization:

 - function do_recovery is called
 - do_recovery calls dm_rh_recovery_prepare
 - dm_rh_recovery_prepare uses a semaphore to limit the number of simultaneously recovered regions (by default the semaphore value is 1, so only one region at a time is recovered)
 - dm_rh_recovery_prepare calls __rh_recovery_prepare, __rh_recovery_prepare asks the log driver for the next region to recover. Then, it sets the region state to DM_RH_RECOVERING. If there are no pending I/Os on this region, the region is added to quiesced_regions list. If there are pending I/Os, the region is not added to any list. It is added to the quiesced_regions list later (by dm_rh_dec function) when all I/Os finish.
 - when the region is on quiesced_regions list, there are no I/Os in flight on this region. The region is popped from the list in dm_rh_recovery_start function. Then, a kcopyd job is started in the recover function.
 - when the kcopyd job finishes, recovery_complete is called. It calls dm_rh_recovery_end. dm_rh_recovery_end adds the region to recovered_regions or failed_recovered_regions list (depending on whether the copy operation was successful or not).

The above mechanism assumes that if the region is in DM_RH_RECOVERING state, no new I/Os are started on this region. When I/O is started, dm_rh_inc_pending is called, which increases reg->pending count. When I/O is finished, dm_rh_dec is called. It decreases reg->pending count. If the count is zero and the region was in DM_RH_RECOVERING state, dm_rh_dec adds it to the quiesced_regions list.

Consequently, if we call dm_rh_inc_pending/dm_rh_dec while the region is in DM_RH_RECOVERING state, it could be added to quiesced_regions list multiple times or it could be added to this list when kcopyd is copying data (it is assumed that the region is not on any list while kcopyd does its jobs). This results in memory corruption and crash.

There already exist bypasses for REQ_FLUSH requests: REQ_FLUSH requests do not belong to any region, so they are always added to the sync list in do_writes. dm_rh_inc_pending does not increase count for REQ_FLUSH requests. In mirror_end_io, dm_rh_dec is never called for REQ_FLUSH requests. These bypasses avoid the crash possibility described above.

These bypasses were improperly implemented for REQ_DISCARD when the mirror target gained discard support in commit 5fc2ffeabb9ee0fc0e71ff16b49f34f0ed3d05b4 (dm raid1: support discard). In do_writes, REQ_DISCARD requests are always added to the sync queue and immediately dispatched (even if the region is in DM_RH_RECOVERING). However, dm_rh_inc and dm_rh_dec are called for REQ_DISCARD requests. So it violates the rule that no I/Os are started on DM_RH_RECOVERING regions, and causes the list corruption described above.

This patch changes it so that REQ_DISCARD requests follow the same path as REQ_FLUSH. This avoids the crash.

Reference: https://bugzilla.redhat.com/837607

Signed-off-by: Mikulas Patocka <mpatocka@redhat.com>
Cc: stable@kernel.org
Signed-off-by: Alasdair G Kergon <agk@redhat.com>
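The double-insertion hazard can be reproduced in a toy model of the accounting rules described in this message. This is a simplified Python sketch, not the dm-region-hash implementation: the class and method names are hypothetical stand-ins, and a plain Python list replaces the kernel's linked list (where a second insertion corrupts the list instead of merely duplicating an element).

```python
CLEAN, RECOVERING = "clean", "recovering"


class Region:
    def __init__(self):
        self.state = CLEAN
        self.pending = 0


class RegionHash:
    def __init__(self):
        self.quiesced_regions = []

    def recovery_prepare(self, reg):
        """Mark the region as recovering; queue it if no I/O is pending."""
        reg.state = RECOVERING
        if reg.pending == 0:
            self.quiesced_regions.append(reg)

    def inc_pending(self, reg):
        reg.pending += 1

    def dec(self, reg):
        """I/O finished; queue the region if it is recovering and idle."""
        reg.pending -= 1
        if reg.pending == 0 and reg.state == RECOVERING:
            self.quiesced_regions.append(reg)


def discard_during_recovery(account_discard):
    """Return how often the region ends up on the quiesced list."""
    rh, reg = RegionHash(), Region()
    rh.recovery_prepare(reg)
    if account_discard:          # pre-fix behaviour for REQ_DISCARD
        rh.inc_pending(reg)
        rh.dec(reg)
    return rh.quiesced_regions.count(reg)
```

Calling `discard_during_recovery(True)` models the buggy path, and `discard_during_recovery(False)` models a discard that bypasses the pending accounting, as REQ_FLUSH does.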
2012-07-20 pnfs-obj: Fix __r4w_get_page when offset is beyond i_size (Boaz Harrosh)
It is very common for the end of the file to be unaligned on stripe size. But since we know it's beyond the file's end, the XOR should be performed with all zeros.

Old code used to just read zeros out of the OSD devices, which is a great waste. But what scares me more about this situation is that we now have pages attached to the file's mapping that are beyond i_size. I don't like the kind of bugs this calls for.

Fix both problems by returning a global zero_page if offset is beyond i_size.

TODO: Change the API to ->__r4w_get_page() so a NULL can be returned without being considered as error, since XOR API treats NULL entries as zero_pages.

[Bug since 3.2. Should apply the same way to all Kernels since]
CC: Stable Tree <stable@kernel.org>
Signed-off-by: Boaz Harrosh <bharrosh@panasas.com>
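Why an all-zero page is a safe substitute follows from XOR itself: XOR with zero leaves the other operand unchanged. The sketch below is a Python illustration with hypothetical names (it is not the pnfs-obj code); the boundary test `offset >= i_size` is an assumption about what "beyond i_size" means.

```python
PAGE_SIZE = 4096
ZERO_PAGE = bytes(PAGE_SIZE)          # one shared, read-only page of zeros


def r4w_get_page(file_pages, offset, i_size):
    """Return the page at offset, or the shared zero page past end of file."""
    if offset >= i_size:              # assumed meaning of "beyond i_size"
        return ZERO_PAGE
    return file_pages[offset // PAGE_SIZE]


def xor_pages(a, b):
    """Byte-wise XOR of two equally sized pages, as used for RAID parity."""
    return bytes(x ^ y for x, y in zip(a, b))


data_page = bytes([0xAB]) * PAGE_SIZE
pages = [data_page]                   # a one-page file
past_eof = r4w_get_page(pages, PAGE_SIZE, i_size=PAGE_SIZE)
parity = xor_pages(data_page, past_eof)
```

No device read is needed for the page past end of file, and no page beyond i_size gets attached to the file's mapping in this model.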
2012-07-20 pnfs-obj: don't leak objio_state if ore_write/read fails (Boaz Harrosh)
[Bug since 3.2 Kernel] CC: Stable Tree <stable@kernel.org> Signed-off-by: Boaz Harrosh <bharrosh@panasas.com>
2012-07-20 ore: Unlock r4w pages in exact reverse order of locking (Boaz Harrosh)
The read-4-write pages are locked in ascending address order, but were unlocked in whatever way was easiest for coding. Fix that: locks should be released in the opposite order of locking, i.e. descending address order.

I have not hit this dead-lock. It was found by inspecting the debug print-outs. I suspect there is a higher lock at the caller that protects us, but fix it regardless.

Signed-off-by: Boaz Harrosh <bharrosh@panasas.com>
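The ordering rule can be shown in a few lines. This is a generic Python sketch using threading.Lock as a stand-in for page locks; the function name and the address keys are hypothetical and nothing here is taken from the ORE code.

```python
import threading


def lock_then_unlock(page_locks):
    """Lock pages in ascending address order, release in descending order.

    page_locks maps a page address to its lock. Returns the acquisition
    and release orders so the symmetry can be inspected.
    """
    acquire_order = sorted(page_locks)
    for addr in acquire_order:
        page_locks[addr].acquire()

    release_order = []
    for addr in reversed(acquire_order):
        page_locks[addr].release()
        release_order.append(addr)
    return acquire_order, release_order


locks = {0x3000: threading.Lock(), 0x1000: threading.Lock(), 0x2000: threading.Lock()}
acquired, released = lock_then_unlock(locks)
```

Here `released` is simply `acquired` reversed, which is the property the commit restores.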
2012-07-20 ore: Remove support of partial IO request (NFS crash) (Boaz Harrosh)
Due to OOM situations the ore might fail to allocate all resources needed for IO of the full request. If some progress was possible it would proceed with a partial/short request, for the sake of forward progress.

Since this crashes NFS-core and exofs is just fine without it, just remove this contraption, and fail.

TODO: Support real forward progress with some reserved allocations of resources, such as mem pools and/or bio_sets

[Bug since 3.2 Kernel]
CC: Stable Tree <stable@kernel.org>
CC: Benny Halevy <bhalevy@tonian.com>
Signed-off-by: Boaz Harrosh <bharrosh@panasas.com>
2012-07-20 ore: Fix NFS crash by supporting any unaligned RAID IO (Boaz Harrosh)
In RAID_5/6 we used to not permit an IO whose end byte is not stripe_size aligned and which spans more than one stripe, i.e. the caller must check whether, after submission, the actual transferred byte count is shorter, and would need to resubmit a new IO with the remainder.

Exofs supports this, and NFS was supposed to support this as well with its short write mechanism. But late testing has exposed a CRASH when this is used with non-RPC layout-drivers.

The change at NFS is deep and risky; in its place the fix at ORE to lift the limitation is actually clean and simple. So here it is below.

The principle here is that in the case of unaligned IO on both ends, beginning and end, we will send two read requests: one like the old code, before the calculation of the first stripe, and also one at a new site, before the calculation of the last stripe. If any "boundary" is aligned or the complete IO is within a single stripe, we do a single read like before.

The code is clean and simple by splitting the old _read_4_write into 3 even parts:
 1. _read_4_write_first_stripe
 2. _read_4_write_last_stripe
 3. _read_4_write_execute

And calling 1+3 at the same place as before, 2+3 before the last stripe, and in the case of all in a single stripe then 1+2+3 is performed additively.

Why did I not think of it before? Well, I had a stroke of genius, because I have stared at this code for 2 years and did not find this simple solution until today. Not that I did not try.

This solution is much better for NFS than the previous supposed solution because the short write was dealt with out-of-band after IO_done, which would cause a seeky IO pattern, whereas here we execute in order. In both solutions we do 2 separate reads, only here we do it within a single IO request. (And actually combine two writes into a single submission.)

NFS/exofs code need not change since the ORE API communicates the new shorter length on return; what will happen is that this case would not occur anymore.

Hurray!!

[Stable: this is an NFS bug since the 3.2 kernel; should apply cleanly]
CC: Stable Tree <stable@kernel.org>
Signed-off-by: Boaz Harrosh <bharrosh@panasas.com>
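One way to read the 1+3 / 2+3 / 1+2+3 rule is as a small planning function. The sketch below is an interpretation in Python, not the ORE code: the function name and return shape are hypothetical, and it assumes that a preparation step is only needed for a boundary that is not stripe-aligned. Each inner list is one read submission (one run of the execute step) together with the preparation steps feeding it.

```python
def plan_read4write(offset, length, stripe_size):
    """Return the read submissions needed before a RAID write.

    Each element is the list of preparation steps combined into one
    execute step; an empty result means no read-4-write is needed.
    """
    end = offset + length
    steps = []
    if offset % stripe_size != 0:
        steps.append("first_stripe")       # partial stripe at the start
    if end % stripe_size != 0:
        steps.append("last_stripe")        # partial stripe at the end

    single_stripe = offset // stripe_size == (end - 1) // stripe_size
    if single_stripe:
        # Everything lies in one stripe: prepare both, execute once.
        return [steps] if steps else []
    # Otherwise each unaligned boundary gets its own read submission.
    return [[step] for step in steps]


STRIPE = 64 * 1024
both_ends = plan_read4write(4096, 200000, STRIPE)
```

For an IO unaligned at both ends and spanning several stripes, `both_ends` contains two submissions, matching the "two read requests" case in the message.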
2012-07-20 UBIFS: fix a bug in empty space fix-up (Artem Bityutskiy)
UBIFS has a feature called "empty space fix-up" which is a quirk to work around limitations of dumb flasher programs. Namely, of those flashers that are unable to skip NAND pages full of 0xFFs while flashing, resulting in empty space at the end of half-filled eraseblocks being unusable for UBIFS. This feature is relatively new (introduced in v3.0).

The fix-up routine (fixup_free_space()) is executed only once at the very first mount if the superblock has the 'space_fixup' flag set (can be done with -F option of mkfs.ubifs). It basically reads all the UBIFS data and metadata and writes it back to the same LEB. The routine assumes the image is pristine and does not have anything in the journal.

There was a bug in 'fixup_free_space()' where it fixed up the log incorrectly. All but one LEB of the log of a pristine file-system are empty. And one contains just a commit start node. And 'fixup_free_space()' just unmapped this LEB, which resulted in wiping the commit start node. As a result, some users were unable to mount the file-system next time with the following symptom:

    UBIFS error (pid 1): replay_log_leb: first log node at LEB 3:0 is not CS node
    UBIFS error (pid 1): replay_log_leb: log error detected while replaying the log at LEB 3:0

The root cause of this bug was that 'fixup_free_space()' wrongly assumed that the beginning of empty space in the log head (c->lhead_offs) was known on mount. However, it is not the case - it was always 0. UBIFS does not store it in the master node and finds it out by scanning the log on every mount.

The fix is simple - just pass commit start node size instead of 0 to 'fixup_leb()'.

Signed-off-by: Artem Bityutskiy <Artem.Bityutskiy@linux.intel.com>
Cc: stable@vger.kernel.org [v3.0+]
Reported-by: Iwo Mergler <Iwo.Mergler@netcommwireless.com>
Tested-by: Iwo Mergler <Iwo.Mergler@netcommwireless.com>
Reported-by: James Nute <newten82@gmail.com>
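The effect of passing 0 versus the commit start node size can be seen in a toy model. This Python sketch is not the UBIFS implementation: the function below only mimics the behaviour described in the message (a used length of 0 means the LEB is unmapped, otherwise the used part is read and written back), and the node size and LEB size are illustrative assumptions.

```python
ERASED = 0xFF
CS_NODE_SIZE = 32     # illustrative size of the commit start node (assumption)
LEB_SIZE = 128        # tiny LEB for the example (assumption)


def fixup_leb_model(leb, used_len):
    """Model of the fix-up: keep the first used_len bytes, erase the rest.

    With used_len == 0 the whole LEB is treated as empty and unmapped.
    """
    if used_len == 0:
        return bytes([ERASED]) * len(leb)
    return leb[:used_len] + bytes([ERASED]) * (len(leb) - used_len)


cs_node = b"C" * CS_NODE_SIZE
log_leb = cs_node + bytes([ERASED]) * (LEB_SIZE - CS_NODE_SIZE)

buggy = fixup_leb_model(log_leb, 0)             # lhead_offs was always 0
fixed = fixup_leb_model(log_leb, CS_NODE_SIZE)  # pass the CS node size
```

In the buggy case the commit start node is gone after the fix-up, which corresponds to the "first log node ... is not CS node" error on the next mount.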
2012-07-19 Merge branch 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/sage/ceph-client (Linus Torvalds)
Pull last minute Ceph fixes from Sage Weil:

"The important one fixes a bug in the socket failure handling behavior that was turned up in some recent failure injection testing. The other two are minor bug fixes."

* 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/sage/ceph-client:
  rbd: endian bug in rbd_req_cb()
  rbd: Fix ceph_snap_context size calculation
  libceph: fix messenger retry
2012-07-19 Merge tag 'md-3.5-fixes' of git://neil.brown.name/md (Linus Torvalds)
Pull three md bugfixes from NeilBrown:

"One of the bugs was introduced in 3.5-rc1. Others have been there for longer."

* tag 'md-3.5-fixes' of git://neil.brown.name/md:
  md/raid1: close some possible races on write errors during resync
  md: avoid crash when stopping md array races with closing other open fds.
  md: fix bug in handling of new_data_offset
2012-07-19 Merge git://git.kernel.org/pub/scm/linux/kernel/git/davem/net (Linus Torvalds)
Pull networking changes from David Miller:

"Ok, we should be good to go now"

 1) We have to statically initialize the init_net device list head rather than do so in an initcall, otherwise netprio_cgroup crashes if it's built statically rather than modular (Mark D. Rustad)
 2) Fix SKB null oopser in CIPSO ipv4 option processing (Paul Moore)
 3) Qlogic maintainers update (Anirban Chakraborty)

* git://git.kernel.org/pub/scm/linux/kernel/git/davem/net:
  net: Statically initialize init_net.dev_base_head
  MAINTAINERS: Changes in qlcnic and qlge maintainers list
  cipso: don't follow a NULL pointer when setsockopt() is called
2012-07-19 Merge branch 'upstream-fixes' of git://git.kernel.org/pub/scm/linux/kernel/git/jikos/hid (Linus Torvalds)
Pull HID update from Jiri Kosina:

"A final round of changes for HID for 3.5: just device ID additions."

* 'upstream-fixes' of git://git.kernel.org/pub/scm/linux/kernel/git/jikos/hid:
  HID: hid-multitouch: add support for Zytronic panels
  HID: add Sennheiser BTD500USB device support
  HID: add battery quirk for Apple Wireless ANSI
2012-07-19 cx25821: Remove bad strcpy to read-only char* (Ezequiel Garcia)
The strcpy was being used to set the name of the board. Since the destination char* was read-only and the name is set statically at compile time; this was both wrong and redundant. The type of char* is changed to const char* to prevent future errors. Reported-by: Radek Masin <radek@masin.eu> Signed-off-by: Ezequiel Garcia <elezegarcia@gmail.com> [ Taking directly due to vacations - Linus ] Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2012-07-19 HID: hid-multitouch: add support for Zytronic panels (Benjamin Tissoires)
Signed-off-by: Benjamin Tissoires <benjamin.tissoires@enac.fr> Signed-off-by: Jiri Kosina <jkosina@suse.cz>
2012-07-19 MIPS: PCI: Move fixups from __init to __devinit. (Sebastian Andrzej Siewior)
Fixups are executed once the pci-device is found which is during boot process so __init seems fine as long as the platform does not support hotplug. However it is possible to remove the PCI bus at run time and have it rediscovered again via "echo 1 > /sys/bus/pci/rescan" and this will call the fixups again. [ralf@linux-mips.org: Made piixirqmap[] in malta_piix_func0_fixup() __initdata.] Signed-off-by: Sebastian Andrzej Siewior <sebastian@breakpoint.cc> Cc: linux-mips@linux-mips.org Signed-off-by: Ralf Baechle <ralf@linux-mips.org>
2012-07-19MIPS: Fix bug.h MIPS build regressionYoichi Yuasa
Commit: 3777808873b0c49c5cf27e44c948dfb02675d578 [bug.h: need linux/kernel.h for TAINT_WARN.] breaks all MIPS builds.
  CC arch/mips/kernel/machine_kexec.o
In file included from include/linux/kernel.h:20:0,
                 from include/asm-generic/bug.h:35,
                 from /home/yuasa/src/linux/kernel/git/linux-2.6/arch/mips/include/asm/bug.h:41,
                 from /home/yuasa/src/linux/kernel/git/linux-2.6/arch/mips/include/asm/bitops.h:20,
                 from include/linux/bitops.h:22,
                 from include/linux/signal.h:38,
                 from include/linux/elfcore.h:5,
                 from include/linux/kexec.h:60,
                 from arch/mips/kernel/machine_kexec.c:9:
include/linux/log2.h: In function '__ilog2_u32':
include/linux/log2.h:34:2: error: implicit declaration of function 'fls' [-Werror=implicit-function-declaration]
include/linux/log2.h: In function '__ilog2_u64':
include/linux/log2.h:42:2: error: implicit declaration of function 'fls64' [-Werror=implicit-function-declaration]
include/linux/log2.h: In function '__roundup_pow_of_two':
include/linux/log2.h:63:2: error: implicit declaration of function 'fls_long' [-Werror=implicit-function-declaration]
In file included from include/linux/bitops.h:22:0,
                 from include/linux/signal.h:38,
                 from include/linux/elfcore.h:5,
                 from include/linux/kexec.h:60,
                 from arch/mips/kernel/machine_kexec.c:9:
/home/yuasa/src/linux/kernel/git/linux-2.6/arch/mips/include/asm/bitops.h: At top level:
/home/yuasa/src/linux/kernel/git/linux-2.6/arch/mips/include/asm/bitops.h:615:19: error: static declaration of 'fls' follows non-static declaration
include/linux/log2.h:34:9: note: previous implicit declaration of 'fls' was here
In file included from /home/yuasa/src/linux/kernel/git/linux-2.6/arch/mips/include/asm/bitops.h:651:0,
                 from include/linux/bitops.h:22,
                 from include/linux/signal.h:38,
                 from include/linux/elfcore.h:5,
                 from include/linux/kexec.h:60,
                 from arch/mips/kernel/machine_kexec.c:9:
include/asm-generic/bitops/fls64.h:18:28: error: static declaration of 'fls64' follows non-static declaration
include/linux/log2.h:42:9: note: previous implicit declaration of 'fls64' was here
In file included from include/linux/signal.h:38:0,
                 from include/linux/elfcore.h:5,
                 from include/linux/kexec.h:60,
                 from arch/mips/kernel/machine_kexec.c:9:
include/linux/bitops.h:160:24: error: conflicting types for 'fls_long'
include/linux/log2.h:63:16: note: previous implicit declaration of 'fls_long' was here
cc1: all warnings being treated as errors
make[2]: *** [arch/mips/kernel/machine_kexec.o] Error 1
Signed-off-by: Yoichi Yuasa <yuasa@linux-mips.org> Cc: Geert Uytterhoeven <geert@linux-m68k.org> Cc: Paul Mundt <lethal@linux-sh.org> Cc: yuasa@linux-mips.org Cc: linux-kernel@vger.kernel.org Cc: Linuxppc-dev <linuxppc-dev@ozlabs.org> Cc: Linux MIPS Mailing List <linux-mips@linux-mips.org> Cc: Linux-sh list <linux-sh@vger.kernel.org> Cc: Chris Zankel <chris@zankel.net> Patchwork: https://patchwork.linux-mips.org/patch/4000/ Tested-by: John Crispin <blogic@openwrt.org> Signed-off-by: Ralf Baechle <ralf@linux-mips.org>
2012-07-19MIPS: sync-r4k: remove redundant irq operationYong Zhang
The irq operation is redundant now that irq enabling has been delayed to ->smp_finish(). Signed-off-by: Yong Zhang <yong.zhang0@gmail.com> Cc: Sergei Shtylyov <sshtylyov@mvista.com> Cc: David Daney <david.daney@cavium.com> Acked-by: David Daney <david.daney@cavium.com> Signed-off-by: Ralf Baechle <ralf@linux-mips.org>
2012-07-19MIPS: smp: Warn on too early irq enableYong Zhang
Just to catch a potential issue. Signed-off-by: Yong Zhang <yong.zhang0@gmail.com> Cc: Sergei Shtylyov <sshtylyov@mvista.com> Cc: David Daney <david.daney@cavium.com> Acked-by: David Daney <david.daney@cavium.com> Patchwork: https://patchwork.linux-mips.org/patch/3852/ Signed-off-by: Ralf Baechle <ralf@linux-mips.org>
2012-07-19MIPS: call set_cpu_online() on cpu being brought up with irq disabledYong Zhang
To prevent the kind of problem that commit 5fbd036b [sched: Cleanup cpu_active madness] and commit 2baab4e9 [sched: Fix select_fallback_rq() vs cpu_active/cpu_online] try to resolve, move set_cpu_online() to the CPU being brought up and call it with irqs disabled. Signed-off-by: Yong Zhang <yong.zhang0@gmail.com> Cc: Sergei Shtylyov <sshtylyov@mvista.com> Cc: David Daney <david.daney@cavium.com> Acked-by: David Daney <david.daney@cavium.com> Patchwork: https://patchwork.linux-mips.org/patch/3851/ Signed-off-by: Ralf Baechle <ralf@linux-mips.org>
2012-07-19MIPS: call ->smp_finish() a little lateYong Zhang
We have moved irq enabling to ->smp_finish(). Call ->smp_finish() a little later to prepare for moving set_cpu_online() into start_secondary. It is also not necessary to call cpu_set(cpu, cpu_callin_map) and synchronise_count_slave() with irqs enabled. Signed-off-by: Yong Zhang <yong.zhang0@gmail.com> Cc: Sergei Shtylyov <sshtylyov@mvista.com> Cc: David Daney <david.daney@cavium.com> Acked-by: David Daney <david.daney@cavium.com> Patchwork: https://patchwork.linux-mips.org/patch/3850/ Signed-off-by: Ralf Baechle <ralf@linux-mips.org>
2012-07-19MIPS: Yosemite: delay irq enable to ->smp_finish()Yong Zhang
To prepare for smoothing out the set_cpu_[active|online]() mess. Signed-off-by: Yong Zhang <yong.zhang0@gmail.com> Cc: Sergei Shtylyov <sshtylyov@mvista.com> Cc: David Daney <david.daney@cavium.com> Acked-by: David Daney <david.daney@cavium.com> Patchwork: https://patchwork.linux-mips.org/patch/3848/ Signed-off-by: Ralf Baechle <ralf@linux-mips.org>
2012-07-19MIPS: SMTC: delay irq enable to ->smp_finish()Yong Zhang
To prepare for smoothing out the set_cpu_[active|online]() mess. Signed-off-by: Yong Zhang <yong.zhang0@gmail.com> Cc: Sergei Shtylyov <sshtylyov@mvista.com> Cc: David Daney <david.daney@cavium.com> Acked-by: David Daney <david.daney@cavium.com> Patchwork: https://patchwork.linux-mips.org/patch/3847/ Signed-off-by: Ralf Baechle <ralf@linux-mips.org>
2012-07-19MIPS: BMIPS: delay irq enable to ->smp_finish()Yong Zhang
To prepare for smoothing out the set_cpu_[active|online]() mess. Signed-off-by: Yong Zhang <yong.zhang0@gmail.com> Cc: Sergei Shtylyov <sshtylyov@mvista.com> Cc: David Daney <david.daney@cavium.com> Acked-by: David Daney <david.daney@cavium.com> Patchwork: https://patchwork.linux-mips.org/patch/3846/ Signed-off-by: Ralf Baechle <ralf@linux-mips.org>
2012-07-19MIPS: Octeon: delay enable irq to ->smp_finish()Yong Zhang
To prepare for smoothing out the set_cpu_[active|online]() mess. Signed-off-by: Yong Zhang <yong.zhang0@gmail.com> Cc: Sergei Shtylyov <sshtylyov@mvista.com> Cc: David Daney <david.daney@cavium.com> Acked-by: David Daney <david.daney@cavium.com> Patchwork: https://patchwork.linux-mips.org/patch/3845/ Signed-off-by: Ralf Baechle <ralf@linux-mips.org>
2012-07-19MIPS: Oprofile: Fix build as a module.Ralf Baechle
Since 3572a2c37f667ee49333f8863722b8f43eac506b [MIPS: make oprofile use cp0_perfcount_irq if it is set], building oprofile as a module for R10000 or R7000 class processors, E9000 or MIPSxx class cores fails with: ERROR: "cp0_compare_irq" [arch/mips/oprofile/oprofile.ko] undefined! Fixed by exporting cp0_compare_irq. Signed-off-by: Ralf Baechle <ralf@linux-mips.org>
2012-07-19MIPS: BCM63XX: Fix BCM6368 IPSec clock bitFlorian Fainelli
The IPsec clock bit is 18 and not 17. Signed-off-by: Florian Fainelli <florian@openwrt.org> Cc: linux-mips@linux-mips.org Cc: mpm@selenic.com Cc: herbert@gondor.apana.org.au Patchwork: https://patchwork.linux-mips.org/patch/3323/ Signed-off-by: Ralf Baechle <ralf@linux-mips.org>
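An off-by-one in a bit position like the one fixed here is easy to see in a small sketch. The macro and function names below are hypothetical and are not the identifiers used in the BCM63XX headers; only the bit number 18 comes from the commit message.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical names; the real definitions live in the BCM63XX register headers. */
#define EXAMPLE_IPSEC_CLK_BIT   18u
#define EXAMPLE_IPSEC_CLK_MASK  (UINT32_C(1) << EXAMPLE_IPSEC_CLK_BIT)

/* Bit 18 corresponds to mask 0x00040000; bit 17 would be 0x00020000. */
static_assert(EXAMPLE_IPSEC_CLK_MASK == 0x00040000u, "IPsec clock must be bit 18");

/* Return the clock-control word with the IPsec clock bit set or cleared. */
static uint32_t example_set_ipsec_clk(uint32_t reg, int enable)
{
    return enable ? (reg | EXAMPLE_IPSEC_CLK_MASK)
                  : (reg & ~EXAMPLE_IPSEC_CLK_MASK);
}
```

With the wrong bit number, the same code would silently gate a different clock, which is why a compile-time check on the mask value can be a useful guard.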
2012-07-19MIPS: perf: Fix build error caused by unused counters_per_cpu_to_total()Florian Fainelli
cc1: warnings being treated as errors arch/mips/kernel/perf_event_mipsxx.c:166: error: 'counters_per_cpu_to_total' defined but not used make[2]: *** [arch/mips/kernel/perf_event_mipsxx.o] Error 1 make[2]: *** Waiting for unfinished jobs.... It was first introduced by 82091564cfd7ab8def42777a9c662dbf655c5d25 [MIPS: perf: Add support for 64-bit perf counters.] in 3.2. Signed-off-by: Florian Fainelli <florian@openwrt.org> Cc: linux-mips@linux-mips.org Cc: david.daney@cavium.com Patchwork: https://patchwork.linux-mips.org/patch/3357/ Signed-off-by: Ralf Baechle <ralf@linux-mips.org>
2012-07-19MIPS: Fix Magic SysRq L kernel crash.Vincent Wen
show_backtrace() was passed a NULL task pointer, which caused a paging request failure. Default to the current task when passed a NULL task pointer, as other architectures (ARM, etc.) do. Signed-off-by: Vincent Wen <vincentwenlinux@gmail.com> Cc: linux-mips@linux-mips.org Cc: cernekee@gmail.com Patchwork: https://patchwork.linux-mips.org/patch/3524/ Signed-off-by: Ralf Baechle <ralf@linux-mips.org>
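The NULL-task fallback described in this commit follows a common pattern, sketched here in user-space C. Everything in it is a stand-in: `struct example_task`, `example_current_task` (playing the role of the kernel's `current`) and `example_backtrace_target()` are hypothetical names, and the actual MIPS patch is not reproduced.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical stand-in for the kernel's task structure. */
struct example_task {
    int pid;
};

/* Stand-in for the kernel's `current` pointer (the running task). */
static struct example_task example_current_task = { .pid = 1 };

/* Pick the task to back-trace, falling back to the current task on NULL. */
static struct example_task *example_backtrace_target(struct example_task *task)
{
    if (task == NULL)
        task = &example_current_task;

    /* From here on the pointer is safe to dereference. */
    assert(task != NULL);
    return task;
}
```

Normalising the pointer once at the top of the function means every later dereference is safe, instead of each call site having to remember the NULL case.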