linux.git - Linus' kernel tree

Age	Commit message (Collapse)	Author
2012-01-13	Merge git://git.kernel.org/pub/scm/linux/kernel/git/pkl/squashfs-next	Linus Torvalds
	* git://git.kernel.org/pub/scm/linux/kernel/git/pkl/squashfs-next: Squashfs: fix i_blocks calculation with extended regular files Squashfs: fix mount time sanity check for corrupted superblock Squashfs: optimise squashfs_cache_get entry search Squashfs: Update documentation to include xattrs Squashfs: add missing block release on error condition
2012-01-13	Merge git://git.kernel.org/pub/scm/linux/kernel/git/steve/gfs2-3.0-nmw	Linus Torvalds
	* git://git.kernel.org/pub/scm/linux/kernel/git/steve/gfs2-3.0-nmw: GFS2: Fix nlink setting on inode creation GFS2: fail mount if journal recovery fails GFS2: let spectator mount do read only recovery GFS2: Fix a use-after-free that coverity spotted GFS2: dlm based recovery coordination
2012-01-13	Merge branch 'linux-next' of git://git.infradead.org/ubifs-2.6	Linus Torvalds
	* 'linux-next' of git://git.infradead.org/ubifs-2.6: UBIFS: fix key printing UBIFS: use snprintf instead of sprintf when printing keys UBIFS: fix debugging messages UBIFS: make debugging messages light again UBI: fix debugging messages UBI: make vid_hdr non-static
2012-01-13	Merge branch 'for-linus' of ↵	Linus Torvalds
	git://git.kernel.org/pub/scm/linux/kernel/git/sage/ceph-client * 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/sage/ceph-client: ceph: ensure prealloc_blob is in place when removing xattr rbd: initialize snap_rwsem in rbd_add() ceph: enable/disable dentry complete flags via mount option vfs: export symbol d_find_any_alias() ceph: always initialize the dentry in open_root_dentry() libceph: remove useless return value for osd_client __send_request() ceph: avoid iput() while holding spinlock in ceph_dir_fsync ceph: avoid useless dget/dput in encode_fh ceph: dereference pointer after checking for NULL crush: fix force for non-root TAKE ceph: remove unnecessary d_fsdata conditional checks ceph: Use kmemdup rather than duplicating its implementation Fix up conflicts in fs/ceph/super.c (d_alloc_root() failure handling vs always initialize the dentry in open_root_dentry)
2012-01-13	vhost-net: add module alias (v2.1)	stephen hemminger
	By adding some module aliases, programs (or users) won't have to explicitly call modprobe. Vhost-net will always be available if built into the kernel. It does require assigning a permanent minor number for depmod to work. Also: - use C99 style initialization. - add missing entry in documentation for loop-control Signed-off-by: Stephen Hemminger <shemminger@vyatta.com> Acked-By: Kay Sievers <kay.sievers@vrfy.org> Signed-off-by: David S. Miller <davem@davemloft.net>
2012-01-13	xfs: remove the unused dm_attrs structure	Christoph Hellwig
	.. and the just as dead bhv_desc forward declaration while we're at it. Signed-off-by: Christoph Hellwig <hch@lst.de> Reviewed-by: Alex Elder <aelder@sgi.com> Reviewed-by: Dave Chinner <dchinner@redhat.com> Signed-off-by: Ben Myers <bpm@sgi.com>
2012-01-13	xfs: cleanup xfs_iomap_eof_align_last_fsb	Christoph Hellwig
	Replace the nasty if, else if, elseif condition with more natural C flow that expressed the logic we want here better. Signed-off-by: Christoph Hellwig <hch@lst.de> Reviewed-by: Dave Chinner <dchinner@redhat.com> Signed-off-by: Ben Myers <bpm@sgi.com>
2012-01-13	xfs: remove xfs_itruncate_data	Christoph Hellwig
	This wrapper isn't overly useful, not to say rather confusing. Around the call to xfs_itruncate_extents it does: - add tracing - add a few asserts in debug builds - conditionally update the inode size in two places - log the inode Both the tracing and the inode logging can be moved to xfs_itruncate_extents as they are useful for the attribute fork as well - in fact the attr code already does an equivalent xfs_trans_log_inode call just after calling xfs_itruncate_extents. The conditional size updates are a mess, and there was no reason to do them in two places anyway, as the first one was conditional on the inode having extents - but without extents we xfs_itruncate_extents would be a no-op and the placement wouldn't matter anyway. Instead move the size assignments and the asserts that make sense to the callers that want it. As a side effect of this clean up xfs_setattr_size by introducing variables for the old and new inode size, and moving the size updates into a common place. Reviewed-by: Dave Chinner <dchinner@redhat.com> Signed-off-by: Christoph Hellwig <hch@lst.de> Signed-off-by: Ben Myers <bpm@sgi.com>
2012-01-13	ipv6: release idev when ip6_neigh_lookup failed in icmp6_dst_alloc	RongQing.Li
	release idev when ip6_neigh_lookup failed in icmp6_dst_alloc Signed-off-by: RongQing.Li <roy.qing.li@gmail.com> Acked-by: Eric Dumazet <eric.dumazet@gmail.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2012-01-13	autofs4 - fix deal with autofs4_write races	Ian Kent
	I don't know how I missed this obvious mistake when I reviewed Als' patches, sorry. [ Quoting Al: Grr... Note to self: do git status and git stash show -p before git push. Nothing like "WTF? I'd fixed that braino" feeling ;-/ Al sent the same patch - it got broken in commit d668dc56631d: "autofs4: deal with autofs4_write/autofs4_write races". ] Reported-and-tested-by: Dave Airlie <airlied@redhat.com> Signed-off-by: Ian Kent <raven@themaw.net> Signed-off-by: Al Viro <viro@zeniv.linux.org.uk> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2012-01-13	ARM: Add arm_memblock_steal() to allocate memory away from the kernel	Russell King
	Several platforms are now using the memblock_alloc+memblock_free+ memblock_remove trick to obtain memory which won't be mapped in the kernel's page tables. Most platforms do this (correctly) in the ->reserve callback. However, OMAP has started to call these functions outside of this callback, and this is extremely unsafe - memory will not be unmapped, and could well be given out after memblock is no longer responsible for its management. So, provide arm_memblock_steal() to perform this function, and ensure that it panic()s if it is used inappropriately. Convert everyone over, including OMAP. As a result, OMAP with OMAP4_ERRATA_I688 enabled will panic on boot with this change. Mark this option as BROKEN and make it depend on BROKEN. OMAP needs to be fixed, or 137d105d50 (ARM: OMAP4: Fix errata i688 with MPU interconnect barriers.) reverted until such time it can be fixed correctly. Signed-off-by: Russell King <rmk+kernel@arm.linux.org.uk>
2012-01-13	Merge branch 'master' into fixes	Russell King

2012-01-13	UBIFS: fix key printing	Artem Bityutskiy
	Before commit 56e46742e846e4de167dde0e1e1071ace1c882a5 we have had locking around all printing macros and we could use static buffers for creating key strings and printing them. However, now we do not have that locking and we cannot use static buffers. This commit removes the old DBGKEY() macros and introduces few new helper macros for printing debugging messages plus a key at the end. Thankfully, all the messages are already structures in a way that the key is printed in the end. Signed-off-by: Artem Bityutskiy <artem.bityutskiy@linux.intel.com>
2012-01-13	UBIFS: use snprintf instead of sprintf when printing keys	Artem Bityutskiy
	Switch to 'snprintf()' which is more secure and reliable. This is also a preparation to the subsequent key printing fixes. Signed-off-by: Artem Bityutskiy <artem.bityutskiy@linux.intel.com>
2012-01-13	dma-buf: Documentation update for Kconfig select	Sumit Semwal
	As per Linus' comment, dma-buf Kconfig entry shouldn't have an option text, but should be selected by the subsystems that use it. Add this information in the documentation as well. Signed-off-by: Sumit Semwal <sumit.semwal@ti.com> Signed-off-by: Dave Airlie <airlied@redhat.com>
2012-01-13	nouveau: Support Optimus models for vga_switcheroo	Peter Lekensteyn
	Newer nVidia cards with Optimus do not support/use the DSM switching functions. Instead, it require a DSM function to be called prior to bringing a device into D3 state. No other _DSM calls are necessary before/after enabling/disabling a device. Switching between discrete and integrated GPU is not supported by this Optimus _DSM call, therefore return on the switching method. Signed-off-by: Peter Lekensteyn <lekensteyn@gmail.com> Signed-off-by: Dave Airlie <airlied@redhat.com>
2012-01-13	nouveau: properly check for _DSM function support	Peter Lekensteyn
	According to the ACPI spec version 4, section 9.14.1, _DSM functions must return a value with the first bit enabled if any DSM functions are supported for the given UUID and revision ID. For a given function index n to be marked supported, bit n must be enabled. Signed-off-by: Peter Lekensteyn <lekensteyn@gmail.com> Signed-off-by: Dave Airlie <airlied@redhat.com>
2012-01-13	dma-buf: drop option text so users don't select it.	Dave Airlie
	This is going to be used by other subsystems so they should select it. Signed-off-by: Dave Airlie <airlied@redhat.com>
2012-01-13	radeon: Call pci_clear_master() instead of open-coding it.	Michel Dänzer
	Reported-by: Ben Hutchings <ben@decadent.org.uk> Signed-off-by: Michel Dänzer <michel.daenzer@amd.com> Reviewed-by: Alex Deucher <alexander.deucher@amd.com> Signed-off-by: Dave Airlie <airlied@redhat.com>
2012-01-13	gma500: Discard modes that don't fit in stolen memory	Alan Cox
	[This fixes a crash on boot if the system is plugged into an HDTV so it's probably appropriate to push even though it didn't make the window. We could be cleverer about this but the simple version seems to be the safe one] From: Patrik Jakobsson <patrik.r.jakobsson@gmail.com> At the moment we cannot allocate more than stolen memory size for framebuffers. To get around that issues we discard modes that doesn't fit. This is a temporary solution until we can freely allocate framebuffer memory. [Currently the framebuffer needs to be linear in kernel space due to limits in the kernel fb layer - AC] Signed-off-by: Patrik Jakobsson <patrik.r.jakobsson@gmail.com> Signed-off-by: Alan Cox <alan@linux.intel.com> Signed-off-by: Dave Airlie <airlied@redhat.com>
2012-01-13	drm: bump DRM_CONNECTOR_MAX_ENCODER from 2 to 3	Ben Skeggs
	There exists at least one NVIDIA GPU (Quadro NVS 300) that has a DMS-59 connector which is capable of supporting DisplayPort, TMDS and VGA on a single connector. We need to bump the allowed encoder limit to support all three configs. Signed-off-by: Ben Skeggs <bskeggs@redhat.com> Signed-off-by: Dave Airlie <airlied@redhat.com>
2012-01-13	drm/radeon/kms: Fix module parameter description format	Jean Delvare
	Module parameter descriptions don't take a trailing \n, otherwise it breaks formatting of modinfo's output. Also add missing space after comma. Signed-off-by: Jean Delvare <jdelvare@suse.de> Cc: David Airlie <airlied@linux.ie> Reviewed-by: Alex Deucher <alexander.deucher@amd.com> Cc: Jerome Glisse <jglisse@redhat.com> Signed-off-by: Dave Airlie <airlied@redhat.com>
2012-01-13	drm/radeon/kms/ni: fix packet2 handling for VM IB parser	Alex Deucher
	Packet2 is only one dword. Signed-off-by: Alex Deucher <alexander.deucher@amd.com> Reviewed-by: Jerome Glisse <jglisse@redhat.com> Signed-off-by: Dave Airlie <airlied@redhat.com>
2012-01-13	ttm/dma: Remove the WARN() which is not useful.	Konrad Rzeszutek Wilk
	. It was useful during development, but now on a production system we can get this (if the user forgot to upload the firmware): [drm] radeon: irq initialized. [drm] GART: num cpu pages 131072, num gpu pages 131072 [drm] radeon: ib pool ready. [drm] Loading SUMO Microcode r600_cp: Failed to load firmware "radeon/SUMO_pfp.bin" atl1c 0000:03:00.0: version 1.0.1.0-NAPI.213057] [drm:evergreen_startup] ERROR Failed to load firmware! radeon 0000:00:01.0: disabling GPU acceleration 88] radeon 0000:00:01.0: ffff8801bb782400 unpin not necessary ------------[ cut here ]------------ WARNING: at /home/konrad/linux-linus/drivers/gpu/drm/ttm/ttm_page_alloc_dma.c:956 ttm_dma_unpopulate+0x79/0x300 [ttm]() Hardware name: System Product Name Modules linked in: e1000e atl1c radeon(+) ahci libahci libata scsi_mod fbcon tileblit font ttm bitblit softcursor drm_kms_helper wmi xen_blkfront xen_netfront fb_sys_fops sysimgblt sysfillrect syscopyarea xenfs xen_privcmd Pid: 1600, comm: modprobe Not tainted 3.2.0-06100-ge343a89 #1 Call Trace: [<ffffffff8108973a>] warn_slowpath_common+0x7a/0xb0 [<ffffffff81089785>] warn_slowpath_null+0x15/0x20 [<ffffffffa0060309>] ttm_dma_unpopulate+0x79/0x300 [ttm] [<ffffffffa01341c0>] radeon_ttm_tt_unpopulate+0x120/0x130 [radeon] [<ffffffffa0056e0c>] ttm_tt_destroy+0x2c/0x70 [ttm] [<ffffffffa0057a4e>] ttm_bo_cleanup_memtype_use+0x3e/0x80 [ttm] [<ffffffffa00595a1>] ttm_bo_release+0x251/0x280 [ttm] [<ffffffffa0059610>] ttm_bo_unref+0x40/0x60 [ttm] [<ffffffffa0134d02>] radeon_bo_unref+0x42/0x80 [radeon] [<ffffffffa0186dfb>] radeon_sa_bo_manager_fini+0x6b/0x80 [radeon] [<ffffffffa0146b8f>] radeon_ib_pool_fini+0x6f/0x90 [radeon] [<ffffffffa014be49>] r100_ib_fini+0x19/0x20 [radeon] [<ffffffffa017b47e>] evergreen_init+0x1ee/0x2d0 [radeon] The big WARN() has nothing to do with the culprit - which is that the firmware was not loaded. So lets remove the WARN() from the TTM DMA code. Signed-off-by: Konrad Rzeszutek Wilk <konrad.wilk@oracle.com> Reviewed-by: Jerome Glisse <jglisse@redhat.com> Signed-off-by: Dave Airlie <airlied@redhat.com>
2012-01-13	ARM: 7275/1: LPAE: Check the CPU support for the long descriptor format	Catalin Marinas
	This patch adds a check for the presence of the LPAE feature during the CPU initialisation. If not present, it reports an error when CONFIG_DEBUG_LL is enabled. Signed-off-by: Catalin Marinas <catalin.marinas@arm.com> Acked-by: Nicolas Pitre <nico@linaro.org> Signed-off-by: Russell King <rmk+kernel@arm.linux.org.uk>
2012-01-13	ARM: 7274/1: NUC900: Rename nuc900-audio platform device to nuc900-ac97	Axel Lin
	This change ensures the platform device name matches nuc900-ac97 platform driver name. Signed-off-by: Axel Lin <axel.lin@gmail.com> Acked-by: Wan Zongshun <mcuos.com@gmail.com> Signed-off-by: Russell King <rmk+kernel@arm.linux.org.uk>
2012-01-13	ALSA: Don't prompt for CONFIG_SND_COMPRESS_OFFLOAD	Takashi Iwai
	CONFIG_SND_COMPRESS_OFFLOAD is an item to be selected by the dirver just like CONFIG_SND_PCM, and no need to prompt for explicit selection. Signed-off-by: Takashi Iwai <tiwai@suse.de>
2012-01-13	ALSA: HDA: Use LPIB position fix for Macbook Pro 7,1	David Henningsson
	Several users have reported "choppy" audio under the 3.2 kernel, and that changing position_fix to 1 has resolved their problem. The chip is an nVidia Corporation MCP89 High Definition Audio, [10de:0d94] (rev a2). Cc: stable@kernel.org (v3.2+) BugLink: https://bugs.launchpad.net/bugs/909419 Signed-off-by: David Henningsson <david.henningsson@canonical.com> Signed-off-by: Takashi Iwai <tiwai@suse.de>
2012-01-13	block: Stop using macro stubs for the bio data integrity calls	Martin K. Petersen
	Replace preprocessor macro stubs with real function declarations to prevent warnings when CONFIG_BLK_DEV_INTEGRITY is disabled. Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com> Signed-off-by: Jens Axboe <axboe@kernel.dk>
2012-01-12	Merge branch 'akpm' (aka "Andrew's patch-bomb, take two")	Linus Torvalds
	Andrew explains: - various misc stuff - Most of the rest of MM: memcg, threaded hugepages, others. - cpumask - kexec - kdump - some direct-io performance tweaking - radix-tree optimisations - new selftests code A note on this: often people will develop a new userspace-visible feature and will develop userspace code to exercise/test that feature. Then they merge the patch and the selftest code dies. Sometimes we paste it into the changelog. Sometimes the code gets thrown into Documentation/(!). This saddens me. So this patch creates a bare-bones framework which will henceforth allow me to ask people to include their test apps in the kernel tree so we can keep them alive. Then when people enhance or fix the feature, I can ask them to update the test app too. The infrastruture is terribly trivial at present - let's see how it evolves. - checkpoint/restart feature work. A note on this: this is a project by various mad Russians to perform c/r mainly from userspace, with various oddball helper code added into the kernel where the need is demonstrated. So rather than some large central lump of code, what we have is little bits and pieces popping up in various places which either expose something new or which permit something which is normally kernel-private to be modified. The overall project is an ongoing thing. I've judged that the size and scope of the thing means that we're more likely to be successful with it if we integrate the support into mainline piecemeal rather than allowing it all to develop out-of-tree. However I'm less confident than the developers that it will all eventually work! So what I'm asking them to do is to wrap each piece of new code inside CONFIG_CHECKPOINT_RESTORE. So if it all eventually comes to tears and the project as a whole fails, it should be a simple matter to go through and delete all trace of it. This lot pretty much wraps up the -rc1 merge for me. * akpm: (96 commits) unlzo: fix input buffer free ramoops: update parameters only after successful init ramoops: fix use of rounddown_pow_of_two() c/r: prctl: add PR_SET_MM codes to set up mm_struct entries c/r: procfs: add start_data, end_data, start_brk members to /proc/$pid/stat v4 c/r: introduce CHECKPOINT_RESTORE symbol selftests: new x86 breakpoints selftest selftests: new very basic kernel selftests directory radix_tree: take radix_tree_path off stack radix_tree: remove radix_tree_indirect_to_ptr() dio: optimize cache misses in the submission path vfs: cache request_queue in struct block_device fs/direct-io.c: calculate fs_count correctly in get_more_blocks() drivers/parport/parport_pc.c: fix warnings panic: don't print redundant backtraces on oops sysctl: add the kernel.ns_last_pid control kdump: add udev events for memory online/offline include/linux/crash_dump.h needs elf.h kdump: fix crash_kexec()/smp_send_stop() race in panic() kdump: crashk_res init check for /sys/kernel/kexec_crash_size ...
2012-01-12	Merge git://git.kernel.org/pub/scm/linux/kernel/git/davem/net	Linus Torvalds
	* git://git.kernel.org/pub/scm/linux/kernel/git/davem/net: (69 commits) pptp: Accept packet with seq zero RDS: Remove some unused iWARP code net: fsl: fec: handle 10Mbps speed in RMII mode drivers/net/ethernet/stmicro/stmmac/stmmac_platform.c: add missing iounmap drivers/net/ethernet/tundra/tsi108_eth.c: add missing iounmap ksz884x: fix mtu for VLAN net_sched: sfq: add optional RED on top of SFQ dp83640: Fix NOHZ local_softirq_pending 08 warning gianfar: Fix invalid TX frames returned on error queue when time stamping gianfar: Fix missing sock reference when processing TX time stamps phylib: introduce mdiobus_alloc_size() net: decrement memcg jump label when limit, not usage, is changed net: reintroduce missing rcu_assign_pointer() calls inet_diag: Rename inet_diag_req_compat into inet_diag_req inet_diag: Rename inet_diag_req into inet_diag_req_v2 bond_alb: don't disable softirq under bond_alb_xmit mac80211: fix rx->key NULL pointer dereference in promiscuous mode nl80211: fix old station flags compatibility mdio-octeon: use an unique MDIO bus name. mdio-gpio: use an unique MDIO bus name. ...
2012-01-12	unlzo: fix input buffer free	Sascha Hauer
	unlzo modifies the pointer to in_buf, so we have to free the original buffer, not the modified pointer. Signed-off-by: Sascha Hauer <s.hauer@pengutronix.de> Cc: Lasse Collin <lasse.collin@tukaani.org> Cc: Namhyung Kim <namhyung@gmail.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2012-01-12	ramoops: update parameters only after successful init	Kees Cook
	If a platform device exists on the system, but ramoops fails to attach to it, the module parameters are overridden before ramoops can fall back and try to use passed module parameters. Move update to end of init routine. Signed-off-by: Kees Cook <keescook@chromium.org> Cc: Marco Stornelli <marco.stornelli@gmail.com> Cc: Sergiu Iordache <sergiu@chromium.org> Cc: Seiji Aguchi <seiji.aguchi@hds.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2012-01-12	ramoops: fix use of rounddown_pow_of_two()	Marco Stornelli
	The return value of rounddown_pow_of_two wasn't evaluated, so the operation was a no-op. Signed-off-by: Marco Stornelli <marco.stornelli@gmail.com> Reported-by: Andrew Morton <akpm@linux-foundation.org> Reviewed-by: WANG Cong <xiyou.wangcong@gmail.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2012-01-12	c/r: prctl: add PR_SET_MM codes to set up mm_struct entries	Cyrill Gorcunov
	When we restore a task we need to set up text, data and data heap sizes from userspace to the values a task had at checkpoint time. This patch adds auxilary prctl codes for that. While most of them have a statistical nature (their values are involved into calculation of /proc/<pid>/statm output) the start_brk and brk values are used to compute an allowed size of program data segment expansion. Which means an arbitrary changes of this values might be dangerous operation. So to restrict access the following requirements applied to prctl calls: - The process has to have CAP_SYS_ADMIN capability granted. - For all opcodes except start_brk/brk members an appropriate VMA area must exist and should fit certain VMA flags, such as: - code segment must be executable but not writable; - data segment must not be executable. start_brk/brk values must not intersect with data segment and must not exceed RLIMIT_DATA resource limit. Still the main guard is CAP_SYS_ADMIN capability check. Note the kernel should be compiled with CONFIG_CHECKPOINT_RESTORE support otherwise these prctl calls will return -EINVAL. [akpm@linux-foundation.org: cache current->mm in a local, saving 200 bytes text] Signed-off-by: Cyrill Gorcunov <gorcunov@openvz.org> Reviewed-by: Kees Cook <keescook@chromium.org> Cc: Tejun Heo <tj@kernel.org> Cc: Andrew Vagin <avagin@openvz.org> Cc: Serge Hallyn <serge.hallyn@canonical.com> Cc: Pavel Emelyanov <xemul@parallels.com> Cc: Vasiliy Kulikov <segoon@openwall.com> Cc: KAMEZAWA Hiroyuki <kamezawa.hiroyu@jp.fujitsu.com> Cc: Michael Kerrisk <mtk.manpages@gmail.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2012-01-12	c/r: procfs: add start_data, end_data, start_brk members to /proc/$pid/stat v4	Cyrill Gorcunov
	The mm->start_code/end_code, mm->start_data/end_data, mm->start_brk are involved into calculation of program text/data segment sizes (which might be seen in /proc/<pid>/statm) and into brk() call final address. For restore we need to know all these values. While mm->start_code/end_code already present in /proc/$pid/stat, the rest members are not, so this patch brings them in. The restore procedure of these members is addressed in another patch using prctl(). Signed-off-by: Cyrill Gorcunov <gorcunov@openvz.org> Acked-by: Serge Hallyn <serge.hallyn@canonical.com> Reviewed-by: Kees Cook <keescook@chromium.org> Reviewed-by: KAMEZAWA Hiroyuki <kamezawa.hiroyu@jp.fujitsu.com> Cc: Alexey Dobriyan <adobriyan@gmail.com> Cc: Tejun Heo <tj@kernel.org> Cc: Andrew Vagin <avagin@openvz.org> Cc: Vasiliy Kulikov <segoon@openwall.com> Cc: Alexey Dobriyan <adobriyan@gmail.com> Cc: "Eric W. Biederman" <ebiederm@xmission.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2012-01-12	c/r: introduce CHECKPOINT_RESTORE symbol	Cyrill Gorcunov
	For checkpoint/restore we need auxilary features being compiled into the kernel, such as additional prctl codes, /proc/<pid>/map_files and etc... but same time these features are not mandatory for a regular kernel so CHECKPOINT_RESTORE config symbol should bring a way to disable them all at once if one wish to get rid of additional functionality. Signed-off-by: Cyrill Gorcunov <gorcunov@openvz.org> Cc: Tejun Heo <tj@kernel.org> Cc: Andrew Vagin <avagin@openvz.org> Cc: Serge Hallyn <serge.hallyn@canonical.com> Cc: Vasiliy Kulikov <segoon@openwall.com> Reviewed-by: Kees Cook <keescook@chromium.org> Cc: KAMEZAWA Hiroyuki <kamezawa.hiroyu@jp.fujitsu.com> Cc: Alexey Dobriyan <adobriyan@gmail.com> Cc: "Eric W. Biederman" <ebiederm@xmission.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2012-01-12	selftests: new x86 breakpoints selftest	Frederic Weisbecker
	Bring a first selftest in the relevant directory. This tests several combinations of breakpoints and watchpoints in x86, as well as icebp traps and int3 traps. Given the amount of breakpoint regressions we raised after we merged the generic breakpoint infrastructure, such selftest became necessary and can still serve today as a basis for new patches that touch the do_debug() path. Signed-off-by: Frederic Weisbecker <fweisbec@gmail.com> Cc: Thomas Gleixner <tglx@linutronix.de> Cc: Ingo Molnar <mingo@elte.hu> Cc: H. Peter Anvin <hpa@zytor.com> Cc: Jason Wessel <jason.wessel@windriver.com> Cc: Will Deacon <will.deacon@arm.com> Cc: Michal Marek <mmarek@suse.cz> Cc: Sam Ravnborg <sam@ravnborg.org> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2012-01-12	selftests: new very basic kernel selftests directory	Frederic Weisbecker
	Bring a new kernel selftests directory in tools/testing/selftests. To add a new selftest, create a subdirectory with the sources and a makefile that creates a target named "run_test" then add the subdirectory name to the TARGET var in tools/testing/selftests/Makefile and tools/testing/selftests/run_tests script. This can help centralizing and maintaining any useful selftest that developers usually tend to let rust in peace on some random server. Suggested-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Frederic Weisbecker <fweisbec@gmail.com> Cc: Thomas Gleixner <tglx@linutronix.de> Cc: Ingo Molnar <mingo@elte.hu> Cc: "H. Peter Anvin" <hpa@zytor.com> Cc: Jason Wessel <jason.wessel@windriver.com> Cc: Will Deacon <will.deacon@arm.com> Cc: Steven Rostedt <rostedt@goodmis.org> Cc: Michal Marek <mmarek@suse.cz> Cc: Sam Ravnborg <sam@ravnborg.org> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2012-01-12	radix_tree: take radix_tree_path off stack	Hugh Dickins
	Down, down in the deepest depths of GFP_NOIO page reclaim, we have shrink_page_list() calling __remove_mapping() calling __delete_from_ swap_cache() or __delete_from_page_cache(). You would not expect those to need much stack, but in fact they call radix_tree_delete(): which declares a 192-byte radix_tree_path array on its stack (to record the node,offsets it visits when descending, in case it needs to ascend to update them). And if any tag is still set [1], that calls radix_tree_tag_clear(), which declares a further such 192-byte radix_tree_path array on the stack. (At least we have interrupts disabled here, so won't then be pushing registers too.) That was probably a good choice when most users were 32-bit (array of half the size), and adding fields to radix_tree_node would have bloated it unnecessarily. But nowadays many are 64-bit, and each radix_tree_node contains a struct rcu_head, which is only used when freeing; whereas the radix_tree_path info is only used for updating the tree (deleting, clearing tags or setting tags if tagged) when a lock must be held, of no interest when accessing the tree locklessly. So add a parent pointer to the radix_tree_node, in union with the rcu_head, and remove all uses of the radix_tree_path. There would be space in that union to save the offset when descending as before (we can argue that a lock must already be held to exclude other users), but recalculating it when ascending is both easy (a constant shift and a constant mask) and uncommon, so it seems better just to do that. Two little optimizations: no need to decrement height when descending, adjusting shift is enough; and once radix_tree_tag_if_tagged() has set tag on a node and its ancestors, it need not ascend from that node again. perf on the radix tree test harness reports radix_tree_insert() as 2% slower (now having to set parent), but radix_tree_delete() 24% faster. Surely that's an exaggeration from rtth's artificially low map shift 3, but forcing it back to 6 still rates radix_tree_delete() 8% faster. [1] Can a pagecache tag (dirty, writeback or towrite) actually still be set at the time of radix_tree_delete()? Perhaps not if the filesystem is well-behaved. But although I've not tracked any stack overflow down to this cause, I have observed a curious case in which a dirty tag is set and left set on tmpfs: page migration's migrate_page_copy() happens to use __set_page_dirty_nobuffers() to set PageDirty on the newpage, and that sets PAGECACHE_TAG_DIRTY as a side-effect - harmless to a filesystem which doesn't use tags, except for this stack depth issue. Signed-off-by: Hugh Dickins <hughd@google.com> Cc: Jan Kara <jack@suse.cz> Cc: Dave Chinner <david@fromorbit.com> Cc: Mel Gorman <mgorman@suse.de> Cc: Nai Xia <nai.xia@gmail.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2012-01-12	radix_tree: remove radix_tree_indirect_to_ptr()	Xiao Guangrong
	It is not used anymore, remove it Signed-off-by: Xiao Guangrong <xiaoguangrong@linux.vnet.ibm.com> Acked-by: Hugh Dickins <hughd@google.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2012-01-12	dio: optimize cache misses in the submission path	Andi Kleen
	Some investigation of a transaction processing workload showed that a major consumer of cycles in __blockdev_direct_IO is the cache miss while accessing the block size. This is because it has to walk the chain from block_dev to gendisk to queue. The block size is needed early on to check alignment and sizes. It's only done if the check for the inode block size fails. But the costly block device state is unconditionally fetched. - Reorganize the code to only fetch block dev state when actually needed. Then do a prefetch on the block dev early on in the direct IO path. This is worth it, because there is substantial code run before we actually touch the block dev now. - I also added some unlikelies to make it clear the compiler that block device fetch code is not normally executed. This gave a small, but measurable improvement on a large database benchmark (about 0.3%) [akpm@linux-foundation.org: coding-style fixes] [sfr@canb.auug.org.au: using prefetch requires including prefetch.h] Signed-off-by: Andi Kleen <ak@linux.intel.com> Cc: Jeff Moyer <jmoyer@redhat.com> Cc: Jens Axboe <axboe@kernel.dk> Cc: Christoph Hellwig <hch@lst.de> Signed-off-by: Stephen Rothwell <sfr@canb.auug.org.au> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2012-01-12	vfs: cache request_queue in struct block_device	Andi Kleen
	This makes it possible to get from the inode to the request_queue with one less cache miss. Used in followon optimization. The livetime of the pointer is the same as the gendisk. This assumes that the queue will always stay the same in the gendisk while it's visible to block_devices. I think that's safe correct? Signed-off-by: Andi Kleen <ak@linux.intel.com> Acked-by: Jeff Moyer <jmoyer@redhat.com> Cc: Jens Axboe <axboe@kernel.dk> Cc: Christoph Hellwig <hch@lst.de> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2012-01-12	fs/direct-io.c: calculate fs_count correctly in get_more_blocks()	Tao Ma
	In get_more_blocks(), we use dio_count to calcuate fs_count and do some tricky things to increase fs_count if dio_count isn't aligned. But actually it still has some corner cases that can't be coverd. See the following example: dio_write foo -s 1024 -w 4096 (direct write 4096 bytes at offset 1024). The same goes if the offset isn't aligned to fs_blocksize. In this case, the old calculation counts fs_count to be 1, but actually we will write into 2 different blocks (if fs_blocksize=4096). The old code just works, since it will call get_block twice (and may have to allocate and create extents twice for filesystems like ext4). So we'd better call get_block just once with the proper fs_count. Signed-off-by: Tao Ma <boyu.mt@taobao.com> Cc: "Theodore Ts'o" <tytso@mit.edu> Cc: Christoph Hellwig <hch@lst.de> Cc: Al Viro <viro@zeniv.linux.org.uk> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2012-01-12	drivers/parport/parport_pc.c: fix warnings	Andrew Morton
	drivers/parport/parport_pc.c: In function '__check_irq': drivers/parport/parport_pc.c:3415: warning: return from incompatible pointer type drivers/parport/parport_pc.c: In function '__check_dma': drivers/parport/parport_pc.c:3417: warning: return from incompatible pointer type Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2012-01-12	panic: don't print redundant backtraces on oops	Andi Kleen
	When an oops causes a panic and panic prints another backtrace it's pretty common to have the original oops data be scrolled away on a 80x50 screen. The second backtrace is quite redundant and not needed anyways. So don't print the panic backtrace when oops_in_progress is true. [akpm@linux-foundation.org: add comment] Signed-off-by: Andi Kleen <ak@linux.intel.com> Cc: Michael Holzheu <holzheu@linux.vnet.ibm.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2012-01-12	sysctl: add the kernel.ns_last_pid control	Pavel Emelyanov
	The sysctl works on the current task's pid namespace, getting and setting its last_pid field. Writing is allowed for CAP_SYS_ADMIN-capable tasks thus making it possible to create a task with desired pid value. This ability is required badly for the checkpoint/restore in userspace. This approach suits all the parties for now. Signed-off-by: Pavel Emelyanov <xemul@parallels.com> Acked-by: Tejun Heo <tj@kernel.org> Cc: Oleg Nesterov <oleg@redhat.com> Cc: Cyrill Gorcunov <gorcunov@openvz.org> Cc: "Eric W. Biederman" <ebiederm@xmission.com> Cc: Serge Hallyn <serue@us.ibm.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2012-01-12	kdump: add udev events for memory online/offline	Michael Holzheu
	Currently no udev events for memory hotplug "online" and "offline" are generated: # udevadm monitor # echo offline > /sys/devices/system/memory/memory4/state ==> No event When kdump is loaded, kexec detects the current memory configuration and stores it in the pre-allocated ELF core header. Therefore, for kdump it is necessary to reload the kdump kernel with kexec when the memory configuration changes (e.g. for online/offline hotplug memory). In order to do this automatically, udev rules should be used. This kernel patch adds udev events for "online" and "offline". Together with this kernel patch, the following udev rules for online/offline have to be added to "/etc/udev/rules.d/98-kexec.rules": SUBSYSTEM=="memory", ACTION=="online", PROGRAM="/etc/init.d/kdump restart" SUBSYSTEM=="memory", ACTION=="offline", PROGRAM="/etc/init.d/kdump restart" [sfr@canb.auug.org.au: fixups for class to subsystem conversion] Signed-off-by: Michael Holzheu <holzheu@linux.vnet.ibm.com> Cc: Heiko Carstens <heiko.carstens@de.ibm.com> Cc: Vivek Goyal <vgoyal@redhat.com> Cc: "Eric W. Biederman" <ebiederm@xmission.com> Cc: Kay Sievers <kay.sievers@vrfy.org> Cc: Dave Hansen <haveblue@us.ibm.com> Cc: Martin Schwidefsky <schwidefsky@de.ibm.com> Cc: Greg KH <greg@kroah.com> Signed-off-by: Stephen Rothwell <sfr@canb.auug.org.au> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2012-01-12	include/linux/crash_dump.h needs elf.h	Fabio Estevam
	Building an ARM target we get the following warnings: CC arch/arm/kernel/setup.o In file included from arch/arm/kernel/setup.c:39: arch/arm/include/asm/elf.h:102:1: warning: "vmcore_elf64_check_arch" redefined In file included from arch/arm/kernel/setup.c:24: include/linux/crash_dump.h:30:1: warning: this is the location of the previous definition Quoting Russell King: "linux/crash_dump.h makes no attempt to include asm/elf.h, but it depends on stuff in asm/elf.h to determine how stuff inside this file is defined at parse time. So, if asm/elf.h is included after linux/crash_dump.h or not at all, you get a different result from the situation where asm/elf.h is included before." So add elf.h header to crash_dump.h to avoid this problem. The original discussion about this can be found at: http://www.spinics.net/lists/arm-kernel/msg154113.html Signed-off-by: Fabio Estevam <fabio.estevam@freescale.com> Cc: Russell King <rmk@arm.linux.org.uk> Cc: <stable@vger.kernel.org> [3.2.1] Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2012-01-12	kdump: fix crash_kexec()/smp_send_stop() race in panic()	Michael Holzheu
	When two CPUs call panic at the same time there is a possible race condition that can stop kdump. The first CPU calls crash_kexec() and the second CPU calls smp_send_stop() in panic() before crash_kexec() finished on the first CPU. So the second CPU stops the first CPU and therefore kdump fails: 1st CPU: panic()->crash_kexec()->mutex_trylock(&kexec_mutex)-> do kdump 2nd CPU: panic()->crash_kexec()->kexec_mutex already held by 1st CPU ->smp_send_stop()-> stop 1st CPU (stop kdump) This patch fixes the problem by introducing a spinlock in panic that allows only one CPU to process crash_kexec() and the subsequent panic code. All other CPUs call the weak function panic_smp_self_stop() that stops the CPU itself. This function can be overloaded by architecture code. For example "tile" can use their lower-power "nap" instruction for that. Signed-off-by: Michael Holzheu <holzheu@linux.vnet.ibm.com> Acked-by: Chris Metcalf <cmetcalf@tilera.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>