summaryrefslogtreecommitdiff
AgeCommit message (Collapse)Author
2012-11-28sctp: fix memory leak in sctp_datamsg_from_user() when copy from user space ↵Tommi Rantala
fails Trinity (the syscall fuzzer) discovered a memory leak in SCTP, reproducible e.g. with the sendto() syscall by passing invalid user space pointer in the second argument: #include <string.h> #include <arpa/inet.h> #include <sys/socket.h> int main(void) { int fd; struct sockaddr_in sa; fd = socket(AF_INET, SOCK_STREAM, 132 /*IPPROTO_SCTP*/); if (fd < 0) return 1; memset(&sa, 0, sizeof(sa)); sa.sin_family = AF_INET; sa.sin_addr.s_addr = inet_addr("127.0.0.1"); sa.sin_port = htons(11111); sendto(fd, NULL, 1, 0, (struct sockaddr *)&sa, sizeof(sa)); return 0; } As far as I can tell, the leak has been around since ~2003. Signed-off-by: Tommi Rantala <tt.rantala@gmail.com> Acked-by: Vlad Yasevich <vyasevich@gmail.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2012-11-28CIFS: Fix wrong buffer pointer usage in smb_set_file_infoPavel Shilovsky
Commit 6bdf6dbd662176c0da5c3ac8ed10ac94e7776c85 caused a regression in setattr codepath that leads to files with wrong attributes. Signed-off-by: Pavel Shilovsky <piastry@etersoft.ru> Reviewed-by: Jeff Layton <jlayton@redhat.com> Signed-off-by: Steve French <smfrench@gmail.com>
2012-11-28percpu-rwsem: use synchronize_sched_expeditedMikulas Patocka
Use synchronize_sched_expedited() instead of synchronize_sched() to improve mount speed. This patch improves mount time from 0.500s to 0.013s for Jeff's test-case. Signed-off-by: Mikulas Patocka <mpatocka@redhat.com> Reported-and-tested-by: Jeff Chua <jeff.chua.linux@gmail.com> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2012-11-28EDAC: Handle error path in edac_mc_sysfs_init() properlyDenis Kirjanov
Make sure proper deregistration happens on all error paths in edac_mc_sysfs_init. Signed-off-by: Denis Kirjanov <kirjanov@gmail.com> [ Boris: cleanup and concretize commit message ] Signed-off-by: Borislav Petkov <bp@alien8.de>
2012-11-28MCE, AMD: Dump error statusBorislav Petkov
Dump error status after decoding the error which describes the error disposition. Signed-off-by: Borislav Petkov <borislav.petkov@amd.com>
2012-11-28MCE, AMD: Report decoded error type firstBorislav Petkov
Instead of starting with the error details, report the decoded, readable error type first. Signed-off-by: Borislav Petkov <borislav.petkov@amd.com>
2012-11-28MCE, AMD: Dump CPU f/m/s triple with the errorBorislav Petkov
It is very useful to have the family/model/stepping with the reported error so dump it. This saves us asking the bug reporter about it. Signed-off-by: Borislav Petkov <borislav.petkov@amd.com>
2012-11-28MCE, AMD: Remove functional unit referencesBorislav Petkov
Having the functional unit names in each bank decode is only misleading as this code supports multiple families and there's no guarantee the mapping between FUs and MCE banks will stay the same. And also, knowing the functional unit name doesn't help much since you end up looking at the respective BKDG anyway. So drop all FU references and use the MC bank numbers instead. Signed-off-by: Borislav Petkov <borislav.petkov@amd.com>
2012-11-28EDAC: Convert to use simple_open()Wei Yongjun
This removes an open coded simple_open() function and replaces file operations references to the function with simple_open() instead. dpatch engine is used to auto generate this patch. (https://github.com/weiyj/dpatch) Signed-off-by: Wei Yongjun <yongjun_wei@trendmicro.com.cn> Signed-off-by: Borislav Petkov <borislav.petkov@amd.com>
2012-11-28EDAC, Calxeda highbank: Convert to use simple_open()Wei Yongjun
This removes an open coded simple_open() function and replaces file operations references to the function with simple_open() instead. dpatch engine is used to auto generate this patch. (https://github.com/weiyj/dpatch) Cc: Rob Herring <rob.herring@calxeda.com> Signed-off-by: Wei Yongjun <yongjun_wei@trendmicro.com.cn> Signed-off-by: Borislav Petkov <borislav.petkov@amd.com>
2012-11-28EDAC: Fix mc size reported in sysfsJosh Hunt
This is the complement to previous commit "EDAC: Fix csrow size reported in sysfs". This fixes the memory controller size reporting on csrow-based memory controllers. The csrow size is already combined for both channels. Without this patch memory size is reported doubled. Signed-off-by: Josh Hunt <johunt@akamai.com> Signed-off-by: Borislav Petkov <borislav.petkov@amd.com>
2012-11-28EDAC: Fix csrow size reported in sysfsBorislav Petkov
On csrow-based memory controllers, we combine the csrow size from both channels and there's no need to do that again in csrow_size_show which leads to double the size of a csrow. Fix it. Signed-off-by: Borislav Petkov <borislav.petkov@amd.com>
2012-11-28EDAC: Pass mci parentBorislav Petkov
Initialize the mem_ctl_info descriptor of a csrow properly. Signed-off-by: Borislav Petkov <borislav.petkov@amd.com>
2012-11-28EDAC: Add memory controller flagsBorislav Petkov
The first flag is ->csbased and will be used in common EDAC code later. Signed-off-by: Borislav Petkov <borislav.petkov@amd.com>
2012-11-28amd64_edac: Fix csrows size and pages computationBorislav Petkov
Make sure code pays attention to K8 having only one DCT, reformat and cleanup code, correct debug messages, remove unused code. Signed-off-by: Borislav Petkov <borislav.petkov@amd.com>
2012-11-28amd64_edac: Use DBAM_DIMM macroBorislav Petkov
Instead of open-coding it, use the DBAM_DIMM macro in amd64_csrow_nr_pages() which we have already. Signed-off-by: Borislav Petkov <borislav.petkov@amd.com>
2012-11-28amd64_edac: Fix K8 chip select reportingBorislav Petkov
This basically reverts 603adaf6b3e3 ("amd64_edac: fix K8 chip select reporting") because it was a clumsy workaround for DIMM sizes reporting on K8 which got superceded by a much more correct one with 41d8bfaba70 ("amd64_edac: Improve DRAM address mapping") without removing the prior one. Remove it now finally. Reported-by: Josh Hunt <johunt@akamai.com> Signed-off-by: Borislav Petkov <borislav.petkov@amd.com>
2012-11-28amd64_edac: Reorganize error reporting pathBorislav Petkov
Rewrite CE/UE paths so that they use the same code and drop additional code duplication in handle_ue. Add a struct err_info which collects required info for the error reporting. This, in turn, helps slimming all edac_mc_handle_error() calls down to one. Signed-off-by: Borislav Petkov <borislav.petkov@amd.com>
2012-11-28amd64_edac: Do not check whether error address is validBorislav Petkov
All families report a valid error address when encountering a DRAM ECC error so no need to check it. Signed-off-by: Borislav Petkov <borislav.petkov@amd.com>
2012-11-28amd64_edac: Improve error injectionBorislav Petkov
When injecting DRAM ECC errors over the F3xB[8,C] interface, the machine does this by injecting the error in the next non-cached access. This takes relatively long time on a normal system so that in order for us to expedite it, we disable the caches around the injection. Signed-off-by: Borislav Petkov <borislav.petkov@amd.com>
2012-11-28amd64_edac: Cleanup error injection codeBorislav Petkov
Invert kstrtoul return value testing and win one indentation level. Also, shorten up macro names so that the lines can fit into 80 cols. No functional change. Signed-off-by: Borislav Petkov <borislav.petkov@amd.com>
2012-11-28amd64_edac: Small fixlets and cleanupsBorislav Petkov
amd64_get_dram_hole_info: remove local variable 'base'. sys_addr_to_dram_addr: do not clear local variable 'ret'. Also, sanitize constants formatting. Signed-off-by: Borislav Petkov <borislav.petkov@amd.com>
2012-11-28EDAC: Handle empty msg strings when reporting errorsBorislav Petkov
A reported error could look like this [ 226.178315] EDAC MC0: 1 CE on mc#0csrow#0channel#0 (csrow:0 channel:0 page:0x427c0d offset:0xde0 grain:0 syndrome:0x1c6) with two spaces back-to-back due to the msg argument of edac_mc_handle_error being passed on empty by the specific drivers. Handle that. Signed-off-by: Borislav Petkov <borislav.petkov@amd.com>
2012-11-28EDAC: Remove useless assignment of error typeBorislav Petkov
The tracepoint decodes the error type later anyway so remove a useless assignment to the temporary p which gets overwritten later anyway. Signed-off-by: Borislav Petkov <borislav.petkov@amd.com>
2012-11-28EDAC: Boundary-check edac_debug_levelBorislav Petkov
Only levels [0:4] are allowed so enforce that. Also, while at it, massage Kconfig text and add valid debug levels range to the module parameter description. Signed-off-by: Borislav Petkov <borislav.petkov@amd.com>
2012-11-28EDAC: Respect operational state in edac_pci.cBorislav Petkov
Currently, we unconditionally enable PCI polling and we don't look at the edac_op_state module parameter. Make this dependent on the parameter setting supplied on the command line. Signed-off-by: Borislav Petkov <borislav.petkov@amd.com>
2012-11-28ARM: 7586/1: sp804: set cpumask to cpu_possible_mask for clock event deviceWill Deacon
The SP804 driver statically initialises the cpumask of the clock event device to be cpu_all_mask, which is derived from the compile-time constant NR_CPUS. This breaks SMP_ON_UP systems where the interrupt controller handling the sp804 doesn't have the irq_set_affinity callback on the irq_chip, because the common timer code fails to identify the device as cpu-local and ends up treating it as a broadcast device instead. This patch fixes the problem by using cpu_possible_mask at runtime, which will correctly represent the possible CPUs when SMP_ON_UP is being used. Signed-off-by: Will Deacon <will.deacon@arm.com> Signed-off-by: Russell King <rmk+kernel@arm.linux.org.uk>
2012-11-28i7core_edac: fix panic when accessing sysfs filesPrarit Bhargava
The i7core_edac addrmatch_dev and chancounts_dev have sysfs files associated with them. The sysfs files, however, are coded so that the parent device is is the mci device. This is incorrect and the mci struct should be obtained through the addrmatch_dev and chancounts_dev device's private data field which is populated in i7core_create_sysfs_devices(). Signed-off-by: Prarit Bhargava <prarit@redhat.com> Signed-off-by: Mauro Carvalho Chehab <mchehab@redhat.com>
2012-11-28Merge branch 'drm-fixes-3.7' of git://people.freedesktop.org/~agd5f/linuxDave Airlie
Just a single pll/crtc regression fix. * 'drm-fixes-3.7' of git://people.freedesktop.org/~agd5f/linux: radeon: fix pll/ctrc mapping on dce2 and dce3 hardware
2012-11-27radeon: fix pll/ctrc mapping on dce2 and dce3 hardwareJerome Glisse
This fix black screen on resume issue that some people are experiencing. There is a bug in the atombios code regarding pll/crtc mapping. The atombios code reverse the logic for the pll and crtc mapping. agd5f: drop unnecessary crtc id check, cc stable in case we miss 3.7. This fixes the root cause that was worked around by commits: drm/radeon: allocate PPLLs from low to high drm/radeon/dce3: switch back to old pll allocation order for discrete Signed-off-by: Jerome Glisse <jglisse@redhat.com> Reviewed-by: Alex Deucher <alexander.deucher@amd.com> Cc: stable@vger.kernel.org
2012-11-27Merge branch 'v4l_for_linus' of ↵Linus Torvalds
git://git.kernel.org/pub/scm/linux/kernel/git/mchehab/linux-media Pull media fixes from Mauro Carvalho Chehab: "For some media fixes: - dvb_usb_v2: some fixes at the core - Some fixes on some embedded drivers: soc_camera, adv7604, omap3isp, exynos/s5p - Several Exynos4/5 camera fixes - a fix at stv0900 driver - a few USB ID additions to detect more variants of rtl28xxu-based sticks" * 'v4l_for_linus' of git://git.kernel.org/pub/scm/linux/kernel/git/mchehab/linux-media: (25 commits) [media] rtl28xxu: 0ccd:00d7 TerraTec Cinergy T Stick+ [media] rtl28xxu: 1d19:1102 Dexatek DK mini DVB-T Dongle [media] mt9v022: fix the V4L2_CID_EXPOSURE control [media] mx2_camera: fix missing unlock on error in mx2_start_streaming() [media] media: omap1_camera: fix const cropping related warnings [media] media: mx1_camera: use the default .set_crop() implementation [media] media: mx2_camera: fix const cropping related warnings [media] media: mx3_camera: fix const cropping related warnings [media] media: pxa_camera: fix const cropping related warnings [media] media: sh_mobile_ceu_camera: fix const cropping related warnings [media] media: sh_vou: fix const cropping related warnings [media] adv7604: restart STDI once if format is not found [media] adv7604: use presets where possible [media] adv7604: Replace prim_mode by mode [media] adv7604: cleanup references [media] dvb_usb_v2: switch interruptible mutex to normal [media] dvb_usb_v2: fix pid_filter callback error logging [media] exynos-gsc: change driver compatible string [media] omap3isp: Fix warning caused by bad subdev events operations prototypes [media] omap3isp: video: Fix warning caused by bad vidioc_s_crop prototype ...
2012-11-27cifs: fix writeback race with file that is growingJeff Layton
Commit eddb079deb4 created a regression in the writepages codepath. Previously, whenever it needed to check the size of the file, it did so by consulting the inode->i_size field directly. With that patch, the i_size was fetched once on entry into the writepages code and that value was used henceforth. If the file is changing size though (for instance, if someone is writing to it or has truncated it), then that value is likely to be wrong. This can lead to data corruption. Pages past the EOF at the time that the writepages call was issued may be silently dropped and ignored because cifs_writepages wrongly assumes that the file must have been truncated in the interim. Fix cifs_writepages to properly fetch the size from the inode->i_size field instead to properly account for this possibility. Original bug report is here: https://bugzilla.kernel.org/show_bug.cgi?id=50991 Reported-and-Tested-by: Maxim Britov <ungifted01@gmail.com> Reviewed-by: Suresh Jayaraman <sjayaraman@suse.com> Signed-off-by: Jeff Layton <jlayton@redhat.com> Signed-off-by: Steve French <smfrench@gmail.com>
2012-11-27x86-32: Unbreak booting on some 486 clonesH. Peter Anvin
There appear to have been some 486 clones, including the "enhanced" version of Am486, which have CPUID but not CR4. These 486 clones had only the FPU flag, if any, unlike the Intel 486s with CPUID, which also had VME and therefore needed CR4. Therefore, look at the basic CPUID flags and require at least one bit other than bit 0 before we modify CR4. Thanks to Christian Ludloff of sandpile.org for confirming this as a problem. Signed-off-by: H. Peter Anvin <hpa@linux.intel.com>
2012-11-27Merge branch 'exynos-drm-fixes' of ↵Dave Airlie
git://git.kernel.org/pub/scm/linux/kernel/git/daeinki/drm-exynos Inki writes: This pull request fixes minor issues and includes code cleanup. * 'exynos-drm-fixes' of git://git.kernel.org/pub/scm/linux/kernel/git/daeinki/drm-exynos: drm/exynos: Fix potential NULL pointer dereference in exynos_drm_encoder.c drm/exynos: Make exynos4/5_fimd_driver_data static drm/exynos: fix overlay updating issue drm/exynos: remove unnecessary code. drm/exynos: fix linux framebuffer address setting.
2012-11-27Merge branch 'drm-intel-fixes' of git://people.freedesktop.org/~danvet/drm-intelDave Airlie
Daniel writes: - Unbreak mbp retina, this time with a much more fine-grained approach (since the previous "completely ignore edp vbt bpp value" regressed some machines even after fixing a bug in our dp bw code). - Disable cloning on sdvo. It just doesn't work (yeah took us a while to figure out), leading to jittery outputs in the best case. - Revert rc6 for ilk again. It seems to help a few of the gpu hang reporters at least, and it's definitely the best we've got. Head-against-the-wall-banging is still ongoing for what really breaks (and how we can reproduce the non-rc6 hangs and how to reproduce on gen4). * 'drm-intel-fixes' of git://people.freedesktop.org/~danvet/drm-intel: Revert "drm/i915: enable rc6 on ilk again" drm/i915: do not default to 18 bpp for eDP if missing from VBT drm/i915: disable cloning on sdvo
2012-11-26Merge branch 'akpm' (Fixes from Andrew)Linus Torvalds
Merge misc fixes from Andrew Morton: "8 fixes" * emailed patches from Andrew Morton <akpm@linux-foundation.org>: (8 patches) futex: avoid wake_futex() for a PI futex_q watchdog: using u64 in get_sample_period() writeback: put unused inodes to LRU after writeback completion mm: vmscan: check for fatal signals iff the process was throttled Revert "mm: remove __GFP_NO_KSWAPD" proc: check vma->vm_file before dereferencing UAPI: strip the _UAPI prefix from header guards during header installation include/linux/bug.h: fix sparse warning related to BUILD_BUG_ON_INVALID
2012-11-26Merge tag 'tty-3.7-rc7' of ↵Linus Torvalds
git://git.kernel.org/pub/scm/linux/kernel/git/gregkh/tty Pull TTY fix from Greg Kroah-Hartman: "Here is a single fix for a reported regression in 3.7-rc5 for the tty layer. This fix has been in the linux-next tree and solves the reported problem. Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>" * tag 'tty-3.7-rc7' of git://git.kernel.org/pub/scm/linux/kernel/git/gregkh/tty: tty vt: Fix a regression in command line edition
2012-11-26Merge tag 'mfd-for-linus-3.7' of ↵Linus Torvalds
git://git.kernel.org/pub/scm/linux/kernel/git/sameo/mfd-2.6 Pull MFD fixes from Samuel Ortiz: - A twl fix preventing a buffer overflow. - A wm5102 register patch fix. - A wm5110 error misreport fix. - Arizona fixes: Use the right array size when adding subdevices, correctly report underclocked events, synchronize register cache after reset. - A twl4030 fix for preventing the system to hang from an interrupt flood. * tag 'mfd-for-linus-3.7' of git://git.kernel.org/pub/scm/linux/kernel/git/sameo/mfd-2.6: mfd: twl4030: Fix chained irq handling on resume from suspend mfd: arizona: Sync regcache after reset mfd: arizona: Correctly report when AIF2/AIF1 is underclocked mfd: arizona: Use correct array for ARRAY_SIZE in mfd_add_devices call mfd: wm5110: Disable control interface error report for WM5110 rev B mfd: wm5102: Update register patch for latest evaluation mfd: twl-core: Fix chip ID for the twl6030-pwm module
2012-11-26Merge branch 'fixes' of git://git.linaro.org/people/rmk/linux-armLinus Torvalds
Pull ARM fixes from Russell King: "Not much here, just a couple minor/cosmetic fixes and a patch for the decompressor which fixes problems with modern GCC and CPUs." * 'fixes' of git://git.linaro.org/people/rmk/linux-arm: ARM: 7583/1: decompressor: Enable unaligned memory access for v6 and above ARM: 7572/1: proc-v6.S: fix comment ARM: 7570/1: quiet down the non make -s output
2012-11-26Merge branch 'for_linus' of ↵Linus Torvalds
git://git.kernel.org/pub/scm/linux/kernel/git/jack/linux-fs Pull ext3 regression fix from Jan Kara: "Fix an ext3 regression introduced during 3.7 merge window. It leads to deadlock if you stress the filesystem in the right way (luckily only if blocksize < pagesize)." * 'for_linus' of git://git.kernel.org/pub/scm/linux/kernel/git/jack/linux-fs: jbd: Fix lock ordering bug in journal_unmap_buffer()
2012-11-26futex: avoid wake_futex() for a PI futex_qDarren Hart
Dave Jones reported a bug with futex_lock_pi() that his trinity test exposed. Sometime between queue_me() and taking the q.lock_ptr, the lock_ptr became NULL, resulting in a crash. While futex_wake() is careful to not call wake_futex() on futex_q's with a pi_state or an rt_waiter (which are either waiting for a futex_unlock_pi() or a PI futex_requeue()), futex_wake_op() and futex_requeue() do not perform the same test. Update futex_wake_op() and futex_requeue() to test for q.pi_state and q.rt_waiter and abort with -EINVAL if detected. To ensure any future breakage is caught, add a WARN() to wake_futex() if the same condition is true. This fix has seen 3 hours of testing with "trinity -c futex" on an x86_64 VM with 4 CPUS. [akpm@linux-foundation.org: tidy up the WARN()] Signed-off-by: Darren Hart <dvhart@linux.intel.com> Reported-by: Dave Jones <davej@redat.com> Cc: Thomas Gleixner <tglx@linutronix.de> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Ingo Molnar <mingo@elte.hu> Cc: John Kacur <jkacur@redhat.com> Cc: <stable@vger.kernel.org> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2012-11-26watchdog: using u64 in get_sample_period()Chuansheng Liu
In get_sample_period(), unsigned long is not enough: watchdog_thresh * 2 * (NSEC_PER_SEC / 5) case1: watchdog_thresh is 10 by default, the sample value will be: 0xEE6B2800 case2: set watchdog_thresh is 20, the sample value will be: 0x1 DCD6 5000 In case2, we need use u64 to express the sample period. Otherwise, changing the threshold thru proc often can not be successful. Signed-off-by: liu chuansheng <chuansheng.liu@intel.com> Acked-by: Don Zickus <dzickus@redhat.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2012-11-26writeback: put unused inodes to LRU after writeback completionJan Kara
Commit 169ebd90131b ("writeback: Avoid iput() from flusher thread") removed iget-iput pair from inode writeback. As a side effect, inodes that are dirty during iput_final() call won't be ever added to inode LRU (iput_final() doesn't add dirty inodes to LRU and later when the inode is cleaned there's noone to add the inode there). Thus inodes are effectively unreclaimable until someone looks them up again. The practical effect of this bug is limited by the fact that inodes are pinned by a dentry for long enough that the inode gets cleaned. But still the bug can have nasty consequences leading up to OOM conditions under certain circumstances. Following can easily reproduce the problem: for (( i = 0; i < 1000; i++ )); do mkdir $i for (( j = 0; j < 1000; j++ )); do touch $i/$j echo 2 > /proc/sys/vm/drop_caches done done then one needs to run 'sync; ls -lR' to make inodes reclaimable again. We fix the issue by inserting unused clean inodes into the LRU after writeback finishes in inode_sync_complete(). Signed-off-by: Jan Kara <jack@suse.cz> Reported-by: OGAWA Hirofumi <hirofumi@mail.parknet.co.jp> Cc: Al Viro <viro@zeniv.linux.org.uk> Cc: OGAWA Hirofumi <hirofumi@mail.parknet.co.jp> Cc: Wu Fengguang <fengguang.wu@intel.com> Cc: Dave Chinner <david@fromorbit.com> Cc: <stable@vger.kernel.org> [3.5+] Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2012-11-26mm: vmscan: check for fatal signals iff the process was throttledMel Gorman
Commit 5515061d22f0 ("mm: throttle direct reclaimers if PF_MEMALLOC reserves are low and swap is backed by network storage") introduced a check for fatal signals after a process gets throttled for network storage. The intention was that if a process was throttled and got killed that it should not trigger the OOM killer. As pointed out by Minchan Kim and David Rientjes, this check is in the wrong place and too broad. If a system is in am OOM situation and a process is exiting, it can loop in __alloc_pages_slowpath() and calling direct reclaim in a loop. As the fatal signal is pending it returns 1 as if it is making forward progress and can effectively deadlock. This patch moves the fatal_signal_pending() check after throttling to throttle_direct_reclaim() where it belongs. If the process is killed while throttled, it will return immediately without direct reclaim except now it will have TIF_MEMDIE set and will use the PFMEMALLOC reserves. Minchan pointed out that it may be better to direct reclaim before returning to avoid using the reserves because there may be pages that can easily reclaim that would avoid using the reserves. However, we do no such targetted reclaim and there is no guarantee that suitable pages are available. As it is expected that this throttling happens when swap-over-NFS is used there is a possibility that the process will instead swap which may allocate network buffers from the PFMEMALLOC reserves. Hence, in the swap-over-nfs case where a process can be throtted and be killed it can use the reserves to exit or it can potentially use reserves to swap a few pages and then exit. This patch takes the option of using the reserves if necessary to allow the process exit quickly. If this patch passes review it should be considered a -stable candidate for 3.6. Signed-off-by: Mel Gorman <mgorman@suse.de> Cc: David Rientjes <rientjes@google.com> Cc: Luigi Semenzato <semenzato@google.com> Cc: Dan Magenheimer <dan.magenheimer@oracle.com> Cc: KOSAKI Motohiro <kosaki.motohiro@jp.fujitsu.com> Cc: Sonny Rao <sonnyrao@google.com> Cc: Minchan Kim <minchan@kernel.org> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2012-11-26Revert "mm: remove __GFP_NO_KSWAPD"Mel Gorman
With "mm: vmscan: scale number of pages reclaimed by reclaim/compaction based on failures" reverted, Zdenek Kabelac reported the following Hmm, so it's just took longer to hit the problem and observe kswapd0 spinning on my CPU again - it's not as endless like before - but still it easily eats minutes - it helps to turn off Firefox or TB (memory hungry apps) so kswapd0 stops soon - and restart those apps again. (And I still have like >1GB of cached memory) kswapd0 R running task 0 30 2 0x00000000 Call Trace: preempt_schedule+0x42/0x60 _raw_spin_unlock+0x55/0x60 put_super+0x31/0x40 drop_super+0x22/0x30 prune_super+0x149/0x1b0 shrink_slab+0xba/0x510 The sysrq+m indicates the system has no swap so it'll never reclaim anonymous pages as part of reclaim/compaction. That is one part of the problem but not the root cause as file-backed pages could also be reclaimed. The likely underlying problem is that kswapd is woken up or kept awake for each THP allocation request in the page allocator slow path. If compaction fails for the requesting process then compaction will be deferred for a time and direct reclaim is avoided. However, if there are a storm of THP requests that are simply rejected, it will still be the the case that kswapd is awake for a prolonged period of time as pgdat->kswapd_max_order is updated each time. This is noticed by the main kswapd() loop and it will not call kswapd_try_to_sleep(). Instead it will loopp, shrinking a small number of pages and calling shrink_slab() on each iteration. The temptation is to supply a patch that checks if kswapd was woken for THP and if so ignore pgdat->kswapd_max_order but it'll be a hack and not backed up by proper testing. As 3.7 is very close to release and this is not a bug we should release with, a safer path is to revert "mm: remove __GFP_NO_KSWAPD" for now and revisit it with the view to ironing out the balance_pgdat() logic in general. Signed-off-by: Mel Gorman <mgorman@suse.de> Cc: Zdenek Kabelac <zkabelac@redhat.com> Cc: Seth Jennings <sjenning@linux.vnet.ibm.com> Cc: Valdis Kletnieks <Valdis.Kletnieks@vt.edu> Cc: Jiri Slaby <jirislaby@gmail.com> Cc: Rik van Riel <riel@redhat.com> Cc: Robert Jennings <rcj@linux.vnet.ibm.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2012-11-26proc: check vma->vm_file before dereferencingStanislav Kinsbursky
Commit 7b540d0646ce ("proc_map_files_readdir(): don't bother with grabbing files") switched proc_map_files_readdir() to use @f_mode directly instead of grabbing @file reference, but same time the test for @vm_file presence was lost leading to nil dereference. The patch brings the test back. The all proc_map_files feature is CONFIG_CHECKPOINT_RESTORE wrapped (which is set to 'n' by default) so the bug doesn't affect regular kernels. The regression is 3.7-rc1 only as far as I can tell. [gorcunov@openvz.org: provided changelog] Signed-off-by: Stanislav Kinsbursky <skinsbursky@parallels.com> Acked-by: Cyrill Gorcunov <gorcunov@openvz.org> Cc: Al Viro <viro@zeniv.linux.org.uk> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2012-11-26UAPI: strip the _UAPI prefix from header guards during header installationDavid Howells
Strip the _UAPI prefix from header guards during header installation so that any userspace dependencies aren't affected. glibc, for example, checks for linux/types.h, linux/kernel.h, linux/compiler.h and linux/list.h by their guards - though the last two aren't actually exported. libtool: compile: gcc -std=gnu99 -DHAVE_CONFIG_H -I. -Wall -Werror -Wformat -Wformat-security -D_FORTIFY_SOURCE=2 -fno-delete-null-pointer-checks -fstack-protector -O2 -g -pipe -Wall -Wp,-D_FORTIFY_SOURCE=2 -fexceptions -fstack-protector --param=ssp-buffer-size=4 -m32 -march=i686 -mtune=atom -fasynchronous-unwind-tables -c child.c -fPIC -DPIC -o .libs/child.o In file included from cli.c:20:0: common.h:152:8: error: redefinition of 'struct sysinfo' In file included from /usr/include/linux/kernel.h:4:0, from /usr/include/linux/sysctl.h:25, from /usr/include/sys/sysctl.h:43, from common.h:50, from cli.c:20: /usr/include/linux/sysinfo.h:7:8: note: originally defined here Reported-by: Tomasz Torcz <tomek@pipebreaker.pl> Signed-off-by: David Howells <dhowells@redhat.com> Acked-by: Josh Boyer <jwboyer@redhat.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2012-11-26include/linux/bug.h: fix sparse warning related to BUILD_BUG_ON_INVALIDTushar Behera
Commit baf05aa9271b ("bug: introduce BUILD_BUG_ON_INVALID() macro") introduces this macro only when _CHECKER_ is not defined. Define a silent macro in the else condition to fix following sparse warning: mm/filemap.c:395:9: error: undefined identifier 'BUILD_BUG_ON_INVALID' mm/filemap.c:396:9: error: undefined identifier 'BUILD_BUG_ON_INVALID' mm/filemap.c:397:9: error: undefined identifier 'BUILD_BUG_ON_INVALID' include/linux/mm.h:419:9: error: undefined identifier 'BUILD_BUG_ON_INVALID' include/linux/mm.h:419:9: error: not a function <noident> Signed-off-by: Tushar Behera <tushar.behera@linaro.org> Acked-by: Konstantin Khlebnikov <khlebnikov@openvz.org> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2012-11-27md/raid1{,0}: fix deadlock in bitmap_unplug.NeilBrown
If the raid1 or raid10 unplug function gets called from a make_request function (which is very possible) when there are bios on the current->bio_list list, then it will not be able to successfully call bitmap_unplug() and it could need to submit more bios and wait for them to complete. But they won't complete while current->bio_list is non-empty. So detect that case and handle the unplugging off to another thread just like we already do when called from within the scheduler. RAID1 version of bug was introduced in 3.6, so that part of fix is suitable for 3.6.y. RAID10 part won't apply. Cc: stable@vger.kernel.org Reported-by: Torsten Kaiser <just.for.lkml@googlemail.com> Reported-by: Peter Maloney <peter.maloney@brockmann-consult.de> Signed-off-by: NeilBrown <neilb@suse.de>
2012-11-26x86, kvm: Remove incorrect redundant assembly constraintH. Peter Anvin
In __emulate_1op_rax_rdx, we use "+a" and "+d" which are input/output constraints, and *then* use "a" and "d" as input constraints. This is incorrect, but happens to work on some versions of gcc. However, it breaks gcc with -O0 and icc, and may break on future versions of gcc. Reported-and-tested-by: Melanie Blower <melanie.blower@intel.com> Signed-off-by: H. Peter Anvin <hpa@linux.intel.com> Link: http://lkml.kernel.org/r/B3584E72CFEBED439A3ECA9BCE67A4EF1B17AF90@FMSMSX107.amr.corp.intel.com Reviewed-by: Paolo Bonzini <pbonzini@redhat.com> Acked-by: Marcelo Tosatti <mtosatti@redhat.com>