summaryrefslogtreecommitdiffstats
AgeCommit message (Collapse)AuthorFilesLines
2010-09-10xfs: log IO completion workqueue is a high priority queueDave Chinner1-1/+2
The workqueue implementation in 2.6.36-rcX has changed, resulting in the workqueues no longer having dedicated threads for work processing. This has caused severe livelocks under heavy parallel create workloads because the log IO completions have been getting held up behind metadata IO completions. Hence log commits would stall, memory allocation would stall because pages could not be cleaned, and lock contention on the AIL during inode IO completion processing was being seen to slow everything down even further. By making the log Io completion workqueue a high priority workqueue, they are queued ahead of all data/metadata IO completions and processed before the data/metadata completions. Hence the log never gets stalled, and operations needed to clean memory can continue as quickly as possible. This avoids the livelock conditions and allos the system to keep running under heavy load as per normal. Signed-off-by: Dave Chinner <dchinner@redhat.com> Reviewed-by: Christoph Hellwig <hch@lst.de> Signed-off-by: Alex Elder <aelder@sgi.com>
2010-09-10execve: make responsive to SIGKILL with large argumentsRoland McGrath1-0/+7
An execve with a very large total of argument/environment strings can take a really long time in the execve system call. It runs uninterruptibly to count and copy all the strings. This change makes it abort the exec quickly if sent a SIGKILL. Note that this is the conservative change, to interrupt only for SIGKILL, by using fatal_signal_pending(). It would be perfectly correct semantics to let any signal interrupt the string-copying in execve, i.e. use signal_pending() instead of fatal_signal_pending(). We'll save that change for later, since it could have user-visible consequences, such as having a timer set too quickly make it so that an execve can never complete, though it always happened to work before. Signed-off-by: Roland McGrath <roland@redhat.com> Reviewed-by: KOSAKI Motohiro <kosaki.motohiro@jp.fujitsu.com> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2010-09-10execve: improve interactivity with large argumentsRoland McGrath1-0/+2
This adds a preemption point during the copying of the argument and environment strings for execve, in copy_strings(). There is already a preemption point in the count() loop, so this doesn't add any new points in the abstract sense. When the total argument+environment strings are very large, the time spent copying them can be much more than a normal user time slice. So this change improves the interactivity of the rest of the system when one process is doing an execve with very large arguments. Signed-off-by: Roland McGrath <roland@redhat.com> Reviewed-by: KOSAKI Motohiro <kosaki.motohiro@jp.fujitsu.com> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2010-09-10setup_arg_pages: diagnose excessive argument sizeRoland McGrath1-0/+5
The CONFIG_STACK_GROWSDOWN variant of setup_arg_pages() does not check the size of the argument/environment area on the stack. When it is unworkably large, shift_arg_pages() hits its BUG_ON. This is exploitable with a very large RLIMIT_STACK limit, to create a crash pretty easily. Check that the initial stack is not too large to make it possible to map in any executable. We're not checking that the actual executable (or intepreter, for binfmt_elf) will fit. So those mappings might clobber part of the initial stack mapping. But that is just userland lossage that userland made happen, not a kernel problem. Signed-off-by: Roland McGrath <roland@redhat.com> Reviewed-by: KOSAKI Motohiro <kosaki.motohiro@jp.fujitsu.com> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2010-09-10Merge branch 'kvm-updates/2.6.36' of git://git.kernel.org/pub/scm/virt/kvm/kvmLinus Torvalds5-10/+16
* 'kvm-updates/2.6.36' of git://git.kernel.org/pub/scm/virt/kvm/kvm: KVM: x86: Perform hardware_enable in CPU_STARTING callback KVM: i8259: fix migration KVM: fix i8259 oops when no vcpus are online KVM: x86 emulator: fix regression with cmpxchg8b on i386 hosts
2010-09-10Merge branch 'perf-fixes-for-linus' of ↵Linus Torvalds8-36/+73
git://git.kernel.org/pub/scm/linux/kernel/git/tip/linux-2.6-tip * 'perf-fixes-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/linux-2.6-tip: tracing: t_start: reset FTRACE_ITER_HASH in case of seek/pread perf symbols: Fix multiple initialization of symbol system perf: Fix CPU hotplug perf, trace: Fix module leak tracing/kprobe: Fix handling of C-unlike argument names tracing/kprobes: Fix handling of argument names perf probe: Fix handling of arguments names perf probe: Fix return probe support tracing/kprobe: Fix a memory leak in error case tracing: Do not allow llseek to set_ftrace_filter
2010-09-10KEYS: Fix bug in keyctl_session_to_parent() if parent has no session keyringDavid Howells1-1/+2
Fix a bug in keyctl_session_to_parent() whereby it tries to check the ownership of the parent process's session keyring whether or not the parent has a session keyring [CVE-2010-2960]. This results in the following oops: BUG: unable to handle kernel NULL pointer dereference at 00000000000000a0 IP: [<ffffffff811ae4dd>] keyctl_session_to_parent+0x251/0x443 ... Call Trace: [<ffffffff811ae2f3>] ? keyctl_session_to_parent+0x67/0x443 [<ffffffff8109d286>] ? __do_fault+0x24b/0x3d0 [<ffffffff811af98c>] sys_keyctl+0xb4/0xb8 [<ffffffff81001eab>] system_call_fastpath+0x16/0x1b if the parent process has no session keyring. If the system is using pam_keyinit then it mostly protected against this as all processes derived from a login will have inherited the session keyring created by pam_keyinit during the log in procedure. To test this, pam_keyinit calls need to be commented out in /etc/pam.d/. Reported-by: Tavis Ormandy <taviso@cmpxchg8b.com> Signed-off-by: David Howells <dhowells@redhat.com> Acked-by: Tavis Ormandy <taviso@cmpxchg8b.com> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2010-09-10KEYS: Fix RCU no-lock warning in keyctl_session_to_parent()David Howells1-0/+3
There's an protected access to the parent process's credentials in the middle of keyctl_session_to_parent(). This results in the following RCU warning: =================================================== [ INFO: suspicious rcu_dereference_check() usage. ] --------------------------------------------------- security/keys/keyctl.c:1291 invoked rcu_dereference_check() without protection! other info that might help us debug this: rcu_scheduler_active = 1, debug_locks = 0 1 lock held by keyctl-session-/2137: #0: (tasklist_lock){.+.+..}, at: [<ffffffff811ae2ec>] keyctl_session_to_parent+0x60/0x236 stack backtrace: Pid: 2137, comm: keyctl-session- Not tainted 2.6.36-rc2-cachefs+ #1 Call Trace: [<ffffffff8105606a>] lockdep_rcu_dereference+0xaa/0xb3 [<ffffffff811ae379>] keyctl_session_to_parent+0xed/0x236 [<ffffffff811af77e>] sys_keyctl+0xb4/0xb6 [<ffffffff81001eab>] system_call_fastpath+0x16/0x1b The code should take the RCU read lock to make sure the parents credentials don't go away, even though it's holding a spinlock and has IRQ disabled. Signed-off-by: David Howells <dhowells@redhat.com> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2010-09-10Merge branch 'for-linus' of git://git.kernel.dk/linux-2.6-blockLinus Torvalds17-46/+238
* 'for-linus' of git://git.kernel.dk/linux-2.6-block: block: Range check cpu in blk_cpu_to_group scatterlist: prevent invalid free when alloc fails writeback: Fix lost wake-up shutting down writeback thread writeback: do not lose wakeup events when forking bdi threads cciss: fix reporting of max queue depth since init block: switch s390 tape_block and mg_disk to elevator_change() block: add function call to switch the IO scheduler from a driver fs/bio-integrity.c: return -ENOMEM on kmalloc failure bio-integrity.c: remove dependency on __GFP_NOFAIL BLOCK: fix bio.bi_rw handling block: put dev->kobj in blk_register_queue fail path cciss: handle allocation failure cfq-iosched: Documentation help for new tunables cfq-iosched: blktrace print per slice sector stats cfq-iosched: Implement tunable group_idle cfq-iosched: Do group share accounting in IOPS when slice_idle=0 cfq-iosched: Do not idle if slice_idle=0 cciss: disable doorbell reset on reset_devices blkio: Fix return code for mkdir calls
2010-09-10Merge branch 'at91-fixes-for-linus' of git://github.com/at91linux/linux-2.6-at91Linus Torvalds4-19/+36
* 'at91-fixes-for-linus' of git://github.com/at91linux/linux-2.6-at91: AT91: at91sam9261ek: remove C99 comments but keep information AT91: at91sam9261ek board: remove warnings related to use of SPI or SD/MMC AT91: dm9000 initialization update AT91: SAM9G45 - add a separate clock entry for every single TC block AT91: clock: peripheral clocks can have other parent than mck AT91: change dma resource index
2010-09-10perf: Ensure we call add_event_to_ctx() with the right locks heldPeter Zijlstra1-0/+3
Even though we call it from the inherit path, where the child is not yet accessible, we need to hold ctx->lock, add_event_to_ctx() assumes IRQs are disabled. Signed-off-by: Peter Zijlstra <a.p.zijlstra@chello.nl> LKML-Reference: <new-submission> Signed-off-by: Ingo Molnar <mingo@elte.hu>
2010-09-10Merge branch 'for-linus' of ↵Linus Torvalds18-40/+204
git://git.kernel.org/pub/scm/linux/kernel/git/tiwai/sound-2.6 * 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tiwai/sound-2.6: ALSA: rawmidi: fix the get next midi device ioctl ALSA: hda - Fix wrong HP pin detection in snd_hda_parse_pin_def_config() ALSA: seq/oss - Fix double-free at error path of snd_seq_oss_open() ALSA: msnd-classic: Fix invalid cfg parameter ALSA: hda - Enable PC-beep for EeePC with ALC269 codec ALSA: hda - Add errata initverb sequence for CS42xx codecs ALSA: usb - Release capture substream URBs properly ALSA: virtuoso: fix setting of Xonar DS line-in/mic-in controls ALSA: virtuoso: work around missing reset in the Xonar DS Windows driver ALSA: hda - Add quirk for Lenovo T400s ALSA: usb-audio: fix detection of vendor-specific device protocol settings ALSA: usb-audio: Assume first control interface is for audio ALSA: hda - Add a new hp-laptop model for Conexant 5066, tested on HP G60
2010-09-10drm/i915: don't enable self-refresh on IronlakeJesse Barnes2-2/+12
We don't know how to enable it safely, especially as outputs turn on and off. When disabling LP1 we also need to make sure LP2 and 3 are already disabled. Bugzilla: https://bugs.freedesktop.org/show_bug.cgi?id=29173 Bugzilla: https://bugs.freedesktop.org/show_bug.cgi?id=29082 Reported-by: Chris Lord <chris@linux.intel.com> Signed-off-by: Jesse Barnes <jbarnes@virtuousgeek.org> Tested-by: Daniel Vetter <daniel.vetter@ffwll.ch> Cc: stable@kernel.org Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
2010-09-10xfs: prevent reading uninitialized stack memoryDan Rosenberg1-0/+2
The XFS_IOC_FSGETXATTR ioctl allows unprivileged users to read 12 bytes of uninitialized stack memory, because the fsxattr struct declared on the stack in xfs_ioc_fsgetxattr() does not alter (or zero) the 12-byte fsx_pad member before copying it back to the user. This patch takes care of it. Signed-off-by: Dan Rosenberg <dan.j.rosenberg@gmail.com> Reviewed-by: Eric Sandeen <sandeen@redhat.com> Signed-off-by: Alex Elder <aelder@sgi.com>
2010-09-10AT91: at91sam9261ek: remove C99 comments but keep informationNicolas Ferre1-4/+2
Signed-off-by: Nicolas Ferre <nicolas.ferre@atmel.com>
2010-09-10AT91: at91sam9261ek board: remove warnings related to use of SPI or SD/MMCNicolas Ferre1-11/+19
The sd/mmc data structure is not used if SPI is selected. The configuration of PIO on the board prevent from using both interfaces at the same time (board dependent). Remove the warnings at compilation time adding a preprocessor condition. Signed-off-by: Nicolas Ferre <nicolas.ferre@atmel.com>
2010-09-10AT91: dm9000 initialization updateNicolas Ferre1-1/+2
Add information in dm9000 mac/phy chip initialization: - irq resource details - platform data details Signed-off-by: Nicolas Ferre <nicolas.ferre@atmel.com>
2010-09-10block: Range check cpu in blk_cpu_to_groupBrian King1-2/+6
While testing CPU DLPAR, the following problem was discovered. We were DLPAR removing the first CPU, which in this case was logical CPUs 0-3. CPUs 0-2 were already marked offline and we were in the process of offlining CPU 3. After marking the CPU inactive and offline in cpu_disable, but before the cpu was completely idle (cpu_die), we ended up in __make_request on CPU 3. There we looked at the topology map to see which CPU to complete the I/O on and found no CPUs in the cpu_sibling_map. This resulted in the block layer setting the completion cpu to be NR_CPUS, which then caused an oops when we tried to complete the I/O. Fix this by sanity checking the value we return from blk_cpu_to_group to be a valid cpu value. Signed-off-by: Brian King <brking@linux.vnet.ibm.com> Signed-off-by: Jens Axboe <jaxboe@fusionio.com>
2010-09-10Merge branch 'fix/hda' into for-linusTakashi Iwai5-1/+111
2010-09-10Merge branch 'tip/perf/urgent' of ↵Ingo Molnar1-0/+2
git://git.kernel.org/pub/scm/linux/kernel/git/rostedt/linux-2.6-trace into perf/urgent
2010-09-10Merge branch 'perf/urgent' of ↵Ingo Molnar2-1/+9
git://git.kernel.org/pub/scm/linux/kernel/git/acme/linux-2.6 into perf/urgent
2010-09-09Merge branch 'vhost-net' of ↵David S. Miller3-27/+73
git://git.kernel.org/pub/scm/linux/kernel/git/mst/vhost
2010-09-09ipheth: remove incorrect devtype to WWANDan Williams1-6/+1
The 'wwan' devtype is meant for devices that require preconfiguration and *every* time setup before the ethernet interface can be used, like cellular modems which require a series of setup commands on serial ports or other mechanisms before the ethernet interface will handle packets. As ipheth only requires one-per-hotplug pairing setup with no preconfiguration (like APN, phone #, etc) and the network interface is usable at any time after that initial setup, remove the incorrect devtype wwan. Signed-off-by: Dan Williams <dcbw@redhat.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2010-09-09MAINTAINERS: Add CAIFJoe Perches1-0/+10
Signed-off-by: Joe Perches <joe@perches.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2010-09-09Merge branch 'upstream-linus' of ↵Linus Torvalds10-15/+64
git://git.kernel.org/pub/scm/linux/kernel/git/jgarzik/libata-dev * 'upstream-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/jgarzik/libata-dev: libata-sff: Reenable Port Multiplier after libata-sff remodeling. libata: skip EH autopsy and recovery during suspend ahci: AHCI and RAID mode SATA patch for Intel Patsburg DeviceIDs ata_piix: IDE Mode SATA patch for Intel Patsburg DeviceIDs libata,pata_via: revert ata_wait_idle() removal from ata_sff/via_tf_load() ahci: fix hang on failed softreset pata_artop: Fix device ID parity check
2010-09-09tracing: t_start: reset FTRACE_ITER_HASH in case of seek/preadChris Wright1-0/+2
Be sure to avoid entering t_show() with FTRACE_ITER_HASH set without having properly started the iterator to iterate the hash. This case is degenerate and, as discovered by Robert Swiecki, can cause t_hash_show() to misuse a pointer. This causes a NULL ptr deref with possible security implications. Tracked as CVE-2010-3079. Cc: Robert Swiecki <swiecki@google.com> Cc: Eugene Teo <eugene@redhat.com> Cc: <stable@kernel.org> Signed-off-by: Chris Wright <chrisw@sous-sol.org> Signed-off-by: Steven Rostedt <rostedt@goodmis.org>
2010-09-09libata-sff: Reenable Port Multiplier after libata-sff remodeling.Gwendal Grignou3-12/+31
Keep track of the link on the which the current request is in progress. It allows support of links behind port multiplier. Not all libata-sff is PMP compliant. Code for native BMDMA controller does not take in accound PMP. Tested on Marvell 7042 and Sil7526. Signed-off-by: Gwendal Grignou <gwendal@google.com> Signed-off-by: Jeff Garzik <jgarzik@redhat.com>
2010-09-09libata: skip EH autopsy and recovery during suspendTejun Heo3-1/+18
For some mysterious reason, certain hardware reacts badly to usual EH actions while the system is going for suspend. As the devices won't be needed until the system is resumed, ask EH to skip usual autopsy and recovery and proceed directly to suspend. Signed-off-by: Tejun Heo <tj@kernel.org> Tested-by: Stephan Diestelhorst <stephan.diestelhorst@amd.com> Cc: stable@kernel.org Signed-off-by: Jeff Garzik <jgarzik@redhat.com>
2010-09-09ahci: AHCI and RAID mode SATA patch for Intel Patsburg DeviceIDsSeth Heasley1-0/+3
This patch adds the Intel Patsburg (PCH) SATA AHCI and RAID Controller DeviceIDs. Signed-off-by: Seth Heasley <seth.heasley@intel.com> Signed-off-by: Jeff Garzik <jgarzik@redhat.com>
2010-09-09ata_piix: IDE Mode SATA patch for Intel Patsburg DeviceIDsSeth Heasley1-0/+4
This patch adds the Intel Patsburg (PCH) IDE mode SATA Controller DeviceIDs. Signed-off-by: Seth Heasley <seth.heasley@intel.com> Signed-off-by: Jeff Garzik <jgarzik@redhat.com>
2010-09-09libata,pata_via: revert ata_wait_idle() removal from ata_sff/via_tf_load()Tejun Heo2-0/+5
Commit 978c0666 (libata: Remove excess delay in the tf_load path) removed ata_wait_idle() from ata_sff_tf_load() and via_tf_load(). This caused obscure detection problems in sata_sil. https://bugzilla.kernel.org/show_bug.cgi?id=16606 The commit was pure performance optimization. Revert it for now. Reported-by: Dieter Plaetinck <dieter@plaetinck.be> Reported-by: Jan Beulich <JBeulich@novell.com> Bisected-by: gianluca <gianluca@sottospazio.it> Cc: stable@kernel.org Signed-off-by: Jeff Garzik <jgarzik@redhat.com>
2010-09-09minix: fix regression in minix_mkdir()Jorge Boncompte [DTI2]1-1/+1
Commit 9eed1fb721c ("minix: replace inode uid,gid,mode init with helper") broke directory creation on minix filesystems. Fix it by passing the needed mode flag to inode init helper. Signed-off-by: Jorge Boncompte [DTI2] <jorge@dti2.net> Cc: Dmitry Monakhov <dmonakhov@openvz.org> Cc: Al Viro <viro@zeniv.linux.org.uk> Cc: <stable@kernel.org> [2.6.35.x] Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2010-09-09mm: page allocator: drain per-cpu lists after direct reclaim allocation failsMel Gorman1-4/+16
When under significant memory pressure, a process enters direct reclaim and immediately afterwards tries to allocate a page. If it fails and no further progress is made, it's possible the system will go OOM. However, on systems with large amounts of memory, it's possible that a significant number of pages are on per-cpu lists and inaccessible to the calling process. This leads to a process entering direct reclaim more often than it should increasing the pressure on the system and compounding the problem. This patch notes that if direct reclaim is making progress but allocations are still failing that the system is already under heavy pressure. In this case, it drains the per-cpu lists and tries the allocation a second time before continuing. Signed-off-by: Mel Gorman <mel@csn.ul.ie> Reviewed-by: Minchan Kim <minchan.kim@gmail.com> Reviewed-by: KAMEZAWA Hiroyuki <kamezawa.hiroyu@jp.fujitsu.com> Reviewed-by: KOSAKI Motohiro <kosaki.motohiro@jp.fujitsu.com> Reviewed-by: Christoph Lameter <cl@linux.com> Cc: Dave Chinner <david@fromorbit.com> Cc: Wu Fengguang <fengguang.wu@intel.com> Cc: David Rientjes <rientjes@google.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2010-09-09mm: page allocator: calculate a better estimate of NR_FREE_PAGES when memory ↵Christoph Lameter5-3/+72
is low and kswapd is awake Ordinarily watermark checks are based on the vmstat NR_FREE_PAGES as it is cheaper than scanning a number of lists. To avoid synchronization overhead, counter deltas are maintained on a per-cpu basis and drained both periodically and when the delta is above a threshold. On large CPU systems, the difference between the estimated and real value of NR_FREE_PAGES can be very high. If NR_FREE_PAGES is much higher than number of real free page in buddy, the VM can allocate pages below min watermark, at worst reducing the real number of pages to zero. Even if the OOM killer kills some victim for freeing memory, it may not free memory if the exit path requires a new page resulting in livelock. This patch introduces a zone_page_state_snapshot() function (courtesy of Christoph) that takes a slightly more accurate view of an arbitrary vmstat counter. It is used to read NR_FREE_PAGES while kswapd is awake to avoid the watermark being accidentally broken. The estimate is not perfect and may result in cache line bounces but is expected to be lighter than the IPI calls necessary to continually drain the per-cpu counters while kswapd is awake. Signed-off-by: Christoph Lameter <cl@linux.com> Signed-off-by: Mel Gorman <mel@csn.ul.ie> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2010-09-09mm: page allocator: update free page counters after pages are placed on the ↵Mel Gorman1-4/+5
free list When allocating a page, the system uses NR_FREE_PAGES counters to determine if watermarks would remain intact after the allocation was made. This check is made without interrupts disabled or the zone lock held and so is race-prone by nature. Unfortunately, when pages are being freed in batch, the counters are updated before the pages are added on the list. During this window, the counters are misleading as the pages do not exist yet. When under significant pressure on systems with large numbers of CPUs, it's possible for processes to make progress even though they should have been stalled. This is particularly problematic if a number of the processes are using GFP_ATOMIC as the min watermark can be accidentally breached and in extreme cases, the system can livelock. This patch updates the counters after the pages have been added to the list. This makes the allocator more cautious with respect to preserving the watermarks and mitigates livelock possibilities. [akpm@linux-foundation.org: avoid modifying incoming args] Signed-off-by: Mel Gorman <mel@csn.ul.ie> Reviewed-by: Rik van Riel <riel@redhat.com> Reviewed-by: Minchan Kim <minchan.kim@gmail.com> Reviewed-by: KAMEZAWA Hiroyuki <kamezawa.hiroyu@jp.fujitsu.com> Reviewed-by: Christoph Lameter <cl@linux.com> Reviewed-by: KOSAKI Motohiro <kosaki.motohiro@jp.fujitsu.com> Acked-by: Johannes Weiner <hannes@cmpxchg.org> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2010-09-09vmstat: update zone stat threshold when onlining a cpuKAMEZAWA Hiroyuki1-0/+1
refresh_zone_stat_thresholds() calculates parameter based on the number of online cpus. It's called at cpu offlining but needs to be called at onlining, too. Signed-off-by: KAMEZAWA Hiroyuki <kamezawa.hiroyu@jp.fujitsu.com> Cc: Christoph Lameter <cl@linux-foundation.org> Acked-by: Mel Gorman <mel@csn.ul.ie> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2010-09-09vfs: take O_NONBLOCK out of the O_* uniqueness testJames Bottomley1-3/+7
O_NONBLOCK on parisc has a dual value: #define O_NONBLOCK 000200004 /* HPUX has separate NDELAY & NONBLOCK */ It is caught by the O_* bits uniqueness check and leads to a parisc compile error. The fix would be to take O_NONBLOCK out. Signed-off-by: Wu Fengguang <fengguang.wu@intel.com> Signed-off-by: James Bottomley <James.Bottomley@suse.de> Cc: Jamie Lokier <jamie@shareable.org> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2010-09-09swap: discard while swapping only if SWAP_FLAG_DISCARDHugh Dickins2-2/+3
Tests with recent firmware on Intel X25-M 80GB and OCZ Vertex 60GB SSDs show a shift since I last tested in December: in part because of firmware updates, in part because of the necessary move from barriers to awaiting completion at the block layer. While discard at swapon still shows as slightly beneficial on both, discarding 1MB swap cluster when allocating is now disadvanteous: adds 25% overhead on Intel, adds 230% on OCZ (YMMV). Surrender: discard as presently implemented is more hindrance than help for swap; but might prove useful on other devices, or with improvements. So continue to do the discard at swapon, but make discard while swapping conditional on a SWAP_FLAG_DISCARD to sys_swapon() (which has been using only the lower 16 bits of int flags). We can add a --discard or -d to swapon(8), and a "discard" to swap in /etc/fstab: matching the mount option for btrfs, ext4, fat, gfs2, nilfs2. Signed-off-by: Hugh Dickins <hughd@google.com> Cc: Christoph Hellwig <hch@lst.de> Cc: Nigel Cunningham <nigel@tuxonice.net> Cc: Tejun Heo <tj@kernel.org> Cc: Jens Axboe <jaxboe@fusionio.com> Cc: James Bottomley <James.Bottomley@hansenpartnership.com> Cc: "Martin K. Petersen" <martin.petersen@oracle.com> Cc: <stable@kernel.org> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2010-09-09swap: do not send discards as barriersChristoph Hellwig1-6/+3
The swap code already uses synchronous discards, no need to add I/O barriers. This fixes the worst of the terrible slowdown in swap allocation for hibernation, reported on 2.6.35 by Nigel Cunningham; but does not entirely eliminate that regression. [tj@kernel.org: superflous newlines removed] Signed-off-by: Christoph Hellwig <hch@lst.de> Tested-by: Nigel Cunningham <nigel@tuxonice.net> Signed-off-by: Tejun Heo <tj@kernel.org> Signed-off-by: Hugh Dickins <hughd@google.com> Cc: Jens Axboe <jaxboe@fusionio.com> Cc: James Bottomley <James.Bottomley@hansenpartnership.com> Cc: "Martin K. Petersen" <martin.petersen@oracle.com> Cc: <stable@kernel.org> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2010-09-09swap: prevent reuse during hibernationHugh Dickins1-4/+20
Move the hibernation check from scan_swap_map() into try_to_free_swap(): to catch not only the common case when hibernation's allocation itself triggers swap reuse, but also the less likely case when concurrent page reclaim (shrink_page_list) might happen to try_to_free_swap from a page. Hibernation already clears __GFP_IO from the gfp_allowed_mask, to stop reclaim from going to swap: check that to prevent swap reuse too. Signed-off-by: Hugh Dickins <hughd@google.com> Cc: KAMEZAWA Hiroyuki <kamezawa.hiroyu@jp.fujitsu.com> Cc: KOSAKI Motohiro <kosaki.motohiro@jp.fujitsu.com> Cc: "Rafael J. Wysocki" <rjw@sisk.pl> Cc: Ondrej Zary <linux@rainbow-software.org> Cc: Andrea Gelmini <andrea.gelmini@gmail.com> Cc: Balbir Singh <balbir@in.ibm.com> Cc: Andrea Arcangeli <aarcange@redhat.com> Cc: Nigel Cunningham <nigel@tuxonice.net> Cc: <stable@kernel.org> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2010-09-09swap: revert special hibernation allocationHugh Dickins5-84/+26
Please revert 2.6.36-rc commit d2997b1042ec150616c1963b5e5e919ffd0b0ebf "hibernation: freeze swap at hibernation". It complicated matters by adding a second swap allocation path, just for hibernation; without in any way fixing the issue that it was intended to address - page reclaim after fixing the hibernation image might free swap from a page already imaged as swapcache, letting its swap be reallocated to store a different page of the image: resulting in data corruption if the imaged page were freed as clean then swapped back in. Pages freed to si->swap_map were still in danger of being reallocated by the alternative allocation path. I guess it inadvertently fixed slow SSD swap allocation for hibernation, as reported by Nigel Cunningham: by missing out the discards that occur on the usual swap allocation path; but that was unintentional, and needs a separate fix. Signed-off-by: Hugh Dickins <hughd@google.com> Cc: KAMEZAWA Hiroyuki <kamezawa.hiroyu@jp.fujitsu.com> Cc: KOSAKI Motohiro <kosaki.motohiro@jp.fujitsu.com> Cc: "Rafael J. Wysocki" <rjw@sisk.pl> Cc: Ondrej Zary <linux@rainbow-software.org> Cc: Andrea Gelmini <andrea.gelmini@gmail.com> Cc: Balbir Singh <balbir@in.ibm.com> Cc: Andrea Arcangeli <aarcange@redhat.com> Cc: Nigel Cunningham <nigel@tuxonice.net> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2010-09-09bounce: call flush_dcache_page() after bounce_copy_vec()Gary King1-1/+1
I have been seeing problems on Tegra 2 (ARMv7 SMP) systems with HIGHMEM enabled on 2.6.35 (plus some patches targetted at 2.6.36 to perform cache maintenance lazily), and the root cause appears to be that the mm bouncing code is calling flush_dcache_page before it copies the bounce buffer into the bio. The bounced page needs to be flushed after data is copied into it, to ensure that architecture implementations can synchronize instruction and data caches if necessary. Signed-off-by: Gary King <gking@nvidia.com> Cc: Tejun Heo <tj@kernel.org> Cc: Russell King <rmk@arm.linux.org.uk> Acked-by: Jens Axboe <axboe@kernel.dk> Cc: <stable@kernel.org> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2010-09-09drivers/rtc/rtc-pl031.c: do not mark PL031 IRQ as sharedLinus Walleij1-1/+1
It was a mistake to mark the PL031 IRQ as shared (for the U8500), we misread the datasheet. Get rid of this. Signed-off-by: Linus Walleij <linus.walleij@stericsson.com> Cc: Jonas Aberg <jonas.aberg@stericsson.com> Cc: Mian Yousaf Kaukab <mian.yousaf.kaukab@stericsson.com> Acked-by: Alessandro Zummo <a.zummo@towertech.it> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2010-09-09MAINTAINERS: correct entry for legacy RTC-driverWolfram Sang1-4/+2
Because no one dared to remove it so far, let's keep the entry correct, at least. Signed-off-by: Wolfram Sang <w.sang@pengutronix.de> Acked-by: Mike Frysinger <vapier@gentoo.org> Cc: Paul Gortmaker <p_gortmaker@yahoo.com> Cc: Alessandro Zummo <a.zummo@towertech.it> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2010-09-09gpio: doc updatesDavid Brownell2-9/+27
There's been some recent confusion about error checking GPIO numbers. briefly, it should be handled mostly during setup, when gpio_request() is called, and NEVER by expectig gpio_is_valid to report more than never-usable GPIO numbers. [akpm@linux-foundation.org: terminate unterminated comment] Signed-off-by: David Brownell <dbrownell@users.sourceforge.net> Cc: Eric Miao" <eric.y.miao@gmail.com> Cc: "Ryan Mallon" <ryan@bluewatersys.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2010-09-09gpio: sx150x: correct and refine reset-on-probe behaviorGregory Bean2-5/+25
Replace the arbitrary software-reset call from the device-probe method, because: - It is defective. To work correctly, it should be two byte writes, not a single word write. As it stands, it does nothing. - Some devices with sx150x expanders installed have their NRESET pins ganged on the same line, so resetting one causes the others to reset - not a nice thing to do arbitrarily! - The probe, usually taking place at boot, implies a recent hard-reset, so a software reset at this point is just a waste of energy anyway. Therefore, make it optional, defaulting to off, as this will match the common case of probing at powerup and also matches the current broken no-op behavior. Signed-off-by: Gregory Bean <gbean@codeaurora.org> Reviewed-by: Jean Delvare <khali@linux-fr.org> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2010-09-09memory hotplug: fix next block calculation in is_removableKAMEZAWA Hiroyuki1-8/+8
next_active_pageblock() is for finding next _used_ freeblock. It skips several blocks when it finds there are a chunk of free pages lager than pageblock. But it has 2 bugs. 1. We have no lock. page_order(page) - pageblock_order can be minus. 2. pageblocks_stride += is wrong. it should skip page_order(p) of pages. Signed-off-by: KAMEZAWA Hiroyuki <kamezawa.hiroyu@jp.fujitsu.com> Cc: Michal Hocko <mhocko@suse.cz> Cc: Wu Fengguang <fengguang.wu@intel.com> Cc: Mel Gorman <mel@csn.ul.ie> Cc: KAMEZAWA Hiroyuki <kamezawa.hiroyu@jp.fujitsu.com> Cc: <stable@kernel.org> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2010-09-09mm: compaction: handle active and inactive fairly in too_many_isolatedMinchan Kim1-3/+4
Iram reported that compaction's too_many_isolated() loops forever. (http://www.spinics.net/lists/linux-mm/msg08123.html) The meminfo when the situation happened was inactive anon is zero. That's because the system has no memory pressure until then. While all anon pages were in the active lru, compaction could select active lru as well as inactive lru. That's a different thing from vmscan's isolated. So we has been two too_many_isolated. While compaction can isolate pages in both active and inactive, current implementation of too_many_isolated only considers inactive. It made Iram's problem. This patch handles active and inactive fairly. That's because we can't expect where from and how many compaction would isolated pages. This patch changes (nr_isolated > nr_inactive) with nr_isolated > (nr_active + nr_inactive) / 2. Signed-off-by: Minchan Kim <minchan.kim@gmail.com> Reported-by: Iram Shahzad <iram.shahzad@jp.fujitsu.com> Acked-by: Mel Gorman <mel@csn.ul.ie> Acked-by: Wu Fengguang <fengguang.wu@intel.com> Cc: <stable@kernel.org> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2010-09-09kernel/groups.c: fix integer overflow in groups_searchJerome Marchand1-3/+2
gid_t is a unsigned int. If group_info contains a gid greater than MAX_INT, groups_search() function may look on the wrong side of the search tree. This solves some unfair "permission denied" problems. Signed-off-by: Jerome Marchand <jmarchan@redhat.com> Cc: <stable@kernel.org> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2010-09-09arch/powerpc/include/asm/fsldma.h needs slab.hIra W. Snyder1-0/+1
The slab.h header is required to use the kmalloc() family of functions. Due to recent kernel changes, this header must be directly included by code that calls into the memory allocator. Without this patch, any code which includes this header fails to build. Signed-off-by: Ira W. Snyder <iws@ovro.caltech.edu> Cc: Dan Williams <dan.j.williams@intel.com> Cc: Benjamin Herrenschmidt <benh@kernel.crashing.org> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>