summaryrefslogtreecommitdiffstats
path: root/init
AgeCommit message (Collapse)AuthorFilesLines
2008-05-14block: do_mounts - accept root=<non-existant partition>Kay Sievers1-1/+26
Some devices, like md, may create partitions only at first access, so allow root= to be set to a valid non-existant partition of an existing disk. This applies only to non-initramfs root mounting. This fixes a regression from 2.6.24 which did allow this to happen and broke some users machines :( Acked-by: Neil Brown <neilb@suse.de> Tested-by: Joao Luis Meloni Assirati <assirati@nonada.if.usp.br> Cc: stable <stable@kernel.org> Signed-off-by: Kay Sievers <kay.sievers@vrfy.org> Signed-off-by: Greg Kroah-Hartman <gregkh@suse.de>
2008-05-13init: don't lose initcall return valuesCyrill Gorcunov1-9/+9
There is an ability to lose an initcall return value if it happened with irq disabled or imbalanced preemption (and if we debug initcall). Signed-off-by: Cyrill Gorcunov <gorcunov@gmail.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2008-05-09module: don't ignore vermagic string if module doesn't have modversionsRusty Russell1-3/+3
Linus found a logic bug: we ignore the version number in a module's vermagic string if we have CONFIG_MODVERSIONS set, but modversions also lets through a module with no __versions section for modprobe --force (with tainting, but still). We should only ignore the start of the vermagic string if the module actually *has* crcs to check. Rather than (say) having an entertaining hissy fit and creating a config option to work around the buggy code. Signed-off-by: Rusty Russell <rusty@rustcorp.com.au> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2008-05-07pcspkr: fix dependanciesStas Sergeev1-0/+8
fix pcspkr dependancies: make the pcspkr platform drivers to depend on a platform device, and not the other way around. Signed-off-by: Stas Sergeev <stsp@aknet.ru> Acked-by: Thomas Gleixner <tglx@linutronix.de> Acked-by: Dmitry Torokhov <dtor@mail.ru> CC: Vojtech Pavlik <vojtech@suse.cz> CC: Michael Opdenacker <michael-lists@free-electrons.com> [fixed for 2.6.26-rc1 by tiwai] Signed-off-by: Takashi Iwai <tiwai@suse.de>
2008-05-05sched: default to n for GROUP_SCHED and FAIR_GROUP_SCHEDParag Warudkar1-2/+3
GROUP_SCHED is confirmed to cause unacceptable latencies, see: http://lkml.org/lkml/2008/5/2/370. Mark it EXPERIMENTAL and default to no for now. Signed-off-by: Parag Warudkar <parag.warudkar@gmail.com> Signed-off-by: Ingo Molnar <mingo@elte.hu>
2008-05-05sched: add optional support for CONFIG_HAVE_UNSTABLE_SCHED_CLOCKPeter Zijlstra1-0/+1
this replaces the rq->clock stuff (and possibly cpu_clock()). - architectures that have an 'imperfect' hardware clock can set CONFIG_HAVE_UNSTABLE_SCHED_CLOCK - the 'jiffie' window might be superfulous when we update tick_gtod before the __update_sched_clock() call in sched_clock_tick() - cpu_clock() might be implemented as: sched_clock_cpu(smp_processor_id()) if the accuracy proves good enough - how far can TSC drift in a single jiffie when considering the filtering and idle hooks? [ mingo@elte.hu: various fixes and cleanups ] Signed-off-by: Peter Zijlstra <a.p.zijlstra@chello.nl> Signed-off-by: Ingo Molnar <mingo@elte.hu>
2008-05-05sched, x86: add HAVE_UNSTABLE_SCHED_CLOCKIngo Molnar1-0/+6
add the HAVE_UNSTABLE_SCHED_CLOCK, for architectures to select. the next change utilizes it. Signed-off-by: Ingo Molnar <mingo@elte.hu>
2008-05-04Make forced module loading optionalLinus Torvalds1-0/+9
The kernel module loader used to be much too happy to allow loading of modules for the wrong kernel version by default. For example, if you had MODVERSIONS enabled, but tried to load a module with no version info, it would happily load it and taint the kernel - whether it was likely to actually work or not! Generally, such forced module loading should be considered a really really bad idea, so make it conditional on a new config option (MODULE_FORCE_LOAD), and make it default to off. If somebody really wants to force module loads, that's their problem, but we should not encourage it. Especially as it happened to me by mistake (ie regular unversioned Fedora modules getting loaded) causing lots of strange behavior. Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2008-05-02slub: #ifdef simplificationChristoph Lameter1-1/+1
If we make SLUB_DEBUG depend on SYSFS then we can simplify some #ifdefs and avoid others. Signed-off-by: Christoph Lameter <clameter@sgi.com> Signed-off-by: Pekka Enberg <penberg@cs.helsinki.fi>
2008-04-30infrastructure to debug (dynamic) objectsThomas Gleixner1-0/+3
We can see an ever repeating problem pattern with objects of any kind in the kernel: 1) freeing of active objects 2) reinitialization of active objects Both problems can be hard to debug because the crash happens at a point where we have no chance to decode the root cause anymore. One problem spot are kernel timers, where the detection of the problem often happens in interrupt context and usually causes the machine to panic. While working on a timer related bug report I had to hack specialized code into the timer subsystem to get a reasonable hint for the root cause. This debug hack was fine for temporary use, but far from a mergeable solution due to the intrusiveness into the timer code. The code further lacked the ability to detect and report the root cause instantly and keep the system operational. Keeping the system operational is important to get hold of the debug information without special debugging aids like serial consoles and special knowledge of the bug reporter. The problems described above are not restricted to timers, but timers tend to expose it usually in a full system crash. Other objects are less explosive, but the symptoms caused by such mistakes can be even harder to debug. Instead of creating specialized debugging code for the timer subsystem a generic infrastructure is created which allows developers to verify their code and provides an easy to enable debug facility for users in case of trouble. The debugobjects core code keeps track of operations on static and dynamic objects by inserting them into a hashed list and sanity checking them on object operations and provides additional checks whenever kernel memory is freed. The tracked object operations are: - initializing an object - adding an object to a subsystem list - deleting an object from a subsystem list Each operation is sanity checked before the operation is executed and the subsystem specific code can provide a fixup function which allows to prevent the damage of the operation. When the sanity check triggers a warning message and a stack trace is printed. The list of operations can be extended if the need arises. For now it's limited to the requirements of the first user (timers). The core code enqueues the objects into hash buckets. The hash index is generated from the address of the object to simplify the lookup for the check on kfree/vfree. Each bucket has it's own spinlock to avoid contention on a global lock. The debug code can be compiled in without being active. The runtime overhead is minimal and could be optimized by asm alternatives. A kernel command line option enables the debugging code. Thanks to Ingo Molnar for review, suggestions and cleanup patches. Signed-off-by: Thomas Gleixner <tglx@linutronix.de> Signed-off-by: Ingo Molnar <mingo@elte.hu> Cc: Greg KH <greg@kroah.com> Cc: Randy Dunlap <randy.dunlap@oracle.com> Cc: Kay Sievers <kay.sievers@vrfy.org> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2008-04-30Deprecate find_task_by_pid()Pavel Emelyanov1-1/+1
There are some places that are known to operate on tasks' global pids only: * the rest_init() call (called on boot) * the kgdb's getthread * the create_kthread() (since the kthread is run in init ns) So use the find_task_by_pid_ns(..., &init_pid_ns) there and schedule the find_task_by_pid for removal. [sukadev@us.ibm.com: Fix warning in kernel/pid.c] Signed-off-by: Pavel Emelyanov <xemul@openvz.org> Cc: "Eric W. Biederman" <ebiederm@xmission.com> Signed-off-by: Sukadev Bhattiprolu <sukadev@us.ibm.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2008-04-30signals: fix /sbin/init protection from unwanted signalsOleg Nesterov1-0/+2
The global init has a lot of long standing problems with the unhandled fatal signals. - The "is_global_init(current)" check in get_signal_to_deliver() protects only the main thread. Sub-thread can dequee the fatal signal and shutdown the whole thread group except the main thread. If it dequeues SIGSTOP /sbin/init will be stopped, this is not right too. Note that we can't use is_global_init(->group_leader), this breaks exec and this can't solve other problems we have. - Even if afterwards ignored, the fatal signals sets SIGNAL_GROUP_EXIT on delivery. This breaks exec, has other bad implications, and this is just wrong. Introduce the new SIGNAL_UNKILLABLE flag to fix these problems. It also helps to solve some other problems addressed by the subsequent patches. Currently we use this flag for the global init only, but it could also be used by kthreads and (perhaps) by the sub-namespace inits. Signed-off-by: Oleg Nesterov <oleg@tv-sign.ru> Acked-by: "Eric W. Biederman" <ebiederm@xmission.com> Cc: Roland McGrath <roland@redhat.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2008-04-29idr: create idr_layer_cache at boot timeAkinobu Mita1-0/+2
Avoid a possible kmem_cache_create() failure by creating idr_layer_cache unconditionary at boot time rather than creating it on-demand when idr_init() is called the first time. This change also enables us to eliminate the check every time idr_init() is called. [akpm@linux-foundation.org: rename init_id_cache() to idr_init_cache()] [akpm@linux-foundation.org: fix alpha build] Signed-off-by: Akinobu Mita <akinobu.mita@gmail.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2008-04-29sysctl: allow embedded targets to disable sysctl_check.cHolger Schurig1-0/+11
Disable sysctl_check.c for embedded targets. This saves about about 11 kB in .text and another 11 kB in .data on a PXA255 embedded platform. Signed-off-by: Holger Schurig <hs4233@mail.mn-solutions.de> Cc: "Eric W. Biederman" <ebiederm@xmission.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2008-04-29cgroups: add an owner to the mm_structBalbir Singh2-0/+8
Remove the mem_cgroup member from mm_struct and instead adds an owner. This approach was suggested by Paul Menage. The advantage of this approach is that, once the mm->owner is known, using the subsystem id, the cgroup can be determined. It also allows several control groups that are virtually grouped by mm_struct, to exist independent of the memory controller i.e., without adding mem_cgroup's for each controller, to mm_struct. A new config option CONFIG_MM_OWNER is added and the memory resource controller selects this config option. This patch also adds cgroup callbacks to notify subsystems when mm->owner changes. The mm_cgroup_changed callback is called with the task_lock() of the new task held and is called just prior to changing the mm->owner. I am indebted to Paul Menage for the several reviews of this patchset and helping me make it lighter and simpler. This patch was tested on a powerpc box, it was compiled with both the MM_OWNER config turned on and off. After the thread group leader exits, it's moved to init_css_state by cgroup_exit(), thus all future charges from runnings threads would be redirected to the init_css_set's subsystem. Signed-off-by: Balbir Singh <balbir@linux.vnet.ibm.com> Cc: Pavel Emelianov <xemul@openvz.org> Cc: Hugh Dickins <hugh@veritas.com> Cc: Sudhir Kumar <skumar@linux.vnet.ibm.com> Cc: YAMAMOTO Takashi <yamamoto@valinux.co.jp> Cc: Hirokazu Takahashi <taka@valinux.co.jp> Cc: David Rientjes <rientjes@google.com>, Cc: Balbir Singh <balbir@linux.vnet.ibm.com> Acked-by: KAMEZAWA Hiroyuki <kamezawa.hiroyu@jp.fujitsu.com> Acked-by: Pekka Enberg <penberg@cs.helsinki.fi> Reviewed-by: Paul Menage <menage@google.com> Cc: Oleg Nesterov <oleg@tv-sign.ru> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2008-04-29cgroups: implement device whitelistSerge E. Hallyn1-0/+7
Implement a cgroup to track and enforce open and mknod restrictions on device files. A device cgroup associates a device access whitelist with each cgroup. A whitelist entry has 4 fields. 'type' is a (all), c (char), or b (block). 'all' means it applies to all types and all major and minor numbers. Major and minor are either an integer or * for all. Access is a composition of r (read), w (write), and m (mknod). The root device cgroup starts with rwm to 'all'. A child devcg gets a copy of the parent. Admins can then remove devices from the whitelist or add new entries. A child cgroup can never receive a device access which is denied its parent. However when a device access is removed from a parent it will not also be removed from the child(ren). An entry is added using devices.allow, and removed using devices.deny. For instance echo 'c 1:3 mr' > /cgroups/1/devices.allow allows cgroup 1 to read and mknod the device usually known as /dev/null. Doing echo a > /cgroups/1/devices.deny will remove the default 'a *:* mrw' entry. CAP_SYS_ADMIN is needed to change permissions or move another task to a new cgroup. A cgroup may not be granted more permissions than the cgroup's parent has. Any task can move itself between cgroups. This won't be sufficient, but we can decide the best way to adequately restrict movement later. [akpm@linux-foundation.org: coding-style fixes] [akpm@linux-foundation.org: fix may-be-used-uninitialized warning] Signed-off-by: Serge E. Hallyn <serue@us.ibm.com> Acked-by: James Morris <jmorris@namei.org> Looks-good-to: Pavel Emelyanov <xemul@openvz.org> Cc: Daniel Hokka Zakrisson <daniel@hozac.com> Cc: Li Zefan <lizf@cn.fujitsu.com> Cc: Paul Menage <menage@google.com> Cc: Balbir Singh <balbir@in.ibm.com> Cc: KAMEZAWA Hiroyuki <kamezawa.hiroyu@jp.fujitsu.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2008-04-29CGroup API files: make CGROUP_DEBUG default to offPaul Menage1-0/+1
The cgroup debug subsystem isn't generally useful for users. It should default to "n". Signed-off-by: Paul Menage <menage@google.com> Cc: "Li Zefan" <lizf@cn.fujitsu.com> Cc: Balbir Singh <balbir@in.ibm.com> Cc: Paul Jackson <pj@sgi.com> Cc: Pavel Emelyanov <xemul@openvz.org> Cc: KAMEZAWA Hiroyuki <kamezawa.hiroyu@jp.fujitsu.com> Cc: "YAMAMOTO Takashi" <yamamoto@valinux.co.jp> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2008-04-29directly use kmalloc() and kfree() in init/initramfs.cThomas Petazzoni1-10/+10
Instead of using the malloc() and free() wrappers needed by the lib/inflate.c code for allocations, simply use kmalloc() and kfree() in the initramfs code. This is needed for a further lib/inflate.c-related cleanup patch that will remove the malloc() and free() functions. Take that opportunity to remove the useless kmalloc() return value cast. Based on work done by Matt Mackall. Signed-off-by: Thomas Petazzoni <thomas.petazzoni@free-electrons.com> Signed-off-by: Matt Mackall <mpm@selenic.com> Cc: Jan Engelhardt <jengelh@computergmbh.de> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2008-04-29Simplify initcall_debug outputBjorn Helgaas1-14/+6
print_fn_descriptor_symbol() prints the address if we don't have a symbol, so no need to print both. Also, combine printing return value with elapsed time. Changes this: Calling initcall 0xc05b7a70: pci_mmcfg_late_insert_resources+0x0/0x50() initcall 0xc05b7a70: pci_mmcfg_late_insert_resources+0x0/0x50() returned 1. initcall 0xc05b7a70 ran for 0 msecs: pci_mmcfg_late_insert_resources+0x0/0x50() initcall at 0xc05b7a70: pci_mmcfg_late_insert_resources+0x0/0x50(): returned with error code 1 to this: calling pci_mmcfg_late_insert_resources+0x0/0x50() initcall pci_mmcfg_late_insert_resources+0x0/0x50() returned 1 after 0 msecs initcall pci_mmcfg_late_insert_resources+0x0/0x50() returned with error code 1 Signed-off-by: Bjorn Helgaas <bjorn.helgaas@hp.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2008-04-29let LOG_BUF_SHIFT default to 17Adrian Bunk1-9/+6
16 kB is often no longer enough for a normal boot of an UP system. And even less when people e.g. use suspend. 17 seems to be a more reasonable default for current kernels on current hardware (it's just the default, anyone who is memory limited can still lower it). Signed-off-by: Adrian Bunk <bunk@kernel.org> Acked-by: Randy Dunlap <randy.dunlap@oracle.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2008-04-28init: fix integer as NULL pointer warningsHarvey Harrison2-2/+2
init/do_mounts_rd.c:215:13: warning: Using plain integer as NULL pointer init/do_mounts_md.c:136:45: warning: Using plain integer as NULL pointer Signed-off-by: Harvey Harrison <harvey.harrison@gmail.com> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2008-04-28Merge git://git.kernel.org/pub/scm/linux/kernel/git/davem/sparc-2.6Linus Torvalds1-1/+1
* git://git.kernel.org/pub/scm/linux/kernel/git/davem/sparc-2.6: sparc: video drivers: add facility level sparc: tcx.c make tcx_init and tcx_exit static sparc: ffb.c make ffb_init and ffb_exit static sparc: cg14.c make cg14_init and cg15_exit static sparc: bw2.c fix bw2_exit sparc64: Fix accidental syscall restart on child return from clone/fork/vfork. sparc64: Clean up handling of pt_regs trap type encoding. sparc: Remove old style signal frame support. sparc64: Kill bogus RT_ALIGNEDSZ macro from signal.c sparc: sunzilog.c remove unused argument sparc: fix drivers/video/tcx.c warning sparc64: Kill unused local ISA bus layer. input: Rewrite sparcspkr device probing. sparc64: Do not ignore 'pmu' device ranges. sparc64: Kill ISA_FLOPPY_WORKS code. sparc64: Kill CONFIG_SPARC32_COMPAT sparc64: Cleanups and corrections for arch/sparc64/Kconfig sparc64: Fix wedged irq regression.
2008-04-28make CC_OPTIMIZE_FOR_SIZE non-experimentalIngo Molnar1-5/+1
this option has been the default on a wide range of distributions for a long time - time to make it non-experimental. Signed-off-by: Ingo Molnar <mingo@elte.hu> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2008-04-26sparc64: Kill CONFIG_SPARC32_COMPATDavid S. Miller1-1/+1
It's completely superfluous, CONFIG_COMPAT is sufficient. What this used to be is an umbrella for enabling code shared by all 32-bit compat binary support types. But with the removal of SunOS and Solaris support, the only one left is Linux 32-bit ELF. Update defconfig. Signed-off-by: David S. Miller <davem@davemloft.net>
2008-04-24[POWERPC] Use __weak macro for smp_setup_processor_idBenjamin Herrenschmidt1-1/+1
Use the __weak macro instead of the longer __attribute__ ((weak)) form in one place in init/main.c. Signed-off-by: Benjamin Herrenschmidt <benh@kernel.crashing.org> Acked-by: Andrew Morton <akpm@linux-foundation.org> -- init/main.c | 2 +- 1 file changed, 1 insertion(+), 1 deletion(-) Signed-off-by: Paul Mackerras <paulus@samba.org>
2008-04-24[POWERPC] Add thread_info_cache_init() weak hookBenjamin Herrenschmidt1-0/+5
Some architectures need to maintain a kmem cache for thread info structures. The next commit adds that to powerpc to fix an alignment problem. There is no good arch callback to use to initialize that cache that I can find, so this adds a new one in the form of a weak function whose default is empty. Signed-off-by: Benjamin Herrenschmidt <benh@kernel.crashing.org> Acked-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Paul Mackerras <paulus@samba.org>
2008-04-19sched: better rt-group documentationViktor Radnai1-0/+7
Viktor was nice enough to enhance the document based on my replies to his questions on the subject. Signed-off-by: Peter Zijlstra <a.p.zijlstra@chello.nl> Signed-off-by: Ingo Molnar <mingo@elte.hu>
2008-04-19init: move setup of nr_cpu_ids to as early as possibleMike Travis1-0/+17
Move the setting of nr_cpu_ids from sched_init() to start_kernel() so that it's available as early as possible. Note that an arch has the option of setting it even earlier if need be, but it should not result in a different value than the setup_nr_cpu_ids() function. Signed-off-by: Mike Travis <travis@sgi.com> Signed-off-by: Ingo Molnar <mingo@elte.hu>
2008-04-19cpumask: add CPU_MASK_ALL_PTR macroMike Travis1-1/+6
* Add a static cpumask_t variable "CPU_MASK_ALL_PTR" to use as a pointer reference to CPU_MASK_ALL. This reduces where possible the instances where CPU_MASK_ALL allocates and fills a large array on the stack. Used only if NR_CPUS > BITS_PER_LONG. * Change init/main.c to use new set_cpus_allowed_ptr(). Depends on: [sched-devel]: sched: add new set_cpus_allowed_ptr function Cc: H. Peter Anvin <hpa@zytor.com> Signed-off-by: Mike Travis <travis@sgi.com> Signed-off-by: Ingo Molnar <mingo@elte.hu>
2008-04-14slub: No need for per node slab counters if !SLUB_DEBUGChristoph Lameter1-1/+1
The per node counters are used mainly for showing data through the sysfs API. If that API is not compiled in then there is no point in keeping track of this data. Disable counters for the number of slabs and the number of total slabs if !SLUB_DEBUG. Incrementing the per node counters is also accessing a potentially contended cacheline so this could actually be a performance benefit to embedded systems. SLABINFO support is also affected. It now must depends on SLUB_DEBUG (which is on by default). Patch also avoids a check for a NULL kmem_cache_node pointer in new_slab() if the system is not compiled with NUMA support. [penberg@cs.helsinki.fi: fix oops and move ->nr_slabs into CONFIG_SLUB_DEBUG] Signed-off-by: Christoph Lameter <clameter@sgi.com> Signed-off-by: Pekka Enberg <penberg@cs.helsinki.fi>
2008-03-15ACPI: Remove ACPI_CUSTOM_DSDT_INITRD optionLinus Torvalds2-14/+1
This essentially reverts commit 71fc47a9adf8ee89e5c96a47222915c5485ac437 ("ACPI: basic initramfs DSDT override support"), because the code simply isn't ready. It did ugly things to the init sequence to populate the rootfs image early, but that just ended up showing other problems with the whole approach. The fact is, the VFS layer simply isn't initialized this early, and the relevant ACPI code should either run much later, or this shouldn't be done at all. For 2.6.25, we'll just pick the latter option. We can revisit this concept later if necessary. Cc: Dave Hansen <haveblue@us.ibm.com> Cc: Tilman Schmidt <tilman@imap.cc> Cc: Andrew Morton <akpm@linux-foundation.org> Cc: Thomas Renninger <trenn@suse.de> Cc: Eric Piel <eric.piel@tremplin-utc.net> Cc: Len Brown <len.brown@intel.com> Cc: Christoph Hellwig <hch@infradead.org> Cc: Markus Gaugusch <dsdt@gaugusch.at> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2008-03-10rcu: move PREEMPT_RCU config option back under PREEMPTPaul E. McKenney1-31/+3
The original preemptible-RCU patch put the choice between classic and preemptible RCU into kernel/Kconfig.preempt, which resulted in build failures on machines not supporting CONFIG_PREEMPT. This choice was therefore moved to init/Kconfig, which worked, but placed the choice between classic and preemptible RCU at the top level, a very obtuse choice indeed. This patch changes from the Kconfig "choice" mechanism to a pair of booleans, only one of which (CONFIG_PREEMPT_RCU) is user-visible, and is located in kernel/Kconfig.preempt, where one would expect it to be. The other (CONFIG_CLASSIC_RCU) is in init/Kconfig so that it is available to all architectures, hopefully avoiding build breakage. Thanks to Roman Zippel for suggesting this approach. Signed-off-by: Paul E. McKenney <paulmck@linux.vnet.ibm.com> Cc: Ingo Molnar <mingo@elte.hu> Acked-by: Steven Rostedt <rostedt@goodmis.org> Cc: Dipankar Sarma <dipankar@in.ibm.com> Cc: Josh Triplett <josh@freedesktop.org> Cc: Thomas Gleixner <tglx@linutronix.de> Cc: Peter Zijlstra <a.p.zijlstra@chello.nl> Cc: Roman Zippel <zippel@linux-m68k.org> Cc: Sam Ravnborg <sam@ravnborg.org> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2008-03-04Merge git://git.kernel.org/pub/scm/linux/kernel/git/gregkh/driver-2.6Linus Torvalds1-3/+8
* git://git.kernel.org/pub/scm/linux/kernel/git/gregkh/driver-2.6: debugfs: fix sparse warnings Driver core: Fix cleanup when failing device_add(). driver core: Remove dpm_sysfs_remove() from error path of device_add() PM: fix new mutex-locking bug in the PM core PM: Do not acquire device semaphores upfront during suspend kobject: properly initialize ksets sysfs: CONFIG_SYSFS_DEPRECATED fix driver core: fix up Kconfig text for CONFIG_SYSFS_DEPRECATED
2008-03-04Memory controller: rename to Memory Resource ControllerBalbir Singh1-15/+15
Rename Memory Controller to Memory Resource Controller. Reflect the same changes in the CONFIG definition for the Memory Resource Controller. Group together the config options for Resource Counters and Memory Resource Controller. Signed-off-by: Balbir Singh <balbir@linux.vnet.ibm.com> Cc: Paul Menage <menage@google.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2008-03-04Fix "Malformed early option 'loglevel'"Alex Riesen1-1/+1
Keith Mannthey said: The parameter hotadd_percent is setup right but there is a "Malformed early option 'numa'" message. Rusty Russell said: This happens when the function registered with early_param() returns non-zero. __setup() functions return 1 if OK, module_param() and early_param() return 0 or a -ve error code. For instance: Linux version 2.6.25-rc3-t (raa@steel) (gcc version 4.1.3 20070929 (prerelease) (Ubuntu 4.1.2-16ubuntu2)) #22 SMP PREEMPT Tue Feb 26 BIOS-provided physical RAM map: BIOS-e820: 0000000000000000 - 00000000000a0000 (usable) BIOS-e820: 00000000000f0000 - 0000000000100000 (reserved) BIOS-e820: 0000000000100000 - 000000003fff0000 (usable) BIOS-e820: 000000003fff0000 - 000000003fff3000 (ACPI NVS) BIOS-e820: 000000003fff3000 - 0000000040000000 (ACPI data) BIOS-e820: 00000000fec00000 - 0000000100000000 (reserved) Malformed early option 'loglevel' 127MB HIGHMEM available. 896MB LOWMEM available. Command line: BOOT_IMAGE=2.6.25-t ro root=809 ro console=ttyS0,57600n8 console=tty0 loglevel=5 Acked-by: Yinghai Lu <yhlu.kernel@gmai.com> Cc: Rusty Russell <rusty@rustcorp.com.au> Cc: Keith Mannthey <kmannth@gmail.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2008-03-04sysfs: CONFIG_SYSFS_DEPRECATED fixIngo Molnar1-0/+4
CONFIG_SYSFS_DEPRECATED=y changed its meaning recently and causes regressions in working setups that had SYSFS_DEPRECATED disabled. so rename it to SYSFS_DEPRECATED_V2 so that testers pick up the new default via 'make oldconfig', even if their old .config's disabled CONFIG_SYSFS_DEPRECATED ... Signed-off-by: Ingo Molnar <mingo@elte.hu> Cc: Kay Sievers <kay.sievers@vrfy.org> Cc: Linus Torvalds <torvalds@linux-foundation.org> Cc: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Greg Kroah-Hartman <gregkh@suse.de>
2008-03-04driver core: fix up Kconfig text for CONFIG_SYSFS_DEPRECATEDGreg Kroah-Hartman1-3/+4
As things get moved into this config option, the hard date of 2006 does not work anymore, so update the text to be more descriptive. Cc: Kay Sievers <kay.sievers@vrfy.org> Cc: Jiri Slaby <jslaby@suse.cz> Signed-off-by: Greg Kroah-Hartman <gregkh@suse.de>
2008-02-23cgroup memory controller: document huge memory/cache overhead in KconfigAndi Kleen1-0/+8
Document huge memory/cache overhead of memory controller in Kconfig I was a little surprised that 2.6.25-rc* increased struct page for the memory controller. At least on many x86-64 machines it will not fit into a single cache line now anymore and also costs considerable amounts of RAM. At earlier review I remembered asking for a external data structure for this. It's also quite unobvious that a innocent looking Kconfig option with a single line Kconfig description has such a negative effect. This patch attempts to document these disadvantages at least so that users configuring their kernel can make a informed decision. Signed-off-by: Andi Kleen <ak@suse.de> Cc: Balbir Singh <balbir@in.ibm.com> Acked-by: Paul Menage <menage@google.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2008-02-14Use struct path in fs_structJan Blunck1-3/+3
* Use struct path in fs_struct. Signed-off-by: Andreas Gruenbacher <agruen@suse.de> Signed-off-by: Jan Blunck <jblunck@suse.de> Acked-by: Christoph Hellwig <hch@lst.de> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2008-02-13sched: rt-group: make rt groups scheduling configurablePeter Zijlstra1-6/+17
Make the rt group scheduler compile time configurable. Keep it experimental for now. Signed-off-by: Peter Zijlstra <a.p.zijlstra@chello.nl> Signed-off-by: Ingo Molnar <mingo@elte.hu>
2008-02-11kbuild: fix make V=1Sam Ravnborg1-0/+1
When make -s support were added to filechk to combination created with make V=1 were not covered. Fix it by explicitly cover this case too. Signed-off-by: Sam Ravnborg <sam@ravnborg.org> Cc: Mike Frysinger <vapier@gentoo.org>
2008-02-09x86: DEBUG_PAGEALLOC: enable after mem_init()Thomas Gleixner1-1/+1
DEBUG_PAGEALLOC must not be enabled before mem_init(). Before this point there is nothing to allocate. Signed-off-by: Thomas Gleixner <tglx@linutronix.de> Signed-off-by: Ingo Molnar <mingo@elte.hu>
2008-02-09brk: help text typo fixIngo Molnar1-1/+1
Signed-off-by: Ingo Molnar <mingo@elte.hu> Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
2008-02-09kbuild: silence CHK/UPD messages according to $(quiet)Mike Frysinger1-1/+3
Signed-off-by: Mike Frysinger <vapier@gentoo.org> Signed-off-by: Sam Ravnborg <sam@ravnborg.org>
2008-02-08Convert loglevel-related kernel boot parameters to early_paramYinghai Lu1-9/+5
So we can use them for the early console like console=uart8250 or earlycon=uart8250 or early_printk Signed-off-by: Yinghai Lu <yinghai.lu@sun.com> Cc: Andi Kleen <ak@suse.de> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2008-02-08start the global /sbin/init with 0,0 special pidsOleg Nesterov1-1/+0
As Eric pointed out, there is no problem with init starting with sid == pgid == 0, and this was historical linux behavior changed in 2.6.18. Remove kernel_init()->__set_special_pids(), this is unneeded and complicates the rules for sys_setsid(). This change and the previous change in daemonize() mean that /sbin/init does not need the special "session != 1" hack in sys_setsid() any longer. We can't remove this check yet, we should cleanup copy_process(CLONE_NEWPID) first, so update the comment only. Signed-off-by: Oleg Nesterov <oleg@tv-sign.ru> Acked-by: "Eric W. Biederman" <ebiederm@xmission.com> Cc: Pavel Emelyanov <xemul@openvz.org> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2008-02-08teach set_special_pids() to use struct pidOleg Nesterov1-1/+1
Change set_special_pids() to work with struct pid, not pid_t from global name space. This again speedups and imho cleanups the code, also a preparation for the next patch. Signed-off-by: Oleg Nesterov <oleg@tv-sign.ru> Acked-by: "Eric W. Biederman" <ebiederm@xmission.com> Acked-by: Pavel Emelyanov <xemul@openvz.org> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2008-02-08namespaces: cleanup the code managed with PID_NS optionPavel Emelyanov1-12/+12
Just like with the user namespaces, move the namespace management code into the separate .c file and mark the (already existing) PID_NS option as "depend on NAMESPACES" [akpm@linux-foundation.org: coding-style fixes] Signed-off-by: Pavel Emelyanov <xemul@openvz.org> Acked-by: Serge Hallyn <serue@us.ibm.com> Cc: Cedric Le Goater <clg@fr.ibm.com> Cc: "Eric W. Biederman" <ebiederm@xmission.com> Cc: Herbert Poetzl <herbert@13thfloor.at> Cc: Kirill Korotaev <dev@sw.ru> Cc: Sukadev Bhattiprolu <sukadev@us.ibm.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2008-02-08namespaces: cleanup the code managed with the USER_NS optionPavel Emelyanov1-9/+8
Make the user_namespace.o compilation depend on this option and move the init_user_ns into user.c file to make the kernel compile and work without the namespaces support. This make the user namespace code be organized similar to other namespaces'. Also mask the USER_NS option as "depend on NAMESPACES". [akpm@linux-foundation.org: coding-style fixes] Signed-off-by: Pavel Emelyanov <xemul@openvz.org> Acked-by: Serge Hallyn <serue@us.ibm.com> Cc: Cedric Le Goater <clg@fr.ibm.com> Cc: "Eric W. Biederman" <ebiederm@xmission.com> Cc: Herbert Poetzl <herbert@13thfloor.at> Cc: Kirill Korotaev <dev@sw.ru> Cc: Sukadev Bhattiprolu <sukadev@us.ibm.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2008-02-08namespaces: move the IPC namespace under IPC_NS optionPavel Emelyanov1-0/+7
Currently the IPC namespace management code is spread over the ipc/*.c files. I moved this code into ipc/namespace.c file which is compiled out when needed. The linux/ipc_namespace.h file is used to store the prototypes of the functions in namespace.c and the stubs for NAMESPACES=n case. This is done so, because the stub for copy_ipc_namespace requires the knowledge of the CLONE_NEWIPC flag, which is in sched.h. But the linux/ipc.h file itself in included into many many .c files via the sys.h->sem.h sequence so adding the sched.h into it will make all these .c depend on sched.h which is not that good. On the other hand the knowledge about the namespaces stuff is required in 4 .c files only. Besides, this patch compiles out some auxiliary functions from ipc/sem.c, msg.c and shm.c files. It turned out that moving these functions into namespaces.c is not that easy because they use many other calls and macros from the original file. Moving them would make this patch complicated. On the other hand all these functions can be consolidated, so I will send a separate patch doing this a bit later. Signed-off-by: Pavel Emelyanov <xemul@openvz.org> Acked-by: Serge Hallyn <serue@us.ibm.com> Cc: Cedric Le Goater <clg@fr.ibm.com> Cc: "Eric W. Biederman" <ebiederm@xmission.com> Cc: Herbert Poetzl <herbert@13thfloor.at> Cc: Kirill Korotaev <dev@sw.ru> Cc: Sukadev Bhattiprolu <sukadev@us.ibm.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>