summaryrefslogtreecommitdiffstats
path: root/init/Kconfig
AgeCommit message (Collapse)AuthorFilesLines
2013-02-25Merge branch 'for-linus' of ↵Linus Torvalds1-13/+7
git://git.kernel.org/pub/scm/linux/kernel/git/ebiederm/user-namespace Pull user namespace and namespace infrastructure changes from Eric W Biederman: "This set of changes starts with a few small enhnacements to the user namespace. reboot support, allowing more arbitrary mappings, and support for mounting devpts, ramfs, tmpfs, and mqueuefs as just the user namespace root. I do my best to document that if you care about limiting your unprivileged users that when you have the user namespace support enabled you will need to enable memory control groups. There is a minor bug fix to prevent overflowing the stack if someone creates way too many user namespaces. The bulk of the changes are a continuation of the kuid/kgid push down work through the filesystems. These changes make using uids and gids typesafe which ensures that these filesystems are safe to use when multiple user namespaces are in use. The filesystems converted for 3.9 are ceph, 9p, afs, ocfs2, gfs2, ncpfs, nfs, nfsd, and cifs. The changes for these filesystems were a little more involved so I split the changes into smaller hopefully obviously correct changes. XFS is the only filesystem that remains. I was hoping I could get that in this release so that user namespace support would be enabled with an allyesconfig or an allmodconfig but it looks like the xfs changes need another couple of days before it they are ready." * 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/ebiederm/user-namespace: (93 commits) cifs: Enable building with user namespaces enabled. cifs: Convert struct cifs_ses to use a kuid_t and a kgid_t cifs: Convert struct cifs_sb_info to use kuids and kgids cifs: Modify struct smb_vol to use kuids and kgids cifs: Convert struct cifsFileInfo to use a kuid cifs: Convert struct cifs_fattr to use kuid and kgids cifs: Convert struct tcon_link to use a kuid. cifs: Modify struct cifs_unix_set_info_args to hold a kuid_t and a kgid_t cifs: Convert from a kuid before printing current_fsuid cifs: Use kuids and kgids SID to uid/gid mapping cifs: Pass GLOBAL_ROOT_UID and GLOBAL_ROOT_GID to keyring_alloc cifs: Use BUILD_BUG_ON to validate uids and gids are the same size cifs: Override unmappable incoming uids and gids nfsd: Enable building with user namespaces enabled. nfsd: Properly compare and initialize kuids and kgids nfsd: Store ex_anon_uid and ex_anon_gid as kuids and kgids nfsd: Modify nfsd4_cb_sec to use kuids and kgids nfsd: Handle kuids and kgids in the nfs4acl to posix_acl conversion nfsd: Convert nfsxdr to use kuids and kgids nfsd: Convert nfs3xdr to use kuids and kgids ...
2013-02-25Merge tag 'modules-next-for-linus' of ↵Linus Torvalds1-0/+20
git://git.kernel.org/pub/scm/linux/kernel/git/rusty/linux Pull module update from Rusty Russell: "The sweeping change is to make add_taint() explicitly indicate whether to disable lockdep, but it's a mechanical change." * tag 'modules-next-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/rusty/linux: MODSIGN: Add option to not sign modules during modules_install MODSIGN: Add -s <signature> option to sign-file MODSIGN: Specify the hash algorithm on sign-file command line MODSIGN: Simplify Makefile with a Kconfig helper module: clean up load_module a little more. modpost: Ignore ARC specific non-alloc sections module: constify within_module_* taint: add explicit flag to show whether lock dep is still OK. module: printk message when module signature fail taints kernel.
2013-02-21Merge tag 'please-pull-misc-3.9' of ↵Linus Torvalds1-0/+7
git://git.kernel.org/pub/scm/linux/kernel/git/aegl/linux Pull misc ia64 bits from Tony Luck. * tag 'please-pull-misc-3.9' of git://git.kernel.org/pub/scm/linux/kernel/git/aegl/linux: MAINTAINERS: update SGI & ia64 Altix stuff sysctl: Enable IA64 "ignore-unaligned-usertrap" to be used cross-arch
2013-02-21Merge tag 'driver-core-3.9-rc1' of ↵Linus Torvalds1-42/+12
git://git.kernel.org/pub/scm/linux/kernel/git/gregkh/driver-core Pull driver core patches from Greg Kroah-Hartman: "Here is the big driver core merge for 3.9-rc1 There are two major series here, both of which touch lots of drivers all over the kernel, and will cause you some merge conflicts: - add a new function called devm_ioremap_resource() to properly be able to check return values. - remove CONFIG_EXPERIMENTAL Other than those patches, there's not much here, some minor fixes and updates" Fix up trivial conflicts * tag 'driver-core-3.9-rc1' of git://git.kernel.org/pub/scm/linux/kernel/git/gregkh/driver-core: (221 commits) base: memory: fix soft/hard_offline_page permissions drivercore: Fix ordering between deferred_probe and exiting initcalls backlight: fix class_find_device() arguments TTY: mark tty_get_device call with the proper const values driver-core: constify data for class_find_device() firmware: Ignore abort check when no user-helper is used firmware: Reduce ifdef CONFIG_FW_LOADER_USER_HELPER firmware: Make user-mode helper optional firmware: Refactoring for splitting user-mode helper code Driver core: treat unregistered bus_types as having no devices watchdog: Convert to devm_ioremap_resource() thermal: Convert to devm_ioremap_resource() spi: Convert to devm_ioremap_resource() power: Convert to devm_ioremap_resource() mtd: Convert to devm_ioremap_resource() mmc: Convert to devm_ioremap_resource() mfd: Convert to devm_ioremap_resource() media: Convert to devm_ioremap_resource() iommu: Convert to devm_ioremap_resource() drm: Convert to devm_ioremap_resource() ...
2013-02-19Merge branch 'sched-core-for-linus' of ↵Linus Torvalds1-2/+23
git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip Pull scheduler changes from Ingo Molnar: "Main changes: - scheduler side full-dynticks (user-space execution is undisturbed and receives no timer IRQs) preparation changes that convert the cputime accounting code to be full-dynticks ready, from Frederic Weisbecker. - Initial sched.h split-up changes, by Clark Williams - select_idle_sibling() performance improvement by Mike Galbraith: " 1 tbench pair (worst case) in a 10 core + SMT package: pre 15.22 MB/sec 1 procs post 252.01 MB/sec 1 procs " - sched_rr_get_interval() ABI fix/change. We think this detail is not used by apps (so it's not an ABI in practice), but lets keep it under observation. - misc RT scheduling cleanups, optimizations" * 'sched-core-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip: (24 commits) sched/rt: Add <linux/sched/rt.h> header to <linux/init_task.h> cputime: Remove irqsave from seqlock readers sched, powerpc: Fix sched.h split-up build failure cputime: Restore CPU_ACCOUNTING config defaults for PPC64 sched/rt: Move rt specific bits into new header file sched/rt: Add a tuning knob to allow changing SCHED_RR timeslice sched: Move sched.h sysctl bits into separate header sched: Fix signedness bug in yield_to() sched: Fix select_idle_sibling() bouncing cow syndrome sched/rt: Further simplify pick_rt_task() sched/rt: Do not account zero delta_exec in update_curr_rt() cputime: Safely read cputime of full dynticks CPUs kvm: Prepare to add generic guest entry/exit callbacks cputime: Use accessors to read task cputime stats cputime: Allow dynamic switch between tick/virtual based cputime accounting cputime: Generic on-demand virtual cputime accounting cputime: Move default nsecs_to_cputime() to jiffies based cputime file cputime: Librarize per nsecs resolution cputime definitions cputime: Avoid multiplication overflow on utime scaling context_tracking: Export context state for generic vtime ... Fix up conflict in kernel/context_tracking.c due to comment additions.
2013-02-19Merge branch 'irq-core-for-linus' of ↵Linus Torvalds1-4/+1
git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip Pull irq core changes from Ingo Molnar: "The biggest changes are the IRQ-work and printk changes from Frederic Weisbecker, which prepare the code for 'full dynticks' (the ability to stop or slow down the periodic tick arbitrarily, not just in idle time as today): - Don't stop tick with irq works pending. This fix is generally useful and concerns archs that can't raise self IPIs. - Flush irq works before CPU offlining. - Introduce "lazy" irq works that can wait for the next tick to be executed, unless it's stopped. - Implement klogd wake up using irq work. This removes the ad-hoc printk_tick()/printk_needs_cpu() hooks and make it working even in dynticks mode. - Cleanups and fixes." * 'irq-core-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip: genirq: Export enable/disable_percpu_irq() arch Kconfig: Remove references to IRQ_PER_CPU irq_work: Remove return value from the irq_work_queue() function genirq: Avoid deadlock in spurious handling printk: Wake up klogd using irq_work irq_work: Make self-IPIs optable irq_work: Warn if there's still work on cpu_down irq_work: Flush work on CPU_DYING irq_work: Don't stop the tick with pending works nohz: Add API to check tick state irq_work: Remove CONFIG_HAVE_IRQ_WORK irq_work: Fix racy check on work pending flag irq_work: Fix racy IRQ_WORK_BUSY flag setting
2013-02-13cifs: Enable building with user namespaces enabled.Eric W. Biederman1-1/+0
Cc: Steve French <smfrench@gmail.com> Signed-off-by: "Eric W. Biederman" <ebiederm@xmission.com>
2013-02-13nfsd: Enable building with user namespaces enabled.Eric W. Biederman1-1/+0
Now that the kuids and kgids conversion have propogated through net/sunrpc/ and the fs/nfsd/ it is safe to enable building nfsd when user namespaces are enabled. Cc: "J. Bruce Fields" <bfields@fieldses.org> Cc: Trond Myklebust <Trond.Myklebust@netapp.com> Signed-off-by: "Eric W. Biederman" <ebiederm@xmission.com>
2013-02-13nfs: Enable building with user namespaces enabled.Eric W. Biederman1-1/+0
Now that the kuids and kgids conversion have propogated through net/sunrpc/ and the fs/nfs/ it is safe to enable building nfs when user namespaces are enabled. Cc: "J. Bruce Fields" <bfields@fieldses.org> Cc: Trond Myklebust <Trond.Myklebust@netapp.com> Signed-off-by: "Eric W. Biederman" <ebiederm@xmission.com>
2013-02-13ncpfs: Support interacting with multiple user namespacesEric W. Biederman1-1/+0
ncpfs does not natively support uids and gids so this conversion was simply a matter of updating the the type of the mounteduid, the uid and the gid on the superblock. Fixing the ioctls that read them, updating the mount option parser and the mount option printer. Cc: Petr Vandrovec <petr@vandrovec.name> Acked-by: Serge Hallyn <serge.hallyn@canonical.com> Signed-off-by: Eric W. Biederman <ebiederm@xmission.com>
2013-02-13gfs2: Enable building with user namespaces enabledEric W. Biederman1-1/+0
Now that all of the necessary work has been done to push kuids and kgids throughout gfs2 and to convert between kuids and kgids when reading and writing the on disk structures it is safe to enable gfs2 when multiple user namespaces are enabled. Cc: Steven Whitehouse <swhiteho@redhat.com> Signed-off-by: "Eric W. Biederman" <ebiederm@xmission.com>
2013-02-13ocfs2: Enable building with user namespaces enabledEric W. Biederman1-1/+0
Now that ocfs2 has been converted to store uids and gids in kuid_t and kgid_t and all of the conversions have been added to the appropriate places it is safe to allow building and using ocfs2 with user namespace support enabled. Cc: Mark Fasheh <mfasheh@suse.com> Cc: Joel Becker <jlbec@evilplan.org> Signed-off-by: "Eric W. Biederman" <ebiederm@xmission.com>
2013-02-13coda: Allow coda to be built when user namespace support is enabledEric W. Biederman1-1/+0
Now that the coda kernel to userspace has been modified to convert between kuids and kgids and uids and gids, and all internal coda structures have be modified to store uids and gids as kuids and kgids it is safe to allow code to be built with user namespace support enabled. Cc: Jan Harkes <jaharkes@cs.cmu.edu> Signed-off-by: "Eric W. Biederman" <ebiederm@xmission.com>
2013-02-13afs: Support interacting with multiple user namespacesEric W. Biederman1-1/+0
Modify struct afs_file_status to store owner as a kuid_t and group as a kgid_t. In xdr_decode_AFSFetchStatus as owner is now a kuid_t and group is now a kgid_t don't use the EXTRACT macro. Instead perform the work of the extract macro explicitly. Read the value with ntohl and convert it to the appropriate type with make_kuid or make_kgid. Test if the value is different from what is stored in status and update changed. Update the value in status. In xdr_encode_AFS_StoreStatus call from_kuid or from_kgid as we are computing the on the wire encoding. Initialize uids with GLOBAL_ROOT_UID instead of 0. Initialize gids with GLOBAL_ROOT_GID instead of 0. Cc: David Howells <dhowells@redhat.com> Acked-by: Serge Hallyn <serge.hallyn@canonical.com> Signed-off-by: Eric W. Biederman <ebiederm@xmission.com>
2013-02-129p: Allow building 9p with user namespaces enabled.Eric W. Biederman1-4/+0
Now that the uid_t -> kuid_t, gid_t -> kgid_t conversion has been completed in 9p allow 9p to be built when user namespaces are enabled. Cc: Eric Van Hensbergen <ericvh@gmail.com> Cc: Ron Minnich <rminnich@gmail.com> Cc: Latchesar Ionkov <lucho@ionkov.net> Signed-off-by: "Eric W. Biederman" <ebiederm@xmission.com>
2013-02-12ceph: Enable building when user namespaces are enabled.Eric W. Biederman1-1/+0
Now that conversions happen from kuids and kgids when generating ceph messages and conversion happen to kuids and kgids after receiving celph messages, and all intermediate data structures store uids and gids as type kuid_t and kgid_t it is safe to enable ceph with user namespace support enabled. Cc: Sage Weil <sage@inktank.com> Signed-off-by: "Eric W. Biederman" <ebiederm@xmission.com>
2013-02-08cputime: Restore CPU_ACCOUNTING config defaults for PPC64Stephen Rothwell1-1/+1
Commit abf917cd91cb ("cputime: Generic on-demand virtual cputime accounting") inadvertantly changed the default CPU_ACCOUNTING config for PPC64. Repair that. Signed-off-by: Stephen Rothwell <sfr@canb.auug.org.au> Acked-by: Frederic Weisbecker <fweisbec@gmail.com> Cc: Li Zhong <zhong@linux.vnet.ibm.com> Cc: Namhyung Kim <namhyung.kim@lge.com> Cc: Paul E. McKenney <paulmck@linux.vnet.ibm.com> Cc: Paul Gortmaker <paul.gortmaker@windriver.com> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Steven Rostedt <rostedt@goodmis.org> Cc: ppc-dev <linuxppc-dev@lists.ozlabs.org> Cc: Benjamin Herrenschmidt <benh@kernel.crashing.org> Link: http://lkml.kernel.org/r/20130208141938.f31b7b9e1acac5bbe769ee4c@canb.auug.org.au Signed-off-by: Ingo Molnar <mingo@kernel.org>
2013-02-05Merge branch 'nohz/printk-v8' into irq/coreFrederic Weisbecker1-0/+1
Conflicts: kernel/irq_work.c Add support for printk in full dynticks CPU. * Don't stop tick with irq works pending. This fix is generally useful and concerns archs that can't raise self IPIs. * Flush irq works before CPU offlining. * Introduce "lazy" irq works that can wait for the next tick to be executed, unless it's stopped. * Implement klogd wake up using irq work. This removes the ad-hoc printk_tick()/printk_needs_cpu() hooks and make it working even in dynticks mode. Signed-off-by: Frederic Weisbecker <fweisbec@gmail.com>
2013-02-04Merge branch 'rcu/next' of ↵Ingo Molnar1-1/+11
git://git.kernel.org/pub/scm/linux/kernel/git/paulmck/linux-rcu into core/rcu Pull RCU updates from Paul E. McKenney: 1. Changes to rcutorture and to RCU documentation. Posted to LKML at https://lkml.org/lkml/2013/1/26/188. 2. Enhancements to uniprocessor handling in tiny RCU. Posted to LKML at https://lkml.org/lkml/2013/1/27/2. 3. Tag RCU callbacks with grace-period number to simplify callback advancement. Posted to LKML at https://lkml.org/lkml/2013/1/26/203. 4. Miscellaneous fixes. Posted to LKML at https://lkml.org/lkml/2013/1/26/204. Signed-off-by: Ingo Molnar <mingo@kernel.org>
2013-01-28rcu: Allow TREE_PREEMPT_RCU on UP systemsPaul E. McKenney1-1/+3
The TINY_PREEMPT_RCU is complex, does not provide that much memory savings, and therefore TREE_PREEMPT_RCU should be used instead. The systems where the difference between TINY_PREEMPT_RCU and TREE_PREEMPT_RCU are quite small compared to the memory footprint of CONFIG_PREEMPT. This commit therefore takes a first step towards eliminating TINY_PREEMPT_RCU by allowing TREE_PREEMPT_RCU to be configured on !SMP systems. Signed-off-by: Paul E. McKenney <paulmck@linux.vnet.ibm.com> Reviewed-by: Josh Triplett <josh@joshtriplett.org>
2013-01-28rcu: Provide RCU CPU stall warnings for tiny RCUPaul E. McKenney1-0/+8
Tiny RCU has historically omitted RCU CPU stall warnings in order to reduce memory requirements, however, lack of these warnings caused Thomas Gleixner some debugging pain recently. Therefore, this commit adds RCU CPU stall warnings to tiny RCU if RCU_TRACE=y. This keeps the memory footprint small, while still enabling CPU stall warnings in kernels built to enable them. Updated to include Josh Triplett's suggested use of RCU_STALL_COMMON config variable to simplify #if expressions. Reported-by: Thomas Gleixner <tglx@linutronix.de> Signed-off-by: Paul E. McKenney <paul.mckenney@linaro.org> Signed-off-by: Paul E. McKenney <paulmck@linux.vnet.ibm.com> Reviewed-by: Josh Triplett <josh@joshtriplett.org>
2013-01-27cputime: Generic on-demand virtual cputime accountingFrederic Weisbecker1-1/+22
If we want to stop the tick further idle, we need to be able to account the cputime without using the tick. Virtual based cputime accounting solves that problem by hooking into kernel/user boundaries. However implementing CONFIG_VIRT_CPU_ACCOUNTING require low level hooks and involves more overhead. But we already have a generic context tracking subsystem that is required for RCU needs by archs which plan to shut down the tick outside idle. This patch implements a generic virtual based cputime accounting that relies on these generic kernel/user hooks. There are some upsides of doing this: - This requires no arch code to implement CONFIG_VIRT_CPU_ACCOUNTING if context tracking is already built (already necessary for RCU in full tickless mode). - We can rely on the generic context tracking subsystem to dynamically (de)activate the hooks, so that we can switch anytime between virtual and tick based accounting. This way we don't have the overhead of the virtual accounting when the tick is running periodically. And one downside: - There is probably more overhead than a native virtual based cputime accounting. But this relies on hooks that are already set anyway. Signed-off-by: Frederic Weisbecker <fweisbec@gmail.com> Cc: Andrew Morton <akpm@linux-foundation.org> Cc: Ingo Molnar <mingo@kernel.org> Cc: Li Zhong <zhong@linux.vnet.ibm.com> Cc: Namhyung Kim <namhyung.kim@lge.com> Cc: Paul E. McKenney <paulmck@linux.vnet.ibm.com> Cc: Paul Gortmaker <paul.gortmaker@windriver.com> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Steven Rostedt <rostedt@goodmis.org> Cc: Thomas Gleixner <tglx@linutronix.de>
2013-01-26userns: Recommend use of memory control groups.Eric W. Biederman1-0/+7
In the help text describing user namespaces recommend use of memory control groups. In many cases memory control groups are the only mechanism there is to limit how much memory a user who can create user namespaces can use. Acked-by: Serge Hallyn <serge.hallyn@canonical.com> Signed-off-by: Eric W. Biederman <ebiederm@xmission.com>
2013-01-25MODSIGN: Add option to not sign modules during modules_installMichal Marek1-0/+11
To allow the builder to sign only a subset of modules, or to sign the modules using a key that is not available on the build machine, add CONFIG_MODULE_SIG_ALL. If this option is unset, no modules will be signed during build. The default is 'y', to preserve the current behavior. Signed-off-by: Michal Marek <mmarek@suse.cz> Acked-by: David Howells <dhowells@redhat.com> Signed-off-by: Rusty Russell <rusty@rustcorp.com.au>
2013-01-25MODSIGN: Simplify Makefile with a Kconfig helperMichal Marek1-0/+9
Signed-off-by: Michal Marek <mmarek@suse.cz> Acked-by: David Howells <dhowells@redhat.com> Signed-off-by: Rusty Russell <rusty@rustcorp.com.au>
2013-01-24Merge branch 'core/irq_work' of ↵Ingo Molnar1-4/+0
git://git.kernel.org/pub/scm/linux/kernel/git/frederic/linux-dynticks into irq/core irq_work fixes and cleanups, in preparation for full dyntics support. Signed-off-by: Ingo Molnar <mingo@kernel.org>
2013-01-17Merge 3.9-rc4 into driver-core-nextGreg Kroah-Hartman1-1/+1
This is to fix up a build problem with a wireless driver due to the dynamic-debug patches in this branch. Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2013-01-16Tell the world we gave up on pushing CC_OPTIMIZE_FOR_SIZEKirill Smelkov1-1/+1
In commit 281dc5c5ec0f ("Give up on pushing CC_OPTIMIZE_FOR_SIZE") we already changed the actual default value, but the help-text still suggested 'y'. Fix the help text too, for all the same reasons. Sadly, -Os keeps on generating some very suboptimal code for certain cases, to the point where any I$ miss upside is swamped by the downside. The main ones are: - using "rep movsb" for memcpy, even on CPU's where that is horrendously bad for performance. - not honoring branch prediction information, so any I$ footprint you win from smaller code, you lose from less code density in the I$. - using divide instructions when that is very expensive. Signed-off-by: Kirill Smelkov <kirr@mns.spb.ru> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2013-01-11init: remove depends on CONFIG_EXPERIMENTALKees Cook1-13/+10
The CONFIG_EXPERIMENTAL config item has not carried much meaning for a while now and is almost always enabled by default. As agreed during the Linux kernel summit, remove it from any "depends on" lines in Kconfigs. CC: "Eric W. Biederman" <ebiederm@xmission.com> CC: Serge Hallyn <serge.hallyn@canonical.com> CC: "Paul E. McKenney" <paulmck@linux.vnet.ibm.com> CC: Andrew Morton <akpm@linux-foundation.org> CC: Frederic Weisbecker <fweisbec@gmail.com> Signed-off-by: Kees Cook <keescook@chromium.org> Acked-by: Serge Hallyn <serge.hallyn@ubuntu.com>
2013-01-11make CONFIG_EXPERIMENTAL invisible and defaultKees Cook1-29/+2
This config item has not carried much meaning for a while now and is almost always enabled by default (especially in distro builds). As agreed during the Linux kernel summit, it should be removed. As a first step, remove it from being listed, and default it to on. Once it has been removed from all subsystem Kconfigs, it will be dropped entirely. For items that really are experimental, maintainers should use "default n", optionally include "(EXPERIMENTAL)" in the title, and add language to the help text indicating why the item should be considered experimental. For items that are dangerously experimental, the maintainer is encouraged to follow the above title recommendation, add stronger language to the help text, and optionally use (depending on the extent of the danger, from least to most dangerous): printk(), add_taint(TAINT_WARN), add_taint(TAINT_CRAP), WARN_ON(1), and CONFIG_BROKEN. CC: Greg KH <gregkh@linuxfoundation.org> CC: "Eric W. Biederman" <ebiederm@xmission.com> CC: Serge Hallyn <serge.hallyn@canonical.com> CC: "Paul E. McKenney" <paulmck@linux.vnet.ibm.com> CC: Andrew Morton <akpm@linux-foundation.org> CC: Frederic Weisbecker <fweisbec@gmail.com> Signed-off-by: Kees Cook <keescook@chromium.org> Acked-by: Serge Hallyn <serge.hallyn@canonical.com> Acked-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org> Reviewed-by: Paul E. McKenney <paulmck@linux.vnet.ibm.com>
2013-01-09sysctl: Enable IA64 "ignore-unaligned-usertrap" to be used cross-archVineet Gupta1-0/+7
IA64 defines /proc/sys/kernel/ignore-unaligned-usertrap to control verbose warnings on unaligned access emulation. Although the exact mechanics of what to do with sysctl (ignore/shout) are arch specific, this change enables the sysctl to be usable cross-arch. Signed-off-by: Vineet Gupta <vgupta@synopsys.com> Cc: Fenghua Yu <fenghua.yu@intel.com> Cc: "Eric W. Biederman" <ebiederm@xmission.com> Cc: Serge Hallyn <serge.hallyn@canonical.com> Signed-off-by: Tony Luck <tony.luck@intel.com>
2012-12-18memcg: infrastructure to match an allocation to the right cacheGlauber Costa1-1/+0
The page allocator is able to bind a page to a memcg when it is allocated. But for the caches, we'd like to have as many objects as possible in a page belonging to the same cache. This is done in this patch by calling memcg_kmem_get_cache in the beginning of every allocation function. This function is patched out by static branches when kernel memory controller is not being used. It assumes that the task allocating, which determines the memcg in the page allocator, belongs to the same cgroup throughout the whole process. Misaccounting can happen if the task calls memcg_kmem_get_cache() while belonging to a cgroup, and later on changes. This is considered acceptable, and should only happen upon task migration. Before the cache is created by the memcg core, there is also a possible imbalance: the task belongs to a memcg, but the cache being allocated from is the global cache, since the child cache is not yet guaranteed to be ready. This case is also fine, since in this case the GFP_KMEMCG will not be passed and the page allocator will not attempt any cgroup accounting. Signed-off-by: Glauber Costa <glommer@parallels.com> Cc: Christoph Lameter <cl@linux.com> Cc: David Rientjes <rientjes@google.com> Cc: Frederic Weisbecker <fweisbec@redhat.com> Cc: Greg Thelen <gthelen@google.com> Cc: Johannes Weiner <hannes@cmpxchg.org> Cc: JoonSoo Kim <js1304@gmail.com> Cc: KAMEZAWA Hiroyuki <kamezawa.hiroyu@jp.fujitsu.com> Cc: Mel Gorman <mel@csn.ul.ie> Cc: Michal Hocko <mhocko@suse.cz> Cc: Pekka Enberg <penberg@cs.helsinki.fi> Cc: Rik van Riel <riel@redhat.com> Cc: Suleiman Souhlal <suleiman@google.com> Cc: Tejun Heo <tj@kernel.org> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2012-12-18memcg: kmem accounting basic infrastructureGlauber Costa1-0/+1
Add the basic infrastructure for the accounting of kernel memory. To control that, the following files are created: * memory.kmem.usage_in_bytes * memory.kmem.limit_in_bytes * memory.kmem.failcnt * memory.kmem.max_usage_in_bytes They have the same meaning of their user memory counterparts. They reflect the state of the "kmem" res_counter. Per cgroup kmem memory accounting is not enabled until a limit is set for the group. Once the limit is set the accounting cannot be disabled for that group. This means that after the patch is applied, no behavioral changes exists for whoever is still using memcg to control their memory usage, until memory.kmem.limit_in_bytes is set for the first time. We always account to both user and kernel resource_counters. This effectively means that an independent kernel limit is in place when the limit is set to a lower value than the user memory. A equal or higher value means that the user limit will always hit first, meaning that kmem is effectively unlimited. People who want to track kernel memory but not limit it, can set this limit to a very high number (like RESOURCE_MAX - 1page - that no one will ever hit, or equal to the user memory) [akpm@linux-foundation.org: MEMCG_MMEM only works with slab and slub] Signed-off-by: Glauber Costa <glommer@parallels.com> Acked-by: Kamezawa Hiroyuki <kamezawa.hiroyu@jp.fujitsu.com> Acked-by: Michal Hocko <mhocko@suse.cz> Cc: Johannes Weiner <hannes@cmpxchg.org> Cc: Tejun Heo <tj@kernel.org> Cc: Christoph Lameter <cl@linux.com> Cc: David Rientjes <rientjes@google.com> Cc: Frederic Weisbecker <fweisbec@redhat.com> Cc: Greg Thelen <gthelen@google.com> Cc: JoonSoo Kim <js1304@gmail.com> Cc: Mel Gorman <mel@csn.ul.ie> Cc: Pekka Enberg <penberg@cs.helsinki.fi> Cc: Rik van Riel <riel@redhat.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2012-12-17Merge branch 'for-linus' of ↵Linus Torvalds1-2/+0
git://git.kernel.org/pub/scm/linux/kernel/git/ebiederm/user-namespace Pull user namespace changes from Eric Biederman: "While small this set of changes is very significant with respect to containers in general and user namespaces in particular. The user space interface is now complete. This set of changes adds support for unprivileged users to create user namespaces and as a user namespace root to create other namespaces. The tyranny of supporting suid root preventing unprivileged users from using cool new kernel features is broken. This set of changes completes the work on setns, adding support for the pid, user, mount namespaces. This set of changes includes a bunch of basic pid namespace cleanups/simplifications. Of particular significance is the rework of the pid namespace cleanup so it no longer requires sending out tendrils into all kinds of unexpected cleanup paths for operation. At least one case of broken error handling is fixed by this cleanup. The files under /proc/<pid>/ns/ have been converted from regular files to magic symlinks which prevents incorrect caching by the VFS, ensuring the files always refer to the namespace the process is currently using and ensuring that the ptrace_mayaccess permission checks are always applied. The files under /proc/<pid>/ns/ have been given stable inode numbers so it is now possible to see if different processes share the same namespaces. Through the David Miller's net tree are changes to relax many of the permission checks in the networking stack to allowing the user namespace root to usefully use the networking stack. Similar changes for the mount namespace and the pid namespace are coming through my tree. Two small changes to add user namespace support were commited here adn in David Miller's -net tree so that I could complete the work on the /proc/<pid>/ns/ files in this tree. Work remains to make it safe to build user namespaces and 9p, afs, ceph, cifs, coda, gfs2, ncpfs, nfs, nfsd, ocfs2, and xfs so the Kconfig guard remains in place preventing that user namespaces from being built when any of those filesystems are enabled. Future design work remains to allow root users outside of the initial user namespace to mount more than just /proc and /sys." * 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/ebiederm/user-namespace: (38 commits) proc: Usable inode numbers for the namespace file descriptors. proc: Fix the namespace inode permission checks. proc: Generalize proc inode allocation userns: Allow unprivilged mounts of proc and sysfs userns: For /proc/self/{uid,gid}_map derive the lower userns from the struct file procfs: Print task uids and gids in the userns that opened the proc file userns: Implement unshare of the user namespace userns: Implent proc namespace operations userns: Kill task_user_ns userns: Make create_new_namespaces take a user_ns parameter userns: Allow unprivileged use of setns. userns: Allow unprivileged users to create new namespaces userns: Allow setting a userns mapping to your current uid. userns: Allow chown and setgid preservation userns: Allow unprivileged users to create user namespaces. userns: Ignore suid and sgid on binaries if the uid or gid can not be mapped userns: fix return value on mntns_install() failure vfs: Allow unprivileged manipulation of the mount namespace. vfs: Only support slave subtrees across different user namespaces vfs: Add a user namespace reference from struct mnt_namespace ...
2012-12-16Merge tag 'balancenuma-v11' of ↵Linus Torvalds1-0/+44
git://git.kernel.org/pub/scm/linux/kernel/git/mel/linux-balancenuma Pull Automatic NUMA Balancing bare-bones from Mel Gorman: "There are three implementations for NUMA balancing, this tree (balancenuma), numacore which has been developed in tip/master and autonuma which is in aa.git. In almost all respects balancenuma is the dumbest of the three because its main impact is on the VM side with no attempt to be smart about scheduling. In the interest of getting the ball rolling, it would be desirable to see this much merged for 3.8 with the view to building scheduler smarts on top and adapting the VM where required for 3.9. The most recent set of comparisons available from different people are mel: https://lkml.org/lkml/2012/12/9/108 mingo: https://lkml.org/lkml/2012/12/7/331 tglx: https://lkml.org/lkml/2012/12/10/437 srikar: https://lkml.org/lkml/2012/12/10/397 The results are a mixed bag. In my own tests, balancenuma does reasonably well. It's dumb as rocks and does not regress against mainline. On the other hand, Ingo's tests shows that balancenuma is incapable of converging for this workloads driven by perf which is bad but is potentially explained by the lack of scheduler smarts. Thomas' results show balancenuma improves on mainline but falls far short of numacore or autonuma. Srikar's results indicate we all suffer on a large machine with imbalanced node sizes. My own testing showed that recent numacore results have improved dramatically, particularly in the last week but not universally. We've butted heads heavily on system CPU usage and high levels of migration even when it shows that overall performance is better. There are also cases where it regresses. Of interest is that for specjbb in some configurations it will regress for lower numbers of warehouses and show gains for higher numbers which is not reported by the tool by default and sometimes missed in treports. Recently I reported for numacore that the JVM was crashing with NullPointerExceptions but currently it's unclear what the source of this problem is. Initially I thought it was in how numacore batch handles PTEs but I'm no longer think this is the case. It's possible numacore is just able to trigger it due to higher rates of migration. These reports were quite late in the cycle so I/we would like to start with this tree as it contains much of the code we can agree on and has not changed significantly over the last 2-3 weeks." * tag 'balancenuma-v11' of git://git.kernel.org/pub/scm/linux/kernel/git/mel/linux-balancenuma: (50 commits) mm/rmap, migration: Make rmap_walk_anon() and try_to_unmap_anon() more scalable mm/rmap: Convert the struct anon_vma::mutex to an rwsem mm: migrate: Account a transhuge page properly when rate limiting mm: numa: Account for failed allocations and isolations as migration failures mm: numa: Add THP migration for the NUMA working set scanning fault case build fix mm: numa: Add THP migration for the NUMA working set scanning fault case. mm: sched: numa: Delay PTE scanning until a task is scheduled on a new node mm: sched: numa: Control enabling and disabling of NUMA balancing if !SCHED_DEBUG mm: sched: numa: Control enabling and disabling of NUMA balancing mm: sched: Adapt the scanning rate if a NUMA hinting fault does not migrate mm: numa: Use a two-stage filter to restrict pages being migrated for unlikely task<->node relationships mm: numa: migrate: Set last_nid on newly allocated page mm: numa: split_huge_page: Transfer last_nid on tail page mm: numa: Introduce last_nid to the page frame sched: numa: Slowly increase the scanning period as NUMA faults are handled mm: numa: Rate limit setting of pte_numa if node is saturated mm: numa: Rate limit the amount of memory that is migrated between nodes mm: numa: Structures for Migrate On Fault per NUMA migration rate limiting mm: numa: Migrate pages handled during a pmd_numa hinting fault mm: numa: Migrate on reference policy ...
2012-12-11mm: sched: numa: Control enabling and disabling of NUMA balancingMel Gorman1-0/+8
This patch adds Kconfig options and kernel parameters to allow the enabling and disabling of automatic NUMA balancing. The existance of such a switch was and is very important when debugging problems related to transparent hugepages and we should have the same for automatic NUMA placement. Signed-off-by: Mel Gorman <mgorman@suse.de>
2012-12-11mm: numa: pte_numa() and pmd_numa()Andrea Arcangeli1-0/+37
Implement pte_numa and pmd_numa. We must atomically set the numa bit and clear the present bit to define a pte_numa or pmd_numa. Once a pte or pmd has been set as pte_numa or pmd_numa, the next time a thread touches a virtual address in the corresponding virtual range, a NUMA hinting page fault will trigger. The NUMA hinting page fault will clear the NUMA bit and set the present bit again to resolve the page fault. The expectation is that a NUMA hinting page fault is used as part of a placement policy that decides if a page should remain on the current node or migrated to a different node. Acked-by: Rik van Riel <riel@redhat.com> Signed-off-by: Andrea Arcangeli <aarcange@redhat.com> Signed-off-by: Mel Gorman <mgorman@suse.de>
2012-11-30context_tracking: New context tracking susbsystemFrederic Weisbecker1-14/+14
Create a new subsystem that probes on kernel boundaries to keep track of the transitions between level contexts with two basic initial contexts: user or kernel. This is an abstraction of some RCU code that use such tracking to implement its userspace extended quiescent state. We need to pull this up from RCU into this new level of indirection because this tracking is also going to be used to implement an "on demand" generic virtual cputime accounting. A necessary step to shutdown the tick while still accounting the cputime. Signed-off-by: Frederic Weisbecker <fweisbec@gmail.com> Cc: Andrew Morton <akpm@linux-foundation.org> Cc: H. Peter Anvin <hpa@zytor.com> Cc: Ingo Molnar <mingo@kernel.org> Cc: Paul E. McKenney <paulmck@linux.vnet.ibm.com> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Steven Rostedt <rostedt@goodmis.org> Cc: Thomas Gleixner <tglx@linutronix.de> Cc: Li Zhong <zhong@linux.vnet.ibm.com> Cc: Gilad Ben-Yossef <gilad@benyossef.com> Reviewed-by: Steven Rostedt <rostedt@goodmis.org> [ paulmck: fix whitespace error and email address. ] Signed-off-by: Paul E. McKenney <paulmck@linux.vnet.ibm.com>
2012-11-18printk: Wake up klogd using irq_workFrederic Weisbecker1-0/+1
klogd is woken up asynchronously from the tick in order to do it safely. However if printk is called when the tick is stopped, the reader won't be woken up until the next interrupt, which might not fire for a while. As a result, the user may miss some message. To fix this, lets implement the printk tick using a lazy irq work. This subsystem takes care of the timer tick state and can fix up accordingly. Signed-off-by: Frederic Weisbecker <fweisbec@gmail.com> Acked-by: Steven Rostedt <rostedt@goodmis.org> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Thomas Gleixner <tglx@linutronix.de> Cc: Ingo Molnar <mingo@kernel.org> Cc: Andrew Morton <akpm@linux-foundation.org> Cc: Paul Gortmaker <paul.gortmaker@windriver.com>
2012-11-17irq_work: Remove CONFIG_HAVE_IRQ_WORKFrederic Weisbecker1-4/+0
irq work can run on any arch even without IPI support because of the hook on update_process_times(). So lets remove HAVE_IRQ_WORK because it doesn't reflect any backend requirement. Signed-off-by: Frederic Weisbecker <fweisbec@gmail.com> Acked-by: Steven Rostedt <rostedt@goodmis.org> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Thomas Gleixner <tglx@linutronix.de> Cc: Ingo Molnar <mingo@kernel.org> Cc: Andrew Morton <akpm@linux-foundation.org> Cc: Paul Gortmaker <paul.gortmaker@windriver.com>
2012-11-16rcu: Add callback-free CPUsPaul E. McKenney1-0/+22
RCU callback execution can add significant OS jitter and also can degrade both scheduling latency and, in asymmetric multiprocessors, energy efficiency. This commit therefore adds the ability for selected CPUs ("rcu_nocbs=" boot parameter) to have their callbacks offloaded to kthreads. If the "rcu_nocb_poll" boot parameter is also specified, these kthreads will do polling, removing the need for the offloaded CPUs to do wakeups. At least one CPU must be doing normal callback processing: currently CPU 0 cannot be selected as a no-CBs CPU. In addition, attempts to offline the last normal-CBs CPU will fail. This feature was inspired by Jim Houston's and Joe Korty's JRCU, and this commit includes fixes to problems located by Fengguang Wu's kbuild test robot. [ paulmck: Added gfp.h include file as suggested by Fengguang Wu. ] Signed-off-by: Paul E. McKenney <paul.mckenney@linaro.org> Signed-off-by: Paul E. McKenney <paulmck@linux.vnet.ibm.com>
2012-11-14userns: Support fuse interacting with multiple user namespacesEric W. Biederman1-1/+0
Use kuid_t and kgid_t in struct fuse_conn and struct fuse_mount_data. The connection between between a fuse filesystem and a fuse daemon is established when a fuse filesystem is mounted and provided with a file descriptor the fuse daemon created by opening /dev/fuse. For now restrict the communication of uids and gids between the fuse filesystem and the fuse daemon to the initial user namespace. Enforce this by verifying the file descriptor passed to the mount of fuse was opened in the initial user namespace. Ensuring the mount happens in the initial user namespace is not necessary as mounts from non-initial user namespaces are not yet allowed. In fuse_req_init_context convert the currrent fsuid and fsgid into the initial user namespace for the request that will be sent to the fuse daemon. In fuse_fill_attr convert the uid and gid passed from the fuse daemon from the initial user namespace into kuids and kgids. In iattr_to_fattr called from fuse_setattr convert kuids and kgids into the uids and gids in the initial user namespace before passing them to the fuse filesystem. In fuse_change_attributes_common called from fuse_dentry_revalidate, fuse_permission, fuse_geattr, and fuse_setattr, and fuse_iget convert the uid and gid from the fuse daemon into a kuid and a kgid to store on the fuse inode. By default fuse mounts are restricted to task whose uid, suid, and euid matches the fuse user_id and whose gid, sgid, and egid matches the fuse group id. Convert the user_id and group_id mount options into kuids and kgids at mount time, and use uid_eq and gid_eq to compare the in fuse_allow_task. Cc: Miklos Szeredi <miklos@szeredi.hu> Acked-by: Serge Hallyn <serge.hallyn@canonical.com> Signed-off-by: Eric W. Biederman <ebiederm@xmission.com>
2012-11-14userns: Support autofs4 interacing with multiple user namespacesEric W. Biederman1-1/+0
Use kuid_t and kgid_t in struct autofs_info and struct autofs_wait_queue. When creating directories and symlinks default the uid and gid of the mount requester to the global root uid and gid. autofs4_wait will update these fields when a mount is requested. When generating autofsv5 packets report the uid and gid of the mount requestor in user namespace of the process that opened the pipe, reporting unmapped uids and gids as overflowuid and overflowgid. In autofs_dev_ioctl_requester return the uid and gid of the last mount requester converted into the calling processes user namespace. When the uid or gid don't map return overflowuid and overflowgid as appropriate, allowing failure to find a mount requester to be distinguished from failure to map a mount requester. The uid and gid mount options specifying the user and group of the root autofs inode are converted into kuid and kgid as they are parsed defaulting to the current uid and current gid of the process that mounts autofs. Mounting of autofs for the present remains confined to processes in the initial user namespace. Cc: Ian Kent <raven@themaw.net> Acked-by: Serge Hallyn <serge.hallyn@canonical.com> Signed-off-by: Eric W. Biederman <ebiederm@xmission.com>
2012-10-24rcu: Wordsmith help text for RCU_USER_QS kernel parameterPaul Gortmaker1-3/+3
This commit adds a "try" missing from the end of the first paragraph of the RCU_USER_QS help text. [ paulmck: Also fix up the last paragraph a bit. ] Signed-off-by: Paul Gortmaker <paul.gortmaker@windriver.com> Signed-off-by: Paul E. McKenney <paulmck@linux.vnet.ibm.com>
2012-10-23rcu: Update RCU_FAST_NO_HZ help textPaul E. McKenney1-8/+7
The RCU_FAST_NO_HZ help text included a warning about overhead on large systems, but that issue has since been resolved. The main remaining issue with RCU_FAST_NO_HZ is increased real-time latency. This commit therefore updates the help text accordingly. Signed-off-by: Paul E. McKenney <paul.mckenney@linaro.org> Signed-off-by: Paul E. McKenney <paulmck@linux.vnet.ibm.com>
2012-10-14Merge branch 'modules-next' of ↵Linus Torvalds1-0/+68
git://git.kernel.org/pub/scm/linux/kernel/git/rusty/linux Pull module signing support from Rusty Russell: "module signing is the highlight, but it's an all-over David Howells frenzy..." Hmm "Magrathea: Glacier signing key". Somebody has been reading too much HHGTTG. * 'modules-next' of git://git.kernel.org/pub/scm/linux/kernel/git/rusty/linux: (37 commits) X.509: Fix indefinite length element skip error handling X.509: Convert some printk calls to pr_devel asymmetric keys: fix printk format warning MODSIGN: Fix 32-bit overflow in X.509 certificate validity date checking MODSIGN: Make mrproper should remove generated files. MODSIGN: Use utf8 strings in signer's name in autogenerated X.509 certs MODSIGN: Use the same digest for the autogen key sig as for the module sig MODSIGN: Sign modules during the build process MODSIGN: Provide a script for generating a key ID from an X.509 cert MODSIGN: Implement module signature checking MODSIGN: Provide module signing public keys to the kernel MODSIGN: Automatically generate module signing keys if missing MODSIGN: Provide Kconfig options MODSIGN: Provide gitignore and make clean rules for extra files MODSIGN: Add FIPS policy module: signature checking hook X.509: Add a crypto key parser for binary (DER) X.509 certificates MPILIB: Provide a function to read raw data into an MPI X.509: Add an ASN.1 decoder X.509: Add simple ASN.1 grammar compiler ...
2012-10-12Merge branch 'core-rcu-for-linus' of ↵Linus Torvalds1-0/+12
git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip Pull RCU fixes from Ingo Molnar: "This tree includes a shutdown/cpu-hotplug deadlock fix and a documentation fix." * 'core-rcu-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip: rcu: Advise most users not to enable RCU user mode rcu: Grace-period initialization excludes only RCU notifier
2012-10-10rcu: Advise most users not to enable RCU user modeFrederic Weisbecker1-0/+12
Discourage distros from enabling CONFIG_RCU_USER_QS because it brings overhead for no benefits yet. It's not a useful feature on its own until we can fully run an adaptive tickless kernel. Signed-off-by: Frederic Weisbecker <fweisbec@gmail.com> Signed-off-by: Paul E. McKenney <paulmck@linux.vnet.ibm.com>
2012-10-10MODSIGN: Implement module signature checkingDavid Howells1-0/+8
Check the signature on the module against the keys compiled into the kernel or available in a hardware key store. Currently, only RSA keys are supported - though that's easy enough to change, and the signature is expected to contain raw components (so not a PGP or PKCS#7 formatted blob). The signature blob is expected to consist of the following pieces in order: (1) The binary identifier for the key. This is expected to match the SubjectKeyIdentifier from an X.509 certificate. Only X.509 type identifiers are currently supported. (2) The signature data, consisting of a series of MPIs in which each is in the format of a 2-byte BE word sizes followed by the content data. (3) A 12 byte information block of the form: struct module_signature { enum pkey_algo algo : 8; enum pkey_hash_algo hash : 8; enum pkey_id_type id_type : 8; u8 __pad; __be32 id_length; __be32 sig_length; }; The three enums are defined in crypto/public_key.h. 'algo' contains the public-key algorithm identifier (0->DSA, 1->RSA). 'hash' contains the digest algorithm identifier (0->MD4, 1->MD5, 2->SHA1, etc.). 'id_type' contains the public-key identifier type (0->PGP, 1->X.509). '__pad' should be 0. 'id_length' should contain in the binary identifier length in BE form. 'sig_length' should contain in the signature data length in BE form. The lengths are in BE order rather than CPU order to make dealing with cross-compilation easier. Signed-off-by: David Howells <dhowells@redhat.com> Signed-off-by: Rusty Russell <rusty@rustcorp.com.au> (minor Kconfig fix)
2012-10-10MODSIGN: Provide Kconfig optionsDavid Howells1-0/+38
Provide kernel configuration options for module signing. The following configuration options are added: CONFIG_MODULE_SIG_SHA1 CONFIG_MODULE_SIG_SHA224 CONFIG_MODULE_SIG_SHA256 CONFIG_MODULE_SIG_SHA384 CONFIG_MODULE_SIG_SHA512 These select the cryptographic hash used to digest the data prior to signing. Additionally, the crypto module selected will be built into the kernel as it won't be possible to load it as a module without incurring a circular dependency when the kernel tries to check its signature. Signed-off-by: David Howells <dhowells@redhat.com> Signed-off-by: Rusty Russell <rusty@rustcorp.com.au>