linux - Linux Kernel (branches are rebased on master from time to time)

Age	Commit message (Collapse)	Author	Files	Lines
2014-10-10	Merge tag 'sound-3.18-rc1' of ↵	Linus Torvalds	195	-3263/+8018
	git://git.kernel.org/pub/scm/linux/kernel/git/tiwai/sound Pull sound updates from Takashi Iwai: "This time it's a relatively calm update batch, but the amount isn't too small in the end. Here we go over some highlights: ALSA core: - One major change is the support of nonatomic PCM operations. This allows the trigger and other callbacks to call schedule(), which would be useful for mailbox type communications. Already some drivers (Digigram ones) have been converted to use together with threaded irqs as an example. - Improvement / fixes of DSD PCM format support HD-audio: - Large volume of rewrites are found in Realtek codec driver for converting Dell and HP quirks to generic forms. - Inverted dmic code cleanup from David. - Realtek COEF access has been optimized. - Now HD-audio jack infrastructure allows multiple callbacks, which fixes / simplifies the jack-dependent power controls on STAC/IDT and VIA codecs. - Many additional device-specific fixups as usual - A few deadcode cleanups, CA0132 code cleanup, etc. ASoC: - More componentization work from Lars-Peter, this time mainly cleaning up the suspend and bias level transition callbacks. - Real system support for the Intel drivers and a bunch of fixes and enhancements for the associated CODEC drivers, this is going to need a lot quirks over time due to the lack of any firmware description of the boards. - Jack detect support for simple card from Dylan Reid. - A bunch of small fixes and enhancements for the Freescale drivers. - New drivers for Analog Devices SSM4567, Cirrus Logic CS35L32, Everest Semiconductor ES8328 and Freescale cards using the ASRC in newer i.MX processors. - A few simple-card fixes, mostly cleanups but also a fix for interaction between GPIO 0 and simple-card. Misc: - Virtuoso / Oxygen updates by Clemens - USB-audio: Yamaha MOTIF XF MIDI port name fixes - Conversion of kernel messages to standard dev_() in ctxfi driver" tag 'sound-3.18-rc1' of git://git.kernel.org/pub/scm/linux/kernel/git/tiwai/sound: (251 commits) ASoC: mc13783: Ensure we only try to dereference valid of_nodes ASoC: rockchip-i2s: fix infinite loop in rockchip_snd_txctrl ALSA: hda - Add dock port support to Thinkpad L440 (71aa:501e) ALSA: Allow pass NULL dev for snd_pci_quirk_lookup() ASoC: imx-es8328: Fix of_node_put() call with uninitialized object ASoC: soc-pcm: fix sig_bits determination in soc_pcm_apply_msb() ASoC: simple-card: Initialize headphone and mic GPIO numbers ASoC: imx-es8328: Fix missing return code in imx_es8328_probe() ALSA: hda - Add dock support for Thinkpad T440 (17aa:2212) ALSA: usb: caiaq: check for cdev->n_streams > 1 ASoC: 88pm860x-codec: Fix possibly missing string termination ASoC: core: fix use after free in snd_soc_remove_platform() ASoC: soc-dapm: fix use after free ALSA: hda - Make the inv dmic handling for Realtek use generic parser ALSA: hda - Add Inverted Internal mic for Samsung Ativ book 9 (NP900X3G) ALSA: hda - Add inverted internal mic for Asus Aspire 4830T ASoC: Intel: byt-rt5640: fix coccinelle warnings ASoC: fsl_esai doc: Add "fsl,vf610-esai" as compatible string ASoC: da732x: Remove unnecessary KERN_ERR in pr_err() ASoC: simple-card: Fix detect gpio documentation. ...
2014-10-10	Merge tag 'edac/v3.18-rc1' of ↵	Linus Torvalds	2	-33/+22
	git://git.kernel.org/pub/scm/linux/kernel/git/mchehab/linux-edac Pull edac updates from Mauro Carvalho Chehab: "Nothing really exiting here: just one bug fix at sb_edac, and some changes to allow other drivers to use some shared PCI addresses" * tag 'edac/v3.18-rc1' of git://git.kernel.org/pub/scm/linux/kernel/git/mchehab/linux-edac: sb_edac: Claim a different PCI device Move Intel SNB device ids from sb_edac to pci_ids.h sb_edac: avoid INTERNAL ERROR message in EDAC with unspecified channel
2014-10-10	Merge tag 'media/v3.18-rc1' of ↵	Linus Torvalds	474	-13050/+36213
	git://git.kernel.org/pub/scm/linux/kernel/git/mchehab/linux-media Pull media updates from Mauro Carvalho Chehab: - new IR driver: hix5hd2-ir - the virtual test driver (vivi) was replaced by vivid, with has an almost complete set of features to emulate most v4l2 devices and properly test all sorts of userspace apps - the as102 driver had several bugs fixed and was properly split into a frontend and a core driver. With that, it got promoted from staging into mainstream - one new CI driver got added for CIMaX SP2/SP2HF (sp2 driver) - one new frontend driver for Toshiba ISDB-T/ISDB-S demod (tc90522) - one new PCI driver for ISDB-T/ISDB-S (pt3 driver) - saa7134 driver got support for go7007-based devices - added a new PCI driver for Techwell 68xx chipsets (tw68) - a new platform driver was added (coda) - new tuner drivers: mxl301rf and qm1d1c0042 - a new DVB USB driver was added for DVBSky S860 & similar devices - added a new SDR driver (hackrf) - usbtv got audio support - several platform drivers are now compiled with COMPILE_TEST - a series of compiler fixup patches, making sparse/spatch happier with the media stuff and removing several warnings, especially on those platform drivers that didn't use to compile on x86 - Support for several new modern devices got added - lots of other fixes, improvements and cleanups * tag 'media/v3.18-rc1' of git://git.kernel.org/pub/scm/linux/kernel/git/mchehab/linux-media: (544 commits) [media] ir-hix5hd2: fix build on c6x arch [media] pt3: fix DTV FE I2C driver load error paths Revert "[media] media: em28xx - remove reset_resume interface" [media] exynos4-is: fix some warnings when compiling on arm64 [media] usb drivers: use %zu instead of %zd [media] pci drivers: use %zu instead of %zd [media] dvb-frontends: use %zu instead of %zd [media] s5p-mfc: Fix several printk warnings [media] s5p_mfc_opr: Fix warnings [media] ti-vpe: Fix typecast [media] s3c-camif: fix dma_addr_t printks [media] s5p_mfc_opr_v6: get rid of warnings when compiled with 64 bits [media] s5p_mfc_opr_v5: Fix lots of warnings on x86_64 [media] em28xx: Fix identation [media] drxd: remove a dead code [media] saa7146: remove return after BUG() [media] cx88: remove return after BUG() [media] cx88: fix cards table CodingStyle [media] radio-sf16fmr2: declare some structs as static [media] radio-sf16fmi: declare pnp_attached as static ...
2014-10-10	Merge branch 'for-v3.18' of ↵	Linus Torvalds	5	-38/+48
	git://git.linaro.org/people/mszyprowski/linux-dma-mapping Pull dma-mapping update from Marek Szyprowski: "Provide the dma write coherent api (available previously on ARM architecture) for all other architectures, which use dma_ops-based dma mapping implementation. This lets one to use the same code in the device drivers regardless of the selected architecture" * 'for-v3.18' of git://git.linaro.org/people/mszyprowski/linux-dma-mapping: dma-mapping: Provide write-combine allocations s390: Implement dma_{alloc,free}_attrs()
2014-10-10	Merge tag 'hwmon-for-linus-v3.18' of ↵	Linus Torvalds	23	-228/+949
	git://git.kernel.org/pub/scm/linux/kernel/git/groeck/linux-staging Pull hwmon updates from Guenter Roeck: - new driver for menf21bmc. - convert k10temp, smsc47b397, da9052, da9055 to new hwmon API. - register ntc_thermistor driver with thermal subsystem. - add support for F15h M60h to k10temp driver. - add driver for MEN14F021P00 BMC HWMON driver; this required a merge with tag mfd-hwmon-leds-watchdog-v3.18 * tag 'hwmon-for-linus-v3.18' of git://git.kernel.org/pub/scm/linux/kernel/git/groeck/linux-staging: hwmon: (ab8500) Call kernel_power_off instead of pm_power_off hwmon: (menf21bmc) Introduce MEN14F021P00 BMC HWMON driver leds: leds-menf21bmc: Introduce MEN 14F021P00 BMC LED driver watchdog: menf21bmc_wdt: Introduce MEN 14F021P00 BMC Watchdog driver mfd: menf21bmc: Introduce MEN 14F021P00 BMC MFD Core driver hwmon: (ntc_thermistor) Add ntc thermistor to thermal subsystem as a sensor. hwmon: (smsc47b397) Convert to devm_hwmon_device_register_with_groups MAINTAINERS: add entry for the PWM fan driver hwmon: (k10temp) Convert to devm_hwmon_device_register_with_groups hwmon: (k10temp) Add support for F15h M60h hwmon: (da9052) Convert to devm_hwmon_device_register_with_groups hwmon: (da9055) Convert to devm_hwmon_device_register_with_groups hwmon: (ads1015) Use of_property_read_u32 at appropriate places
2014-10-10	Merge tag 'restart-handler-for-v3.18' of ↵	Linus Torvalds	8	-42/+165
	git://git.kernel.org/pub/scm/linux/kernel/git/groeck/linux-staging Pull restart handler infrastructure from Guenter Roeck: "This series was supposed to be pulled through various trees using it, and I did not plan to send a separate pull request. As it turns out, the pinctrl tree did not merge with it, is now upstream, and uses it, meaning there are now build failures. Please pull this series directly to fix those build failures" * tag 'restart-handler-for-v3.18' of git://git.kernel.org/pub/scm/linux/kernel/git/groeck/linux-staging: arm/arm64: unexport restart handlers watchdog: sunxi: register restart handler with kernel restart handler watchdog: alim7101: register restart handler with kernel restart handler watchdog: moxart: register restart handler with kernel restart handler arm: support restart through restart handler call chain arm64: support restart through restart handler call chain power/restart: call machine_restart instead of arm_pm_restart kernel: add support for kernel restart handler call chain
2014-10-10	Merge branch 'for-3.18' of ↵	Linus Torvalds	39	-444/+879
	git://git.kernel.org/pub/scm/linux/kernel/git/tj/percpu Pull percpu updates from Tejun Heo: "A lot of activities on percpu front. Notable changes are... - percpu allocator now can take @gfp. If @gfp doesn't contain GFP_KERNEL, it tries to allocate from what's already available to the allocator and a work item tries to keep the reserve around certain level so that these atomic allocations usually succeed. This will replace the ad-hoc percpu memory pool used by blk-throttle and also be used by the planned blkcg support for writeback IOs. Please note that I noticed a bug in how @gfp is interpreted while preparing this pull request and applied the fix 6ae833c7fe0c ("percpu: fix how @gfp is interpreted by the percpu allocator") just now. - percpu_ref now uses longs for percpu and global counters instead of ints. It leads to more sparse packing of the percpu counters on 64bit machines but the overhead should be negligible and this allows using percpu_ref for refcnting pages and in-memory objects directly. - The switching between percpu and single counter modes of a percpu_ref is made independent of putting the base ref and a percpu_ref can now optionally be initialized in single or killed mode. This allows avoiding percpu shutdown latency for cases where the refcounted objects may be synchronously created and destroyed in rapid succession with only a fraction of them reaching fully operational status (SCSI probing does this when combined with blk-mq support). It's also planned to be used to implement forced single mode to detect underflow more timely for debugging. There's a separate branch percpu/for-3.18-consistent-ops which cleans up the duplicate percpu accessors. That branch causes a number of conflicts with s390 and other trees. I'll send a separate pull request w/ resolutions once other branches are merged" * 'for-3.18' of git://git.kernel.org/pub/scm/linux/kernel/git/tj/percpu: (33 commits) percpu: fix how @gfp is interpreted by the percpu allocator blk-mq, percpu_ref: start q->mq_usage_counter in atomic mode percpu_ref: make INIT_ATOMIC and switch_to_atomic() sticky percpu_ref: add PERCPU_REF_INIT_* flags percpu_ref: decouple switching to percpu mode and reinit percpu_ref: decouple switching to atomic mode and killing percpu_ref: add PCPU_REF_DEAD percpu_ref: rename things to prepare for decoupling percpu/atomic mode switch percpu_ref: replace pcpu_ prefix with percpu_ percpu_ref: minor code and comment updates percpu_ref: relocate percpu_ref_reinit() Revert "blk-mq, percpu_ref: implement a kludge for SCSI blk-mq stall during probe" Revert "percpu: free percpu allocation info for uniprocessor system" percpu-refcount: make percpu_ref based on longs instead of ints percpu-refcount: improve WARN messages percpu: fix locking regression in the failure path of pcpu_alloc() percpu-refcount: add @gfp to percpu_ref_init() proportions: add @gfp to init functions percpu_counter: add @gfp to percpu_counter_init() percpu_counter: make percpu_counters_lock irq-safe ...
2014-10-10	Merge branch 'for-3.18' of ↵	Linus Torvalds	7	-205/+71
	git://git.kernel.org/pub/scm/linux/kernel/git/tj/cgroup Pull cgroup updates from Tejun Heo: "Nothing too interesting. Just a handful of cleanup patches" * 'for-3.18' of git://git.kernel.org/pub/scm/linux/kernel/git/tj/cgroup: Revert "cgroup: remove redundant variable in cgroup_mount()" cgroup: remove redundant variable in cgroup_mount() cgroup: fix missing unlock in cgroup_release_agent() cgroup: remove CGRP_RELEASABLE flag perf/cgroup: Remove perf_put_cgroup() cgroup: remove redundant check in cgroup_ino() cpuset: simplify proc_cpuset_show() cgroup: simplify proc_cgroup_show() cgroup: use a per-cgroup work for release agent cgroup: remove bogus comments cgroup: remove redundant code in cgroup_rmdir() cgroup: remove some useless forward declarations cgroup: fix a typo in comment.
2014-10-10	Merge branch 'for-3.18' of ↵	Linus Torvalds	23	-298/+259
	git://git.kernel.org/pub/scm/linux/kernel/git/tj/libata Pull libata update from Tejun Heo: "AHCI is getting per-port irq handling and locks for better scalability. The gain is not huge but measureable with multiple high iops devices connected to the same host; however, the value of threaded IRQ handling seems negligible for AHCI and it likely will revert to non-threaded handling soon. Another noteworthy change is George Spelvin's "libata: Un-break ATA blacklist". During 3.17 devel cycle, the libata blacklist glob matching got generalized and rewritten; unfortunately, the patch forgot to swap arguments to match the new match function and ended up breaking blacklist matching completely. It got noticed only a couple days ago so it couldn't make for-3.17-fixes either. :( Other than the above two, nothing too interesting - the usual cleanup churns and device-specific changes" * 'for-3.18' of git://git.kernel.org/pub/scm/linux/kernel/git/tj/libata: (22 commits) pata_serverworks: disable 64-KB DMA transfers on Broadcom OSB4 IDE Controller libata: Un-break ATA blacklist AHCI: Do not acquire ata_host::lock from single IRQ handler AHCI: Optimize single IRQ interrupt processing AHCI: Do not read HOST_IRQ_STAT reg in multi-MSI mode AHCI: Make few function names more descriptive AHCI: Move host activation code into ahci_host_activate() AHCI: Move ahci_host_activate() function to libahci.c AHCI: Pass SCSI host template as arg to ahci_host_activate() ata: pata_imx: Use the SIMPLE_DEV_PM_OPS() macro AHCI: Cleanup checking of multiple MSIs/SLM modes libata-sff: Fix controllers with no ctl port ahci_xgene: Fix the error print invalid resource for APM X-Gene SoC AHCI SATA Host Controller driver. libata: change ata_<foo>_printk routines to return void ata: qcom: Add device tree bindings information ahci-platform: Bump max number of clocks to 5 ahci: ahci_p5wdh_workaround - constify DMI table libahci_platform: Staticize ahci_platform_<en/dis>able_phys() pata_platform: Remove useless irq_flags field pata_of_platform: Remove "electra-ide" quirk ...
2014-10-09	Merge branch 'akpm' (fixes from Andrew Morton)	Linus Torvalds	177	-2866/+4103
	Merge patch-bomb from Andrew Morton: - part of OCFS2 (review is laggy again) - procfs - slab - all of MM - zram, zbud - various other random things: arch, filesystems. * emailed patches from Andrew Morton <akpm@linux-foundation.org>: (164 commits) nosave: consolidate __nosave_{begin,end} in <asm/sections.h> include/linux/screen_info.h: remove unused ORIG_* macros kernel/sys.c: compat sysinfo syscall: fix undefined behavior kernel/sys.c: whitespace fixes acct: eliminate compile warning kernel/async.c: switch to pr_foo() include/linux/blkdev.h: use NULL instead of zero include/linux/kernel.h: deduplicate code implementing clamp* macros include/linux/kernel.h: rewrite min3, max3 and clamp using min and max alpha: use Kbuild logic to include <asm-generic/sections.h> frv: remove deprecated IRQF_DISABLED frv: remove unused cpuinfo_frv and friends to fix future build error zbud: avoid accessing last unused freelist zsmalloc: simplify init_zspage free obj linking mm/zsmalloc.c: correct comment for fullness group computation zram: use notify_free to account all free notifications zram: report maximum used memory zram: zram memory size limitation zsmalloc: change return value unit of zs_get_total_size_bytes zsmalloc: move pages_allocated to zs_pool ...
2014-10-09	nosave: consolidate __nosave_{begin,end} in <asm/sections.h>	Geert Uytterhoeven	12	-31/+12
	The different architectures used their own (and different) declarations: extern __visible const void __nosave_begin, __nosave_end; extern const void __nosave_begin, __nosave_end; extern long __nosave_begin, __nosave_end; Consolidate them using the first variant in <asm/sections.h>. Signed-off-by: Geert Uytterhoeven <geert@linux-m68k.org> Cc: Russell King <linux@arm.linux.org.uk> Cc: Ralf Baechle <ralf@linux-mips.org> Cc: Benjamin Herrenschmidt <benh@kernel.crashing.org> Cc: Martin Schwidefsky <schwidefsky@de.ibm.com> Cc: "David S. Miller" <davem@davemloft.net> Cc: Guan Xuetao <gxt@mprc.pku.edu.cn> Cc: Thomas Gleixner <tglx@linutronix.de> Cc: Ingo Molnar <mingo@redhat.com> Cc: "H. Peter Anvin" <hpa@zytor.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2014-10-09	include/linux/screen_info.h: remove unused ORIG_* macros	Geert Uytterhoeven	1	-8/+0
	The ORIG_* macros definitions to access struct screen_info members and all of their users were removed 7 years ago by commit 3ea335100014785f ("Remove magic macros for screen_info structure members"), but (only) the definitions reappeared a few days later in commit ee8e7cfe9d330d6f ("Make asm-x86/bootparam.h includable from userspace."). Remove them for good. Amen. Signed-off-by: Geert Uytterhoeven <geert@linux-m68k.org> Cc: "H. Peter Anvin" <hpa@zytor.com> Cc: Rusty Russell <rusty@rustcorp.com.au> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2014-10-09	kernel/sys.c: compat sysinfo syscall: fix undefined behavior	Scotty Bauer	1	-1/+1
	Fix undefined behavior and compiler warning by replacing right shift 32 with upper_32_bits macro Signed-off-by: Scotty Bauer <sbauer@eng.utah.edu> Cc: Clemens Ladisch <clemens@ladisch.de> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2014-10-09	kernel/sys.c: whitespace fixes	vishnu.ps	1	-128/+137
	Fix minor errors and warning messages in kernel/sys.c. These errors were reported by checkpatch while working with some modifications in sys.c file. Fixing this first will help me to improve my further patches. ERROR: trailing whitespace - 9 ERROR: do not use assignment in if condition - 4 ERROR: spaces required around that '?' (ctx:VxO) - 10 ERROR: switch and case should be at the same indent - 3 total 26 errors & 3 warnings fixed. Signed-off-by: vishnu.ps <vishnu.ps@samsung.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2014-10-09	acct: eliminate compile warning	Ying Xue	1	-5/+9
	If ACCT_VERSION is not defined to 3, below warning appears: CC kernel/acct.o kernel/acct.c: In function `do_acct_process': kernel/acct.c:475:24: warning: unused variable `ns' [-Wunused-variable] [akpm@linux-foundation.org: retain the local for code size improvements Signed-off-by: Ying Xue <ying.xue@windriver.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2014-10-09	kernel/async.c: switch to pr_foo()	Ionut Alexa	1	-4/+4
	Signed-off-by: Ionut Alexa <ionut.m.alexa@gmail.com> Cc: Tejun Heo <tj@kernel.org> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2014-10-09	include/linux/blkdev.h: use NULL instead of zero	Michele Curti	1	-1/+1
	Quite useless but it shuts up some warnings. Signed-off-by: Michele Curti <michele.curti@gmail.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2014-10-09	include/linux/kernel.h: deduplicate code implementing clamp* macros	Michal Nazarewicz	1	-17/+7
	Instead of open-coding clamp_t macro min_t and max_t the way clamp macro does and instead of open-coding clamp_val simply use clamp_t. Furthermore, normalise argument naming in the macros to be lo and hi. Signed-off-by: Michal Nazarewicz <mina86@mina86.com> Cc: Mark Rustad <mark.d.rustad@intel.com> Cc: "Kirsher, Jeffrey T" <jeffrey.t.kirsher@intel.com> Cc: Hagen Paul Pfeifer <hagen@jauu.net> Cc: Steven Rostedt <rostedt@goodmis.org> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2014-10-09	include/linux/kernel.h: rewrite min3, max3 and clamp using min and max	Michal Nazarewicz	1	-27/+5
	It appears that gcc is better at optimising a double call to min and max rather than open coded min3 and max3. This can be observed here: $ cat min-max.c #define min(x, y) ({ \ typeof(x) _min1 = (x); \ typeof(y) _min2 = (y); \ (void) (&_min1 == &_min2); \ _min1 < _min2 ? _min1 : _min2; }) #define min3(x, y, z) ({ \ typeof(x) _min1 = (x); \ typeof(y) _min2 = (y); \ typeof(z) _min3 = (z); \ (void) (&_min1 == &_min2); \ (void) (&_min1 == &_min3); \ _min1 < _min2 ? (_min1 < _min3 ? _min1 : _min3) : \ (_min2 < _min3 ? _min2 : _min3); }) int fmin3(int x, int y, int z) { return min3(x, y, z); } int fmin2(int x, int y, int z) { return min(min(x, y), z); } $ gcc -O2 -o min-max.s -S min-max.c; cat min-max.s .file "min-max.c" .text .p2align 4,,15 .globl fmin3 .type fmin3, @function fmin3: .LFB0: .cfi_startproc cmpl %esi, %edi jl .L5 cmpl %esi, %edx movl %esi, %eax cmovle %edx, %eax ret .p2align 4,,10 .p2align 3 .L5: cmpl %edi, %edx movl %edi, %eax cmovle %edx, %eax ret .cfi_endproc .LFE0: .size fmin3, .-fmin3 .p2align 4,,15 .globl fmin2 .type fmin2, @function fmin2: .LFB1: .cfi_startproc cmpl %edi, %esi movl %edx, %eax cmovle %esi, %edi cmpl %edx, %edi cmovle %edi, %eax ret .cfi_endproc .LFE1: .size fmin2, .-fmin2 .ident "GCC: (Ubuntu/Linaro 4.6.3-1ubuntu5) 4.6.3" .section .note.GNU-stack,"",@progbits fmin3 function, which uses open-coded min3 macro, is compiled into total of ten instructions including a conditional branch, whereas fmin2 function, which uses two calls to min2 macro, is compiled into six instructions with no branches. Similarly, open-coded clamp produces the same code as clamp using min and max macros, but the latter is much shorter: $ cat clamp.c #define clamp(val, min, max) ({ \ typeof(val) __val = (val); \ typeof(min) __min = (min); \ typeof(max) __max = (max); \ (void) (&__val == &__min); \ (void) (&__val == &__max); \ __val = __val < __min ? __min: __val; \ __val > __max ? __max: __val; }) #define min(x, y) ({ \ typeof(x) _min1 = (x); \ typeof(y) _min2 = (y); \ (void) (&_min1 == &_min2); \ _min1 < _min2 ? _min1 : _min2; }) #define max(x, y) ({ \ typeof(x) _max1 = (x); \ typeof(y) _max2 = (y); \ (void) (&_max1 == &_max2); \ _max1 > _max2 ? _max1 : _max2; }) int fclamp(int v, int min, int max) { return clamp(v, min, max); } int fclampmm(int v, int min, int max) { return min(max(v, min), max); } $ gcc -O2 -o clamp.s -S clamp.c; cat clamp.s .file "clamp.c" .text .p2align 4,,15 .globl fclamp .type fclamp, @function fclamp: .LFB0: .cfi_startproc cmpl %edi, %esi movl %edx, %eax cmovge %esi, %edi cmpl %edx, %edi cmovle %edi, %eax ret .cfi_endproc .LFE0: .size fclamp, .-fclamp .p2align 4,,15 .globl fclampmm .type fclampmm, @function fclampmm: .LFB1: .cfi_startproc cmpl %edi, %esi cmovge %esi, %edi cmpl %edi, %edx movl %edi, %eax cmovle %edx, %eax ret .cfi_endproc .LFE1: .size fclampmm, .-fclampmm .ident "GCC: (Ubuntu/Linaro 4.6.3-1ubuntu5) 4.6.3" .section .note.GNU-stack,"",@progbits Linux mpn-glaptop 3.13.0-29-generic #53~precise1-Ubuntu SMP Wed Jun 4 22:06:25 UTC 2014 x86_64 x86_64 x86_64 GNU/Linux gcc (Ubuntu/Linaro 4.6.3-1ubuntu5) 4.6.3 Copyright (C) 2011 Free Software Foundation, Inc. This is free software; see the source for copying conditions. There is NO warranty; not even for MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. -rwx------ 1 mpn eng 51224656 Jun 17 14:15 vmlinux.before -rwx------ 1 mpn eng 51224608 Jun 17 13:57 vmlinux.after 48 bytes reduction. The do_fault_around was a few instruction shorter and as far as I can tell saved 12 bytes on the stack, i.e.: $ grep -e rsp -e pop -e push do_fault_around.* do_fault_around.before.s:push %rbp do_fault_around.before.s:mov %rsp,%rbp do_fault_around.before.s:push %r13 do_fault_around.before.s:push %r12 do_fault_around.before.s:push %rbx do_fault_around.before.s:sub $0x38,%rsp do_fault_around.before.s:add $0x38,%rsp do_fault_around.before.s:pop %rbx do_fault_around.before.s:pop %r12 do_fault_around.before.s:pop %r13 do_fault_around.before.s:pop %rbp do_fault_around.after.s:push %rbp do_fault_around.after.s:mov %rsp,%rbp do_fault_around.after.s:push %r12 do_fault_around.after.s:push %rbx do_fault_around.after.s:sub $0x30,%rsp do_fault_around.after.s:add $0x30,%rsp do_fault_around.after.s:pop %rbx do_fault_around.after.s:pop %r12 do_fault_around.after.s:pop %rbp or here side-by-side: Before After push %rbp push %rbp mov %rsp,%rbp mov %rsp,%rbp push %r13 push %r12 push %r12 push %rbx push %rbx sub $0x38,%rsp sub $0x30,%rsp add $0x38,%rsp add $0x30,%rsp pop %rbx pop %rbx pop %r12 pop %r12 pop %r13 pop %rbp pop %rbp There are also fewer branches: $ grep ^j do_fault_around.* do_fault_around.before.s:jae ffffffff812079b7 do_fault_around.before.s:jmp ffffffff812079c5 do_fault_around.before.s:jmp ffffffff81207a14 do_fault_around.before.s:ja ffffffff812079f9 do_fault_around.before.s:jb ffffffff81207a10 do_fault_around.before.s:jmp ffffffff81207a63 do_fault_around.before.s:jne ffffffff812079df do_fault_around.after.s:jmp ffffffff812079fd do_fault_around.after.s:ja ffffffff812079e2 do_fault_around.after.s:jb ffffffff812079f9 do_fault_around.after.s:jmp ffffffff81207a4c do_fault_around.after.s:jne ffffffff812079c8 And here's with allyesconfig on a different machine: $ uname -a; gcc --version; ls -l vmlinux.* Linux erwin 3.14.7-mn #54 SMP Sun Jun 15 11:25:08 CEST 2014 x86_64 AMD Phenom(tm) II X3 710 Processor AuthenticAMD GNU/Linux gcc (GCC) 4.8.3 Copyright (C) 2013 Free Software Foundation, Inc. This is free software; see the source for copying conditions. There is NO warranty; not even for MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. -rwx------ 1 mpn eng 437027411 Jun 20 16:04 vmlinux.before -rwx------ 1 mpn eng 437026881 Jun 20 15:30 vmlinux.after 530 bytes reduction. Signed-off-by: Michal Nazarewicz <mina86@mina86.com> Signed-off-by: Hagen Paul Pfeifer <hagen@jauu.net> Acked-by: Steven Rostedt <rostedt@goodmis.org> Cc: Hagen Paul Pfeifer <hagen@jauu.net> Cc: David Rientjes <rientjes@google.com> Cc: "Rustad, Mark D" <mark.d.rustad@intel.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2014-10-09	alpha: use Kbuild logic to include <asm-generic/sections.h>	Geert Uytterhoeven	2	-7/+1
	Signed-off-by: Geert Uytterhoeven <geert@linux-m68k.org> Acked-by: Richard Henderson <rth@twiddle.net> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2014-10-09	frv: remove deprecated IRQF_DISABLED	Michael Opdenacker	4	-8/+6
	Remove the IRQF_DISABLED flag from FRV architecture code. It's a NOOP since 2.6.35 and it will be removed one day. Signed-off-by: Michael Opdenacker <michael.opdenacker@free-electrons.com> Cc: David Howells <dhowells@redhat.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2014-10-09	frv: remove unused cpuinfo_frv and friends to fix future build error	Geert Uytterhoeven	2	-18/+0
	Frv has a macro named cpu_data, interfering with variables and struct members with the same name: include/linux/pm_domain.h:75:24: error: expected identifier or '(' before '&' token struct gpd_cpu_data *cpu_data; As struct cpuinfo_frv, boot_cpu_data, cpu_data, and current_cpu_data are not used, removed them to fix this. Signed-off-by: Geert Uytterhoeven <geert+renesas@glider.be> Reported-by: kbuild test robot <fengguang.wu@intel.com> Cc: David Howells <dhowells@redhat.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2014-10-09	zbud: avoid accessing last unused freelist	Chao Yu	1	-6/+7
	For now, there are NCHUNKS of 64 freelists in zbud_pool, the last unbuddied[63] freelist linked with all zbud pages which have free chunks of 63. Calculating according to context of num_free_chunks(), our max chunk number of unbuddied zbud page is 62, so none of zbud pages will be added/removed in last freelist, but still we will try to find an unbuddied zbud page in the last unused freelist, it is unneeded. This patch redefines NCHUNKS to 63 as free chunk number in one zbud page, hence we can decrease size of zpool and avoid accessing the last unused freelist whenever failing to allocate zbud from freelist in zbud_alloc. Signed-off-by: Chao Yu <chao2.yu@samsung.com> Cc: Seth Jennings <sjennings@variantweb.net> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2014-10-09	zsmalloc: simplify init_zspage free obj linking	Dan Streetman	1	-9/+5
	Change zsmalloc init_zspage() logic to iterate through each object on each of its pages, checking the offset to verify the object is on the current page before linking it into the zspage. The current zsmalloc init_zspage free object linking code has logic that relies on there only being one page per zspage when PAGE_SIZE is a multiple of class->size. It calculates the number of objects for the current page, and iterates through all of them plus one, to account for the assumed partial object at the end of the page. While this currently works, the logic can be simplified to just link the object at each successive offset until the offset is larger than PAGE_SIZE, which does not rely on PAGE_SIZE being a multiple of class->size. Signed-off-by: Dan Streetman <ddstreet@ieee.org> Acked-by: Minchan Kim <minchan@kernel.org> Cc: Sergey Senozhatsky <sergey.senozhatsky@gmail.com> Cc: Nitin Gupta <ngupta@vflare.org> Cc: Seth Jennings <sjennings@variantweb.net> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2014-10-09	mm/zsmalloc.c: correct comment for fullness group computation	Wang Sheng-Hui	1	-1/+1
	The letter 'f' in "n <= N/f" stands for fullness_threshold_frac, not 1/fullness_threshold_frac. Signed-off-by: Wang Sheng-Hui <shhuiw@gmail.com> Acked-by: Minchan Kim <minchan@kernel.org> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2014-10-09	zram: use notify_free to account all free notifications	Sergey Senozhatsky	2	-5/+9
	`notify_free' device attribute accounts the number of slot free notifications and internally represents the number of zram_free_page() calls. Slot free notifications are sent only when device is used as a swap device, hence `notify_free' is used only for swap devices. Since f4659d8e620d08 (zram: support REQ_DISCARD) ZRAM handles yet another one free notification (also via zram_free_page() call) -- REQ_DISCARD requests, which are sent by a filesystem, whenever some data blocks are discarded. However, there is no way to know the number of notifications in the latter case. Use `notify_free' to account the number of pages freed by zram_bio_discard() and zram_slot_free_notify(). Depending on usage scenario `notify_free' represents: a) the number of pages freed because of slot free notifications, which is equal to the number of swap_slot_free_notify() calls, so there is no behaviour change b) the number of pages freed because of REQ_DISCARD notifications Signed-off-by: Sergey Senozhatsky <sergey.senozhatsky@gmail.com> Acked-by: Minchan Kim <minchan@kernel.org> Acked-by: Jerome Marchand <jmarchan@redhat.com> Cc: Nitin Gupta <ngupta@vflare.org> Cc: Chao Yu <chao2.yu@samsung.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2014-10-09	zram: report maximum used memory	Minchan Kim	4	-2/+70
	Normally, zram user could get maximum memory usage zram consumed via polling mem_used_total with sysfs in userspace. But it has a critical problem because user can miss peak memory usage during update inverval of polling. For avoiding that, user should poll it with shorter interval(ie, 0.0000000001s) with mlocking to avoid page fault delay when memory pressure is heavy. It would be troublesome. This patch adds new knob "mem_used_max" so user could see the maximum memory usage easily via reading the knob and reset it via "echo 0 > /sys/block/zram0/mem_used_max". Signed-off-by: Minchan Kim <minchan@kernel.org> Reviewed-by: Dan Streetman <ddstreet@ieee.org> Cc: Sergey Senozhatsky <sergey.senozhatsky@gmail.com> Cc: Jerome Marchand <jmarchan@redhat.com> Cc: <juno.choi@lge.com> Cc: <seungho1.park@lge.com> Cc: Luigi Semenzato <semenzato@google.com> Cc: Nitin Gupta <ngupta@vflare.org> Cc: Seth Jennings <sjennings@variantweb.net> Reviewed-by: David Horner <ds2horner@gmail.com> Cc: Joonsoo Kim <iamjoonsoo.kim@lge.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2014-10-09	zram: zram memory size limitation	Minchan Kim	4	-4/+79
	Since zram has no control feature to limit memory usage, it makes hard to manage system memrory. This patch adds new knob "mem_limit" via sysfs to set up the a limit so that zram could fail allocation once it reaches the limit. In addition, user could change the limit in runtime so that he could manage the memory more dynamically. Initial state is no limit so it doesn't break old behavior. [akpm@linux-foundation.org: fix typo, per Sergey] Signed-off-by: Minchan Kim <minchan@kernel.org> Cc: Dan Streetman <ddstreet@ieee.org> Cc: Sergey Senozhatsky <sergey.senozhatsky@gmail.com> Cc: Jerome Marchand <jmarchan@redhat.com> Cc: <juno.choi@lge.com> Cc: <seungho1.park@lge.com> Cc: Luigi Semenzato <semenzato@google.com> Cc: Nitin Gupta <ngupta@vflare.org> Cc: Seth Jennings <sjennings@variantweb.net> Cc: David Horner <ds2horner@gmail.com> Cc: Joonsoo Kim <iamjoonsoo.kim@lge.com> Cc: Minchan Kim <minchan@kernel.org> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2014-10-09	zsmalloc: change return value unit of zs_get_total_size_bytes	Minchan Kim	3	-8/+7
	zs_get_total_size_bytes returns a amount of memory zsmalloc consumed with byte unit but zsmalloc operates page unit rather than byte unit so let's change the API so benefit we could get is that reduce unnecessary overhead (ie, change page unit with byte unit) in zsmalloc. Since return type is pages, "zs_get_total_pages" is better than "zs_get_total_size_bytes". Signed-off-by: Minchan Kim <minchan@kernel.org> Reviewed-by: Dan Streetman <ddstreet@ieee.org> Cc: Sergey Senozhatsky <sergey.senozhatsky@gmail.com> Cc: Jerome Marchand <jmarchan@redhat.com> Cc: <juno.choi@lge.com> Cc: <seungho1.park@lge.com> Cc: Luigi Semenzato <semenzato@google.com> Cc: Nitin Gupta <ngupta@vflare.org> Cc: Seth Jennings <sjennings@variantweb.net> Cc: David Horner <ds2horner@gmail.com> Cc: Joonsoo Kim <iamjoonsoo.kim@lge.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2014-10-09	zsmalloc: move pages_allocated to zs_pool	Minchan Kim	1	-15/+8
	Currently, zram has no feature to limit memory so theoretically zram can deplete system memory. Users have asked for a limit several times as even without exhaustion zram makes it hard to control memory usage of the platform. This patchset adds the feature. Patch 1 makes zs_get_total_size_bytes faster because it would be used frequently in later patches for the new feature. Patch 2 changes zs_get_total_size_bytes's return unit from bytes to page so that zsmalloc doesn't need unnecessary operation(ie, << PAGE_SHIFT). Patch 3 adds new feature. I added the feature into zram layer, not zsmalloc because limiation is zram's requirement, not zsmalloc so any other user using zsmalloc(ie, zpool) shouldn't affected by unnecessary branch of zsmalloc. In future, if every users of zsmalloc want the feature, then, we could move the feature from client side to zsmalloc easily but vice versa would be painful. Patch 4 adds news facility to report maximum memory usage of zram so that this avoids user polling frequently via /sys/block/zram0/ mem_used_total and ensures transient max are not missed. This patch (of 4): pages_allocated has counted in size_class structure and when user of zsmalloc want to see total_size_bytes, it should gather all of count from each size_class to report the sum. It's not bad if user don't see the value often but if user start to see the value frequently, it would be not a good deal for performance pov. This patch moves the count from size_class to zs_pool so it could reduce memory footprint (from [255 * 8byte] to [sizeof(atomic_long_t)]). Signed-off-by: Minchan Kim <minchan@kernel.org> Reviewed-by: Dan Streetman <ddstreet@ieee.org> Cc: Sergey Senozhatsky <sergey.senozhatsky@gmail.com> Cc: Jerome Marchand <jmarchan@redhat.com> Cc: <juno.choi@lge.com> Cc: <seungho1.park@lge.com> Cc: Luigi Semenzato <semenzato@google.com> Cc: Nitin Gupta <ngupta@vflare.org> Cc: Seth Jennings <sjennings@variantweb.net> Reviewed-by: David Horner <ds2horner@gmail.com> Cc: Joonsoo Kim <iamjoonsoo.kim@lge.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2014-10-09	m68k: call find_vma with the mmap_sem held in sys_cacheflush()	Davidlohr Bueso	1	-8/+13
	Performing vma lookups without taking the mm->mmap_sem is asking for trouble. While doing the search, the vma in question can be modified or even removed before returning to the caller. Take the lock (shared) in order to avoid races while iterating through the vmacache and/or rbtree. In addition, this guarantees that the address space will remain intact during the CPU flushing. Signed-off-by: Davidlohr Bueso <davidlohr@hp.com> Cc: Geert Uytterhoeven <geert@linux-m68k.org> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2014-10-09	vmstat: on-demand vmstat workers V8	Christoph Lameter	1	-21/+120
	vmstat workers are used for folding counter differentials into the zone, per node and global counters at certain time intervals. They currently run at defined intervals on all processors which will cause some holdoff for processors that need minimal intrusion by the OS. The current vmstat_update mechanism depends on a deferrable timer firing every other second by default which registers a work queue item that runs on the local CPU, with the result that we have 1 interrupt and one additional schedulable task on each CPU every 2 seconds If a workload indeed causes VM activity or multiple tasks are running on a CPU, then there are probably bigger issues to deal with. However, some workloads dedicate a CPU for a single CPU bound task. This is done in high performance computing, in high frequency financial applications, in networking (Intel DPDK, EZchip NPS) and with the advent of systems with more and more CPUs over time, this may become more and more common to do since when one has enough CPUs one cares less about efficiently sharing a CPU with other tasks and more about efficiently monopolizing a CPU per task. The difference of having this timer firing and workqueue kernel thread scheduled per second can be enormous. An artificial test measuring the worst case time to do a simple "i++" in an endless loop on a bare metal system and under Linux on an isolated CPU with dynticks and with and without this patch, have Linux match the bare metal performance (~700 cycles) with this patch and loose by couple of orders of magnitude (~200k cycles) without it[*]. The loss occurs for something that just calculates statistics. For networking applications, for example, this could be the difference between dropping packets or sustaining line rate. Statistics are important and useful, but it would be great if there would be a way to not cause statistics gathering produce a huge performance difference. This patche does just that. This patch creates a vmstat shepherd worker that monitors the per cpu differentials on all processors. If there are differentials on a processor then a vmstat worker local to the processors with the differentials is created. That worker will then start folding the diffs in regular intervals. Should the worker find that there is no work to be done then it will make the shepherd worker monitor the differentials again. With this patch it is possible then to have periods longer than 2 seconds without any OS event on a "cpu" (hardware thread). The patch shows a very minor increased in system performance. hackbench -s 512 -l 2000 -g 15 -f 25 -P Results before the patch: Running in process mode with 15 groups using 50 file descriptors each (== 750 tasks) Each sender will pass 2000 messages of 512 bytes Time: 4.992 Running in process mode with 15 groups using 50 file descriptors each (== 750 tasks) Each sender will pass 2000 messages of 512 bytes Time: 4.971 Running in process mode with 15 groups using 50 file descriptors each (== 750 tasks) Each sender will pass 2000 messages of 512 bytes Time: 5.063 Hackbench after the patch: Running in process mode with 15 groups using 50 file descriptors each (== 750 tasks) Each sender will pass 2000 messages of 512 bytes Time: 4.973 Running in process mode with 15 groups using 50 file descriptors each (== 750 tasks) Each sender will pass 2000 messages of 512 bytes Time: 4.990 Running in process mode with 15 groups using 50 file descriptors each (== 750 tasks) Each sender will pass 2000 messages of 512 bytes Time: 4.993 [fengguang.wu@intel.com: cpu_stat_off can be static] Signed-off-by: Christoph Lameter <cl@linux.com> Reviewed-by: Gilad Ben-Yossef <gilad@benyossef.com> Cc: Frederic Weisbecker <fweisbec@gmail.com> Cc: Thomas Gleixner <tglx@linutronix.de> Cc: Tejun Heo <tj@kernel.org> Cc: John Stultz <john.stultz@linaro.org> Cc: Mike Frysinger <vapier@gentoo.org> Cc: Minchan Kim <minchan.kim@gmail.com> Cc: Hakan Akkan <hakanakkan@gmail.com> Cc: Max Krasnyansky <maxk@qti.qualcomm.com> Cc: "Paul E. McKenney" <paulmck@linux.vnet.ibm.com> Cc: Hugh Dickins <hughd@google.com> Cc: Viresh Kumar <viresh.kumar@linaro.org> Cc: H. Peter Anvin <hpa@zytor.com> Cc: Ingo Molnar <mingo@kernel.org> Cc: Peter Zijlstra <peterz@infradead.org> Signed-off-by: Fengguang Wu <fengguang.wu@intel.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2014-10-09	CMA: document cma=0	Jean Delvare	2	-1/+5
	It isn't obvious that CMA can be disabled on the kernel's command line, so document it. Signed-off-by: Jean Delvare <jdelvare@suse.de> Cc: Joonsoo Kim <iamjoonsoo.kim@lge.com> Cc: Greg Kroah-Hartman <gregkh@linuxfoundation.org> Cc: Akinobu Mita <akinobu.mita@gmail.com> Cc: Chuck Ebbert <cebbert.lkml@gmail.com> Cc: Marek Szyprowski <m.szyprowski@samsung.com> Cc: Konrad Rzeszutek Wilk <konrad.wilk@oracle.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2014-10-09	fs/buffer.c: increase the buffer-head per-CPU LRU size	Sebastien Buisson	1	-1/+1
	Increase the buffer-head per-CPU LRU size to allow efficient filesystem operations that access many blocks for each transaction. For example, creating a file in a large ext4 directory with quota enabled will access multiple buffer heads and will overflow the LRU at the default 8-block LRU size: * parent directory inode table block (ctime, nlinks for subdirs) * new inode bitmap * inode table block * 2 quota blocks * directory leaf block (not reused, but pollutes one cache entry) * 2 levels htree blocks (only one is reused, other pollutes cache) * 2 levels indirect/index blocks (only one is reused) The buffer-head per-CPU LRU size is raised to 16, as it shows in metadata performance benchmarks up to 10% gain for create, 4% for lookup and 7% for destroy. Signed-off-by: Liang Zhen <liang.zhen@intel.com> Signed-off-by: Andreas Dilger <andreas.dilger@intel.com> Signed-off-by: Sebastien Buisson <sebastien.buisson@bull.net> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2014-10-09	mm: mempolicy: skip inaccessible VMAs when setting MPOL_MF_LAZY	Mel Gorman	1	-1/+3
	PROT_NUMA VMAs are skipped to avoid problems distinguishing between present, prot_none and special entries. MPOL_MF_LAZY is not visible from userspace since commit a720094ded8c ("mm: mempolicy: Hide MPOL_NOOP and MPOL_MF_LAZY from userspace for now") but it should still skip VMAs the same way task_numa_work does. Signed-off-by: Mel Gorman <mgorman@suse.de> Acked-by: Rik van Riel <riel@redhat.com> Acked-by: Hugh Dickins <hughd@google.com> Acked-by: Peter Zijlstra <peterz@infradead.org> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2014-10-09	selftests/vm/transhuge-stress: stress test for memory compaction	Konstantin Khlebnikov	2	-0/+145
	This tool induces memory fragmentation via sequential allocation of transparent huge pages and splitting off everything except their last sub-pages. It easily generates pressure to the memory compaction code. $ perf stat -e 'compaction:' -e 'migrate:' ./transhuge-stress transhuge-stress: allocate 7858 transhuge pages, using 15716 MiB virtual memory and 61 MiB of ram transhuge-stress: 1.653 s/loop, 0.210 ms/page, 9504.828 MiB/s 7858 succeed, 0 failed, 2439 different pages transhuge-stress: 1.537 s/loop, 0.196 ms/page, 10226.227 MiB/s 7858 succeed, 0 failed, 2364 different pages transhuge-stress: 1.658 s/loop, 0.211 ms/page, 9479.215 MiB/s 7858 succeed, 0 failed, 2179 different pages transhuge-stress: 1.617 s/loop, 0.206 ms/page, 9716.992 MiB/s 7858 succeed, 0 failed, 2421 different pages ^C./transhuge-stress: Interrupt Performance counter stats for './transhuge-stress': 1.744.051 compaction:mm_compaction_isolate_migratepages 1.014 compaction:mm_compaction_isolate_freepages 1.744.051 compaction:mm_compaction_migratepages 1.647 compaction:mm_compaction_begin 1.647 compaction:mm_compaction_end 1.744.051 migrate:mm_migrate_pages 0 migrate:mm_numa_migrate_ratelimit 7,964696835 seconds time elapsed Signed-off-by: Konstantin Khlebnikov <koct9i@gmail.com> Cc: Rafael Aquini <aquini@redhat.com> Cc: Andrey Ryabinin <ryabinin.a.a@gmail.com> Cc: Shuah Khan <shuahkh@osg.samsung.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2014-10-09	mm/balloon_compaction: add vmstat counters and kpageflags bit	Konstantin Khlebnikov	11	-3/+37
	Always mark pages with PageBalloon even if balloon compaction is disabled and expose this mark in /proc/kpageflags as KPF_BALLOON. Also this patch adds three counters into /proc/vmstat: "balloon_inflate", "balloon_deflate" and "balloon_migrate". They accumulate balloon activity. Current size of balloon is (balloon_inflate - balloon_deflate) pages. All generic balloon code now gathered under option CONFIG_MEMORY_BALLOON. It should be selected by ballooning driver which wants use this feature. Currently virtio-balloon is the only user. Signed-off-by: Konstantin Khlebnikov <k.khlebnikov@samsung.com> Cc: Rafael Aquini <aquini@redhat.com> Cc: Andrey Ryabinin <ryabinin.a.a@gmail.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2014-10-09	mm/balloon_compaction: remove balloon mapping and flag AS_BALLOON_MAP	Konstantin Khlebnikov	4	-206/+39
	Now ballooned pages are detected using PageBalloon(). Fake mapping is no longer required. This patch links ballooned pages to balloon device using field page->private instead of page->mapping. Also this patch embeds balloon_dev_info directly into struct virtio_balloon. Signed-off-by: Konstantin Khlebnikov <k.khlebnikov@samsung.com> Cc: Rafael Aquini <aquini@redhat.com> Cc: Andrey Ryabinin <ryabinin.a.a@gmail.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2014-10-09	mm/balloon_compaction: redesign ballooned pages management	Konstantin Khlebnikov	7	-118/+68
	Sasha Levin reported KASAN splash inside isolate_migratepages_range(). Problem is in the function __is_movable_balloon_page() which tests AS_BALLOON_MAP in page->mapping->flags. This function has no protection against anonymous pages. As result it tried to check address space flags inside struct anon_vma. Further investigation shows more problems in current implementation: * Special branch in __unmap_and_move() never works: balloon_page_movable() checks page flags and page_count. In __unmap_and_move() page is locked, reference counter is elevated, thus balloon_page_movable() always fails. As a result execution goes to the normal migration path. virtballoon_migratepage() returns MIGRATEPAGE_BALLOON_SUCCESS instead of MIGRATEPAGE_SUCCESS, move_to_new_page() thinks this is an error code and assigns newpage->mapping to NULL. Newly migrated page lose connectivity with balloon an all ability for further migration. * lru_lock erroneously required in isolate_migratepages_range() for isolation ballooned page. This function releases lru_lock periodically, this makes migration mostly impossible for some pages. * balloon_page_dequeue have a tight race with balloon_page_isolate: balloon_page_isolate could be executed in parallel with dequeue between picking page from list and locking page_lock. Race is rare because they use trylock_page() for locking. This patch fixes all of them. Instead of fake mapping with special flag this patch uses special state of page->_mapcount: PAGE_BALLOON_MAPCOUNT_VALUE = -256. Buddy allocator uses PAGE_BUDDY_MAPCOUNT_VALUE = -128 for similar purpose. Storing mark directly in struct page makes everything safer and easier. PagePrivate is used to mark pages present in page list (i.e. not isolated, like PageLRU for normal pages). It replaces special rules for reference counter and makes balloon migration similar to migration of normal pages. This flag is protected by page_lock together with link to the balloon device. Signed-off-by: Konstantin Khlebnikov <k.khlebnikov@samsung.com> Reported-by: Sasha Levin <sasha.levin@oracle.com> Link: http://lkml.kernel.org/p/53E6CEAA.9020105@oracle.com Cc: Rafael Aquini <aquini@redhat.com> Cc: Andrey Ryabinin <ryabinin.a.a@gmail.com> Cc: <stable@vger.kernel.org> [3.8+] Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2014-10-09	arm64: mm: enable RCU fast_gup	Steve Capper	3	-1/+39
	Activate the RCU fast_gup for ARM64. We also need to force THP splits to broadcast an IPI s.t. we block in the fast_gup page walker. As THP splits are comparatively rare, this should not lead to a noticeable performance degradation. Some pre-requisite functions pud_write and pud_page are also added. [akpm@linux-foundation.org: coding-style fixes] Signed-off-by: Steve Capper <steve.capper@linaro.org> Tested-by: Dann Frazier <dann.frazier@canonical.com> Acked-by: Catalin Marinas <catalin.marinas@arm.com> Cc: Hugh Dickins <hughd@google.com> Cc: Russell King <rmk@arm.linux.org.uk> Cc: Mark Rutland <mark.rutland@arm.com> Cc: Mel Gorman <mel@csn.ul.ie> Cc: Will Deacon <will.deacon@arm.com> Cc: Christoffer Dall <christoffer.dall@linaro.org> Cc: Andrea Arcangeli <aarcange@redhat.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2014-10-09	arm64: mm: enable HAVE_RCU_TABLE_FREE logic	Steve Capper	2	-3/+18
	In order to implement fast_get_user_pages we need to ensure that the page table walker is protected from page table pages being freed from under it. This patch enables HAVE_RCU_TABLE_FREE, any page table pages belonging to address spaces with multiple users will be call_rcu_sched freed. Meaning that disabling interrupts will block the free and protect the fast gup page walker. Signed-off-by: Steve Capper <steve.capper@linaro.org> Tested-by: Dann Frazier <dann.frazier@canonical.com> Acked-by: Catalin Marinas <catalin.marinas@arm.com> Cc: Hugh Dickins <hughd@google.com> Cc: Russell King <rmk@arm.linux.org.uk> Cc: Mark Rutland <mark.rutland@arm.com> Cc: Mel Gorman <mel@csn.ul.ie> Cc: Will Deacon <will.deacon@arm.com> Cc: Christoffer Dall <christoffer.dall@linaro.org> Cc: Andrea Arcangeli <aarcange@redhat.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2014-10-09	arm: mm: enable RCU fast_gup	Steve Capper	3	-0/+27
	Activate the RCU fast_gup for ARM. We also need to force THP splits to broadcast an IPI s.t. we block in the fast_gup page walker. As THP splits are comparatively rare, this should not lead to a noticeable performance degradation. Some pre-requisite functions pud_write and pud_page are also added. Signed-off-by: Steve Capper <steve.capper@linaro.org> Reviewed-by: Catalin Marinas <catalin.marinas@arm.com> Cc: Dann Frazier <dann.frazier@canonical.com> Cc: Hugh Dickins <hughd@google.com> Cc: Russell King <rmk@arm.linux.org.uk> Cc: Mark Rutland <mark.rutland@arm.com> Cc: Mel Gorman <mel@csn.ul.ie> Cc: Will Deacon <will.deacon@arm.com> Cc: Christoffer Dall <christoffer.dall@linaro.org> Cc: Andrea Arcangeli <aarcange@redhat.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2014-10-09	arm: mm: enable HAVE_RCU_TABLE_FREE logic	Steve Capper	2	-2/+37
	In order to implement fast_get_user_pages we need to ensure that the page table walker is protected from page table pages being freed from under it. This patch enables HAVE_RCU_TABLE_FREE, any page table pages belonging to address spaces with multiple users will be call_rcu_sched freed. Meaning that disabling interrupts will block the free and protect the fast gup page walker. Signed-off-by: Steve Capper <steve.capper@linaro.org> Reviewed-by: Catalin Marinas <catalin.marinas@arm.com> Cc: Dann Frazier <dann.frazier@canonical.com> Cc: Hugh Dickins <hughd@google.com> Cc: Russell King <rmk@arm.linux.org.uk> Cc: Mark Rutland <mark.rutland@arm.com> Cc: Mel Gorman <mel@csn.ul.ie> Cc: Will Deacon <will.deacon@arm.com> Cc: Christoffer Dall <christoffer.dall@linaro.org> Cc: Andrea Arcangeli <aarcange@redhat.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2014-10-09	arm: mm: introduce special ptes for LPAE	Steve Capper	3	-4/+11
	We need a mechanism to tag ptes as being special, this indicates that no attempt should be made to access the underlying struct page * associated with the pte. This is used by the fast_gup when operating on ptes as it has no means to access VMAs (that also contain this information) locklessly. The L_PTE_SPECIAL bit is already allocated for LPAE, this patch modifies pte_special and pte_mkspecial to make use of it, and defines __HAVE_ARCH_PTE_SPECIAL. This patch also excludes special ptes from the icache/dcache sync logic. Signed-off-by: Steve Capper <steve.capper@linaro.org> Reviewed-by: Catalin Marinas <catalin.marinas@arm.com> Cc: Dann Frazier <dann.frazier@canonical.com> Cc: Hugh Dickins <hughd@google.com> Cc: Russell King <rmk@arm.linux.org.uk> Cc: Mark Rutland <mark.rutland@arm.com> Cc: Mel Gorman <mel@csn.ul.ie> Cc: Will Deacon <will.deacon@arm.com> Cc: Christoffer Dall <christoffer.dall@linaro.org> Cc: Andrea Arcangeli <aarcange@redhat.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2014-10-09	mm: introduce a general RCU get_user_pages_fast()	Steve Capper	2	-0/+357
	This series implements general forms of get_user_pages_fast and __get_user_pages_fast in core code and activates them for arm and arm64. These are required for Transparent HugePages to function correctly, as a futex on a THP tail will otherwise result in an infinite loop (due to the core implementation of __get_user_pages_fast always returning 0). Unfortunately, a futex on THP tail can be quite common for certain workloads; thus THP is unreliable without a __get_user_pages_fast implementation. This series may also be beneficial for direct-IO heavy workloads and certain KVM workloads. This patch (of 6): get_user_pages_fast() attempts to pin user pages by walking the page tables directly and avoids taking locks. Thus the walker needs to be protected from page table pages being freed from under it, and needs to block any THP splits. One way to achieve this is to have the walker disable interrupts, and rely on IPIs from the TLB flushing code blocking before the page table pages are freed. On some platforms we have hardware broadcast of TLB invalidations, thus the TLB flushing code doesn't necessarily need to broadcast IPIs; and spuriously broadcasting IPIs can hurt system performance if done too often. This problem has been solved on PowerPC and Sparc by batching up page table pages belonging to more than one mm_user, then scheduling an rcu_sched callback to free the pages. This RCU page table free logic has been promoted to core code and is activated when one enables HAVE_RCU_TABLE_FREE. Unfortunately, these architectures implement their own get_user_pages_fast routines. The RCU page table free logic coupled with an IPI broadcast on THP split (which is a rare event), allows one to protect a page table walker by merely disabling the interrupts during the walk. This patch provides a general RCU implementation of get_user_pages_fast that can be used by architectures that perform hardware broadcast of TLB invalidations. It is based heavily on the PowerPC implementation by Nick Piggin. [akpm@linux-foundation.org: various comment fixes] Signed-off-by: Steve Capper <steve.capper@linaro.org> Tested-by: Dann Frazier <dann.frazier@canonical.com> Reviewed-by: Catalin Marinas <catalin.marinas@arm.com> Acked-by: Hugh Dickins <hughd@google.com> Cc: Russell King <rmk@arm.linux.org.uk> Cc: Mark Rutland <mark.rutland@arm.com> Cc: Mel Gorman <mel@csn.ul.ie> Cc: Will Deacon <will.deacon@arm.com> Cc: Christoffer Dall <christoffer.dall@linaro.org> Cc: Andrea Arcangeli <aarcange@redhat.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2014-10-09	mm/dmapool.c: fixed a brace coding style issue	Paul McQuade	1	-9/+6
	Remove 3 brace coding style for any arm of this statement Signed-off-by: Paul McQuade <paulmcquad@gmail.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2014-10-09	mm: ksm use pr_err instead of printk	Paul McQuade	1	-2/+2
	WARNING: Prefer: pr_err(... to printk(KERN_ERR ... [akpm@linux-foundation.org: remove KERN_ERR] Signed-off-by: Paul McQuade <paulmcquad@gmail.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2014-10-09	drivers/firmware/memmap.c: don't create memmap sysfs of same firmware_map_entry	Yasuaki Ishimatsu	1	-0/+3
	By the following commits, we prevented from allocating firmware_map_entry of same memory range: f0093ede: drivers/firmware/memmap.c: don't allocate firmware_map_entry of same memory range 49c8b24d: drivers/firmware/memmap.c: pass the correct argument to firmware_map_find_entry_bootmem() But it's not enough. When PNP0C80 device is added by acpi_scan_init(), memmap sysfses of same firmware_map_entry are created twice as follows: # cat /sys/firmware/memmap/*/start 0x40000000000 0x60000000000 0x4a837000 0x4a83a000 0x4a8b5000 ... 0x40000000000 0x60000000000 ... The flows of the issues are as follows: 1. e820_reserve_resources() allocates firmware_map_entrys of all memory ranges defined in e820. And, these firmware_map_entrys are linked with map_entries list. map_entries -> entry 1 -> ... -> entry N 2. When PNP0C80 device is limited by mem= boot option, acpi_scan_init() added the memory device. In this case, firmware_map_add_hotplug() allocates firmware_map_entry and creates memmap sysfs. map_entries -> entry 1 -> ... -> entry N -> entry N+1 \| memmap 1 3. firmware_memmap_init() creates memmap sysfses of firmware_map_entrys linked with map_entries. map_entries -> entry 1 -> ... -> entry N -> entry N+1 \| \| \| memmap 2 memmap N+1 memmap 1 memmap N+2 So while hot removing the PNP0C80 device, kernel panic occurs as follows: BUG: unable to handle kernel paging request at 00000001003e000b IP: sysfs_open_file+0x46/0x2b0 PGD 203a89fe067 PUD 0 Oops: 0000 [#1] SMP ... Call Trace: do_dentry_open+0x1ef/0x2a0 finish_open+0x31/0x40 do_last+0x57c/0x1220 path_openat+0xc2/0x4c0 do_filp_open+0x4b/0xb0 do_sys_open+0xf3/0x1f0 SyS_open+0x1e/0x20 system_call_fastpath+0x16/0x1b The patch adds a check of confirming whether memmap sysfs of firmware_map_entry has been created, and does not create memmap sysfs of same firmware_map_entry. Signed-off-by: Yasuaki Ishimatsu <isimatu.yasuaki@jp.fujitsu.com> Cc: Santosh Shilimkar <santosh.shilimkar@ti.com> Cc: Toshi Kani <toshi.kani@hp.com> Cc: Greg Kroah-Hartman <gregkh@linuxfoundation.org> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2014-10-09	mm/bootmem.c: use include/linux/ headers	Paul McQuade	1	-2/+2
	Replace asm. headers with linux/headers: <linux/bug.h> <linux/io.h> Signed-off-by: Paul McQuade <paulmcquad@gmail.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2014-10-09	mm/filemap.c: remove trailing whitespace	Paul McQuade	1	-2/+2
	Signed-off-by: Paul McQuade <paulmcquad@gmail.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>