summaryrefslogtreecommitdiffstats
AgeCommit message (Collapse)AuthorFilesLines
2013-11-21Merge branch 'akpm' (fixes from Andrew)Linus Torvalds16-124/+227
Merge patches from Andrew Morton: "13 fixes" * emailed patches from Andrew Morton <akpm@linux-foundation.org>: mm: place page->pmd_huge_pte to right union MAINTAINERS: add keyboard driver to Hyper-V file list x86, mm: do not leak page->ptl for pmd page tables ipc,shm: correct error return value in shmctl (SHM_UNLOCK) mm, mempolicy: silence gcc warning block/partitions/efi.c: fix bound check ARM: drivers/rtc/rtc-at91rm9200.c: disable interrupts at shutdown mm: hugetlbfs: fix hugetlbfs optimization kernel: remove CONFIG_USE_GENERIC_SMP_HELPERS cleanly ipc,shm: fix shm_file deletion races mm: thp: give transparent hugepage code a separate copy_page checkpatch: fix "Use of uninitialized value" warnings configfs: fix race between dentry put and lookup
2013-11-21Merge branch 'for-linus2' of ↵Linus Torvalds125-2058/+7712
git://git.kernel.org/pub/scm/linux/kernel/git/jmorris/linux-security Pull security subsystem updates from James Morris: "In this patchset, we finally get an SELinux update, with Paul Moore taking over as maintainer of that code. Also a significant update for the Keys subsystem, as well as maintenance updates to Smack, IMA, TPM, and Apparmor" and since I wanted to know more about the updates to key handling, here's the explanation from David Howells on that: "Okay. There are a number of separate bits. I'll go over the big bits and the odd important other bit, most of the smaller bits are just fixes and cleanups. If you want the small bits accounting for, I can do that too. (1) Keyring capacity expansion. KEYS: Consolidate the concept of an 'index key' for key access KEYS: Introduce a search context structure KEYS: Search for auth-key by name rather than target key ID Add a generic associative array implementation. KEYS: Expand the capacity of a keyring Several of the patches are providing an expansion of the capacity of a keyring. Currently, the maximum size of a keyring payload is one page. Subtract a small header and then divide up into pointers, that only gives you ~500 pointers on an x86_64 box. However, since the NFS idmapper uses a keyring to store ID mapping data, that has proven to be insufficient to the cause. Whatever data structure I use to handle the keyring payload, it can only store pointers to keys, not the keys themselves because several keyrings may point to a single key. This precludes inserting, say, and rb_node struct into the key struct for this purpose. I could make an rbtree of records such that each record has an rb_node and a key pointer, but that would use four words of space per key stored in the keyring. It would, however, be able to use much existing code. I selected instead a non-rebalancing radix-tree type approach as that could have a better space-used/key-pointer ratio. I could have used the radix tree implementation that we already have and insert keys into it by their serial numbers, but that means any sort of search must iterate over the whole radix tree. Further, its nodes are a bit on the capacious side for what I want - especially given that key serial numbers are randomly allocated, thus leaving a lot of empty space in the tree. So what I have is an associative array that internally is a radix-tree with 16 pointers per node where the index key is constructed from the key type pointer and the key description. This means that an exact lookup by type+description is very fast as this tells us how to navigate directly to the target key. I made the data structure general in lib/assoc_array.c as far as it is concerned, its index key is just a sequence of bits that leads to a pointer. It's possible that someone else will be able to make use of it also. FS-Cache might, for example. (2) Mark keys as 'trusted' and keyrings as 'trusted only'. KEYS: verify a certificate is signed by a 'trusted' key KEYS: Make the system 'trusted' keyring viewable by userspace KEYS: Add a 'trusted' flag and a 'trusted only' flag KEYS: Separate the kernel signature checking keyring from module signing These patches allow keys carrying asymmetric public keys to be marked as being 'trusted' and allow keyrings to be marked as only permitting the addition or linkage of trusted keys. Keys loaded from hardware during kernel boot or compiled into the kernel during build are marked as being trusted automatically. New keys can be loaded at runtime with add_key(). They are checked against the system keyring contents and if their signatures can be validated with keys that are already marked trusted, then they are marked trusted also and can thus be added into the master keyring. Patches from Mimi Zohar make this usable with the IMA keyrings also. (3) Remove the date checks on the key used to validate a module signature. X.509: Remove certificate date checks It's not reasonable to reject a signature just because the key that it was generated with is no longer valid datewise - especially if the kernel hasn't yet managed to set the system clock when the first module is loaded - so just remove those checks. (4) Make it simpler to deal with additional X.509 being loaded into the kernel. KEYS: Load *.x509 files into kernel keyring KEYS: Have make canonicalise the paths of the X.509 certs better to deduplicate The builder of the kernel now just places files with the extension ".x509" into the kernel source or build trees and they're concatenated by the kernel build and stuffed into the appropriate section. (5) Add support for userspace kerberos to use keyrings. KEYS: Add per-user_namespace registers for persistent per-UID kerberos caches KEYS: Implement a big key type that can save to tmpfs Fedora went to, by default, storing kerberos tickets and tokens in tmpfs. We looked at storing it in keyrings instead as that confers certain advantages such as tickets being automatically deleted after a certain amount of time and the ability for the kernel to get at these tokens more easily. To make this work, two things were needed: (a) A way for the tickets to persist beyond the lifetime of all a user's sessions so that cron-driven processes can still use them. The problem is that a user's session keyrings are deleted when the session that spawned them logs out and the user's user keyring is deleted when the UID is deleted (typically when the last log out happens), so neither of these places is suitable. I've added a system keyring into which a 'persistent' keyring is created for each UID on request. Each time a user requests their persistent keyring, the expiry time on it is set anew. If the user doesn't ask for it for, say, three days, the keyring is automatically expired and garbage collected using the existing gc. All the kerberos tokens it held are then also gc'd. (b) A key type that can hold really big tickets (up to 1MB in size). The problem is that Active Directory can return huge tickets with lots of auxiliary data attached. We don't, however, want to eat up huge tracts of unswappable kernel space for this, so if the ticket is greater than a certain size, we create a swappable shmem file and dump the contents in there and just live with the fact we then have an inode and a dentry overhead. If the ticket is smaller than that, we slap it in a kmalloc()'d buffer" * 'for-linus2' of git://git.kernel.org/pub/scm/linux/kernel/git/jmorris/linux-security: (121 commits) KEYS: Fix keyring content gc scanner KEYS: Fix error handling in big_key instantiation KEYS: Fix UID check in keyctl_get_persistent() KEYS: The RSA public key algorithm needs to select MPILIB ima: define '_ima' as a builtin 'trusted' keyring ima: extend the measurement list to include the file signature kernel/system_certificate.S: use real contents instead of macro GLOBAL() KEYS: fix error return code in big_key_instantiate() KEYS: Fix keyring quota misaccounting on key replacement and unlink KEYS: Fix a race between negating a key and reading the error set KEYS: Make BIG_KEYS boolean apparmor: remove the "task" arg from may_change_ptraced_domain() apparmor: remove parent task info from audit logging apparmor: remove tsk field from the apparmor_audit_struct apparmor: fix capability to not use the current task, during reporting Smack: Ptrace access check mode ima: provide hash algo info in the xattr ima: enable support for larger default filedata hash algorithms ima: define kernel parameter 'ima_template=' to change configured default ima: add Kconfig default measurement list template ...
2013-11-21Merge git://git.infradead.org/users/eparis/auditLinus Torvalds12-113/+259
Pull audit updates from Eric Paris: "Nothing amazing. Formatting, small bug fixes, couple of fixes where we didn't get records due to some old VFS changes, and a change to how we collect execve info..." Fixed conflict in fs/exec.c as per Eric and linux-next. * git://git.infradead.org/users/eparis/audit: (28 commits) audit: fix type of sessionid in audit_set_loginuid() audit: call audit_bprm() only once to add AUDIT_EXECVE information audit: move audit_aux_data_execve contents into audit_context union audit: remove unused envc member of audit_aux_data_execve audit: Kill the unused struct audit_aux_data_capset audit: do not reject all AUDIT_INODE filter types audit: suppress stock memalloc failure warnings since already managed audit: log the audit_names record type audit: add child record before the create to handle case where create fails audit: use given values in tty_audit enable api audit: use nlmsg_len() to get message payload length audit: use memset instead of trying to initialize field by field audit: fix info leak in AUDIT_GET requests audit: update AUDIT_INODE filter rule to comparator function audit: audit feature to set loginuid immutable audit: audit feature to only allow unsetting the loginuid audit: allow unsetting the loginuid (with priv) audit: remove CONFIG_AUDIT_LOGINUID_IMMUTABLE audit: loginuid functions coding style selinux: apply selinux checks on new audit message types ...
2013-11-21mm: place page->pmd_huge_pte to right unionKirill A. Shutemov1-3/+3
I don't know what went wrong, mis-merge or something, but ->pmd_huge_pte placed in wrong union within struct page. In original patch[1] it's placed to union with ->lru and ->slab, but in commit e009bb30c8df ("mm: implement split page table lock for PMD level") it's in union with ->index and ->freelist. That union seems also unused for pages with table tables and safe to re-use, but it's not what I've tested. Let's move it to original place. It fixes indentation at least. :) [1] https://lkml.org/lkml/2013/10/7/288 Signed-off-by: Kirill A. Shutemov <kirill.shutemov@linux.intel.com> Reviewed-by: Naoya Horiguchi <n-horiguchi@ah.jp.nec.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2013-11-21MAINTAINERS: add keyboard driver to Hyper-V file listHaiyang Zhang1-0/+1
Signed-off-by: Haiyang Zhang <haiyangz@microsoft.com> Cc: Dmitry Torokhov <dmitry.torokhov@gmail.com> Cc: "K. Y. Srinivasan" <kys@microsoft.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2013-11-21x86, mm: do not leak page->ptl for pmd page tablesKirill A. Shutemov2-4/+6
There are two code paths how page with pmd page table can be freed: pmd_free() and pmd_free_tlb(). I've missed the second one and didn't add page table destructor call there. It leads to leak of page->ptl for pmd page tables, if dynamically allocated page->ptl is in use. The patch adds the missed destructor and modifies documentation accordingly. Signed-off-by: Kirill A. Shutemov <kirill.shutemov@linux.intel.com> Reported-by: Andrey Vagin <avagin@openvz.org> Tested-by: Andrey Vagin <avagin@openvz.org> Cc: Ingo Molnar <mingo@redhat.com> Cc: Peter Zijlstra <peterz@infradead.org> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2013-11-21ipc,shm: correct error return value in shmctl (SHM_UNLOCK)Jesper Nilsson1-3/+6
Commit 2caacaa82a51 ("ipc,shm: shorten critical region for shmctl") restructured the ipc shm to shorten critical region, but introduced a path where the return value could be -EPERM, even if the operation actually was performed. Before the commit, the err return value was reset by the return value from security_shm_shmctl() after the if (!ns_capable(...)) statement. Now, we still exit the if statement with err set to -EPERM, and in the case of SHM_UNLOCK, it is not reset at all, and used as the return value from shmctl. To fix this, we only set err when errors occur, leaving the fallthrough case alone. Signed-off-by: Jesper Nilsson <jesper.nilsson@axis.com> Cc: Davidlohr Bueso <davidlohr@hp.com> Cc: Rik van Riel <riel@redhat.com> Cc: Michel Lespinasse <walken@google.com> Cc: Al Viro <viro@zeniv.linux.org.uk> Cc: <stable@vger.kernel.org> [3.12.x] Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2013-11-21mm, mempolicy: silence gcc warningDavid Rientjes1-1/+1
Fengguang Wu reports that compiling mm/mempolicy.c results in a warning: mm/mempolicy.c: In function 'mpol_to_str': mm/mempolicy.c:2878:2: error: format not a string literal and no format arguments Kees says this is because he is using -Wformat-security. Silence the warning. Signed-off-by: David Rientjes <rientjes@google.com> Reported-by: Fengguang Wu <fengguang.wu@intel.com> Suggested-by: Kees Cook <keescook@chromium.org> Acked-by: Kees Cook <keescook@chromium.org> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2013-11-21block/partitions/efi.c: fix bound checkAntti P Miettinen1-2/+3
Use ARRAY_SIZE instead of sizeof to get proper max for label length. Since this is just a read out of bounds it's not that bad, but the problem becomes user-visible eg if one tries to use DEBUG_PAGEALLOC and DEBUG_RODATA, at least with some enhancements from Hiroshi. Of course the destination array can contain garbage when we read beyond the end of source array so that would be another user-visible problem. Signed-off-by: Antti P Miettinen <amiettinen@nvidia.com> Reviewed-by: Hiroshi Doyu <hdoyu@nvidia.com> Tested-by: Hiroshi Doyu <hdoyu@nvidia.com> Cc: Will Drewry <wad@chromium.org> Cc: Matt Fleming <matt.fleming@intel.com> Acked-by: Davidlohr Bueso <davidlohr@hp.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2013-11-21ARM: drivers/rtc/rtc-at91rm9200.c: disable interrupts at shutdownJohan Hovold1-0/+9
Make sure RTC-interrupts are disabled at shutdown. As the RTC is generally powered by backup power (VDDBU), its interrupts are not disabled on wake-up, user, watchdog or software reset. This could cause troubles on other systems (e.g. older kernels) if an interrupt occurs before a handler has been installed at next boot. Let us be well-behaved and disable them on clean shutdowns at least (as do the RTT-based rtc-at91sam9 driver). Signed-off-by: Johan Hovold <jhovold@gmail.com> Acked-by: Nicolas Ferre <nicolas.ferre@atmel.com> Cc: Jean-Christophe Plagniol-Villard <plagnioj@jcrosoft.com> Cc: Andrew Victor <linux@maxim.org.za> Cc: Alessandro Zummo <a.zummo@towertech.it> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2013-11-21mm: hugetlbfs: fix hugetlbfs optimizationAndrea Arcangeli3-60/+106
Commit 7cb2ef56e6a8 ("mm: fix aio performance regression for database caused by THP") can cause dereference of a dangling pointer if split_huge_page runs during PageHuge() if there are updates to the tail_page->private field. Also it is repeating compound_head twice for hugetlbfs and it is running compound_head+compound_trans_head for THP when a single one is needed in both cases. The new code within the PageSlab() check doesn't need to verify that the THP page size is never bigger than the smallest hugetlbfs page size, to avoid memory corruption. A longstanding theoretical race condition was found while fixing the above (see the change right after the skip_unlock label, that is relevant for the compound_lock path too). By re-establishing the _mapcount tail refcounting for all compound pages, this also fixes the below problem: echo 0 >/sys/kernel/mm/hugepages/hugepages-2048kB/nr_hugepages BUG: Bad page state in process bash pfn:59a01 page:ffffea000139b038 count:0 mapcount:10 mapping: (null) index:0x0 page flags: 0x1c00000000008000(tail) Modules linked in: CPU: 6 PID: 2018 Comm: bash Not tainted 3.12.0+ #25 Hardware name: Bochs Bochs, BIOS Bochs 01/01/2011 Call Trace: dump_stack+0x55/0x76 bad_page+0xd5/0x130 free_pages_prepare+0x213/0x280 __free_pages+0x36/0x80 update_and_free_page+0xc1/0xd0 free_pool_huge_page+0xc2/0xe0 set_max_huge_pages.part.58+0x14c/0x220 nr_hugepages_store_common.isra.60+0xd0/0xf0 nr_hugepages_store+0x13/0x20 kobj_attr_store+0xf/0x20 sysfs_write_file+0x189/0x1e0 vfs_write+0xc5/0x1f0 SyS_write+0x55/0xb0 system_call_fastpath+0x16/0x1b Signed-off-by: Khalid Aziz <khalid.aziz@oracle.com> Signed-off-by: Andrea Arcangeli <aarcange@redhat.com> Tested-by: Khalid Aziz <khalid.aziz@oracle.com> Cc: Pravin Shelar <pshelar@nicira.com> Cc: Greg Kroah-Hartman <gregkh@linuxfoundation.org> Cc: Ben Hutchings <bhutchings@solarflare.com> Cc: Christoph Lameter <cl@linux.com> Cc: Johannes Weiner <jweiner@redhat.com> Cc: Mel Gorman <mgorman@suse.de> Cc: Rik van Riel <riel@redhat.com> Cc: Andi Kleen <andi@firstfloor.org> Cc: Minchan Kim <minchan@kernel.org> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2013-11-21kernel: remove CONFIG_USE_GENERIC_SMP_HELPERS cleanlyYuanhan Liu2-6/+6
Remove CONFIG_USE_GENERIC_SMP_HELPERS left by commit 0a06ff068f12 ("kernel: remove CONFIG_USE_GENERIC_SMP_HELPERS"). Signed-off-by: Yuanhan Liu <yuanhan.liu@linux.intel.com> Cc: Christoph Hellwig <hch@infradead.org> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2013-11-21ipc,shm: fix shm_file deletion racesGreg Thelen1-5/+23
When IPC_RMID races with other shm operations there's potential for use-after-free of the shm object's associated file (shm_file). Here's the race before this patch: TASK 1 TASK 2 ------ ------ shm_rmid() ipc_lock_object() shmctl() shp = shm_obtain_object_check() shm_destroy() shum_unlock() fput(shp->shm_file) ipc_lock_object() shmem_lock(shp->shm_file) <OOPS> The oops is caused because shm_destroy() calls fput() after dropping the ipc_lock. fput() clears the file's f_inode, f_path.dentry, and f_path.mnt, which causes various NULL pointer references in task 2. I reliably see the oops in task 2 if with shmlock, shmu This patch fixes the races by: 1) set shm_file=NULL in shm_destroy() while holding ipc_object_lock(). 2) modify at risk operations to check shm_file while holding ipc_object_lock(). Example workloads, which each trigger oops... Workload 1: while true; do id=$(shmget 1 4096) shm_rmid $id & shmlock $id & wait done The oops stack shows accessing NULL f_inode due to racing fput: _raw_spin_lock shmem_lock SyS_shmctl Workload 2: while true; do id=$(shmget 1 4096) shmat $id 4096 & shm_rmid $id & wait done The oops stack is similar to workload 1 due to NULL f_inode: touch_atime shmem_mmap shm_mmap mmap_region do_mmap_pgoff do_shmat SyS_shmat Workload 3: while true; do id=$(shmget 1 4096) shmlock $id shm_rmid $id & shmunlock $id & wait done The oops stack shows second fput tripping on an NULL f_inode. The first fput() completed via from shm_destroy(), but a racing thread did a get_file() and queued this fput(): locks_remove_flock __fput ____fput task_work_run do_notify_resume int_signal Fixes: c2c737a0461e ("ipc,shm: shorten critical region for shmat") Fixes: 2caacaa82a51 ("ipc,shm: shorten critical region for shmctl") Signed-off-by: Greg Thelen <gthelen@google.com> Cc: Davidlohr Bueso <davidlohr@hp.com> Cc: Rik van Riel <riel@redhat.com> Cc: Manfred Spraul <manfred@colorfullife.com> Cc: <stable@vger.kernel.org> # 3.10.17+ 3.11.6+ Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2013-11-21mm: thp: give transparent hugepage code a separate copy_pageDave Hansen3-38/+48
Right now, the migration code in migrate_page_copy() uses copy_huge_page() for hugetlbfs and thp pages: if (PageHuge(page) || PageTransHuge(page)) copy_huge_page(newpage, page); So, yay for code reuse. But: void copy_huge_page(struct page *dst, struct page *src) { struct hstate *h = page_hstate(src); and a non-hugetlbfs page has no page_hstate(). This works 99% of the time because page_hstate() determines the hstate from the page order alone. Since the page order of a THP page matches the default hugetlbfs page order, it works. But, if you change the default huge page size on the boot command-line (say default_hugepagesz=1G), then we might not even *have* a 2MB hstate so page_hstate() returns null and copy_huge_page() oopses pretty fast since copy_huge_page() dereferences the hstate: void copy_huge_page(struct page *dst, struct page *src) { struct hstate *h = page_hstate(src); if (unlikely(pages_per_huge_page(h) > MAX_ORDER_NR_PAGES)) { ... Mel noticed that the migration code is really the only user of these functions. This moves all the copy code over to migrate.c and makes copy_huge_page() work for THP by checking for it explicitly. I believe the bug was introduced in commit b32967ff101a ("mm: numa: Add THP migration for the NUMA working set scanning fault case") [akpm@linux-foundation.org: fix coding-style and comment text, per Naoya Horiguchi] Signed-off-by: Dave Hansen <dave.hansen@linux.intel.com> Acked-by: Mel Gorman <mgorman@suse.de> Reviewed-by: Naoya Horiguchi <n-horiguchi@ah.jp.nec.com> Cc: Hillf Danton <dhillf@gmail.com> Cc: Andrea Arcangeli <aarcange@redhat.com> Tested-by: Dave Jiang <dave.jiang@intel.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2013-11-21checkpatch: fix "Use of uninitialized value" warningsJoe Perches1-0/+1
checkpatch is currently confused about some complex macros and references undefined variables $stat and $cond. Make sure these are defined before using them. Signed-off-by: Joe Perches <joe@perches.com> Reported-by: Gerhard Sittig <gsi@denx.de> Acked-by: Andy Whitcroft <apw@canonical.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2013-11-21configfs: fix race between dentry put and lookupJunxiao Bi1-2/+14
A race window in configfs, it starts from one dentry is UNHASHED and end before configfs_d_iput is called. In this window, if a lookup happen, since the original dentry was UNHASHED, so a new dentry will be allocated, and then in configfs_attach_attr(), sd->s_dentry will be updated to the new dentry. Then in configfs_d_iput(), BUG_ON(sd->s_dentry != dentry) will be triggered and system panic. sys_open: sys_close: ... fput dput dentry_kill __d_drop <--- dentry unhashed here, but sd->dentry still point to this dentry. lookup_real configfs_lookup configfs_attach_attr---> update sd->s_dentry to new allocated dentry here. d_kill configfs_d_iput <--- BUG_ON(sd->s_dentry != dentry) triggered here. To fix it, change configfs_d_iput to not update sd->s_dentry if sd->s_count > 2, that means there are another dentry is using the sd beside the one that is going to be put. Use configfs_dirent_lock in configfs_attach_attr to sync with configfs_d_iput. With the following steps, you can reproduce the bug. 1. enable ocfs2, this will mount configfs at /sys/kernel/config and fill configure in it. 2. run the following script. while [ 1 ]; do cat /sys/kernel/config/cluster/$your_cluster_name/idle_timeout_ms > /dev/null; done & while [ 1 ]; do cat /sys/kernel/config/cluster/$your_cluster_name/idle_timeout_ms > /dev/null; done & Signed-off-by: Junxiao Bi <junxiao.bi@oracle.com> Cc: Joel Becker <jlbec@evilplan.org> Cc: Al Viro <viro@zeniv.linux.org.uk> Cc: <stable@vger.kernel.org> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2013-11-20Merge branch 'next' of ↵Linus Torvalds13-34/+542
git://git.kernel.org/pub/scm/linux/kernel/git/benh/powerpc Pull powerpc LE updates from Ben Herrenschmidt: "With my previous pull request I mentioned some remaining Little Endian patches, notably support for our new ABI, which I was sitting on making sure it was all finalized. The toolchain folks confirmed it now, the new ABI is stable and merged with gcc, so we are all good. Oh and we actually missed the actual Kconfig switch for LE so here it is, along with a couple more bug fixes. I have more fixes but not related to LE so I'll send them as a separate pull request tomorrow, let's get this one out of the way. Note that this supports running user space binaries using the new ABI, but the kernel itself still needs to be built with the old one. We'll bring fixes for that after -rc1. Here's Anton log that goes with this series: This patch series adds support for the new ABI, LPAR support for H_SET_MODE and finally adds a kconfig option and defconfig. ABIv2 support was recently committed to binutils and gcc, and should be merged into glibc soon. There are a number of very nice improvements including the removal of function descriptors. Rusty's kernel patches allow binaries of either ABI to work, easing the transition" * 'next' of git://git.kernel.org/pub/scm/linux/kernel/git/benh/powerpc: powerpc: Wrong DWARF CFI in the kernel vdso for little-endian / ELFv2 powerpc: Add pseries_le_defconfig powerpc: Add CONFIG_CPU_LITTLE_ENDIAN kernel config option. powerpc: Don't use ELFv2 ABI to build the kernel powerpc: ELF2 binaries signal handling powerpc: ELF2 binaries launched directly. powerpc: Set eflags correctly for ELF ABIv2 core dumps. powerpc: Add TIF_ELF2ABI flag. pseries: Add H_SET_MODE to change exception endianness powerpc/pseries: Fix endian issues in pseries EEH code
2013-11-20Merge branch 'for-linus' of ↵Linus Torvalds24-388/+777
git://git.kernel.org/pub/scm/linux/kernel/git/mattst88/alpha Pull alpha updates from Matt Turner: "It contains a few fixes and some work from Richard to make alpha emulation under QEMU much more usable" * 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/mattst88/alpha: alpha: Prevent a NULL ptr dereference in csum_partial_copy. alpha: perf: fix out-of-bounds array access triggered from raw event alpha: Use qemu+cserve provided high-res clock and alarm. alpha: Switch to GENERIC_CLOCKEVENTS alpha: Enable the rpcc clocksource for single processor alpha: Reorganize rtc handling alpha: Primitive support for CPU power down. alpha: Allow HZ to be configured alpha: Notice if we're being run under QEMU alpha: Eliminate compiler warning from memset macro
2013-11-20Merge branch 'parisc-3.13' of ↵Linus Torvalds5-55/+41
git://git.kernel.org/pub/scm/linux/kernel/git/deller/parisc-linux Pull parisc fixes from Helge Deller: - revert an access_ok() patch which broke 32bit userspace on 64bit kernels - avoid a gcc miscompilation in two internal pa_memcpy() functions by not inlining those - do not export the definition of SOCK_NONBLOCK via uapi header (fixes build of audit package) - depending on the fault type we now correctly report either SIGBUS or SIGSEGV - a small fix to not compare a size_t variable for < 0 * 'parisc-3.13' of git://git.kernel.org/pub/scm/linux/kernel/git/deller/parisc-linux: parisc: size_t is unsigned, so comparison size < 0 doesn't make sense. parisc: improve SIGBUS/SIGSEGV error reporting parisc: break out SOCK_NONBLOCK define to own asm header file parisc: do not inline pa_memcpy() internal functions Revert "parisc: implement full version of access_ok()"
2013-11-20Merge branch 'for-linus' of ↵Linus Torvalds35-129/+101
git://git.kernel.org/pub/scm/linux/kernel/git/egtvedt/linux-avr32 Pull AVR32 updates from Hans-Christian Egtvedt. * 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/egtvedt/linux-avr32: avr32: uapi: be sure of "_UAPI" prefix for all guard macros avr32: add kprobe_ctlblk memory struct avr32: fix out-of-range jump in large kernels avr32: setup crt for early panic()
2013-11-20Merge tag 'squashfs-updates' of ↵Linus Torvalds20-243/+1145
git://git.kernel.org/pub/scm/linux/kernel/git/pkl/squashfs-next Pull squashfs updates from Phillip Lougher: "These patches optionally improve the multi-threading peformance of Squashfs by adding parallel decompression, and direct decompression into the page cache, eliminating an intermediate buffer (removing memcpy overhead and lock contention)" * tag 'squashfs-updates' of git://git.kernel.org/pub/scm/linux/kernel/git/pkl/squashfs-next: Squashfs: Check stream is not NULL in decompressor_multi.c Squashfs: Directly decompress into the page cache for file data Squashfs: Restructure squashfs_readpage() Squashfs: Generalise paging handling in the decompressors Squashfs: add multi-threaded decompression using percpu variable squashfs: Enhance parallel I/O Squashfs: Refactor decompressor interface and code
2013-11-20Revert "mm: create a separate slab for page->ptl allocation"Linus Torvalds3-17/+1
This reverts commit ea1e7ed33708c7a760419ff9ded0a6cb90586a50. Al points out that while the commit *does* actually create a separate slab for the page->ptl allocation, that slab is never actually used, and the code continues to use kmalloc/kfree. Damien Wyart points out that the original patch did have the conversion to use kmem_cache_alloc/free, so it got lost somewhere on its way to me. Revert the half-arsed attempt that didn't do anything. If we really do want the special slab (remember: this is all relevant just for debug builds, so it's not necessarily all that critical) we might as well redo the patch fully. Reported-by: Al Viro <viro@zeniv.linux.org.uk> Acked-by: Andrew Morton <akpm@linux-foundation.org> Cc: Kirill A Shutemov <kirill.shutemov@linux.intel.com> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2013-11-20Merge branch 'for-linus' of ↵Linus Torvalds17-187/+86
git://git.kernel.org/pub/scm/linux/kernel/git/viro/vfs Pull vfs bits and pieces from Al Viro: "Assorted bits that got missed in the first pull request + fixes for a couple of coredump regressions" * 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/viro/vfs: fold try_to_ascend() into the sole remaining caller dcache.c: get rid of pointless macros take read_seqbegin_or_lock() and friends to seqlock.h consolidate simple ->d_delete() instances gfs2: endianness misannotations dump_emit(): use __kernel_write(), not vfs_write() dump_align(): fix the dumb braino
2013-11-20Wrong page freed on preallocate_pmds() failure exitAl Viro1-1/+1
Note that pmds[i] is simply uninitialized at that point... Granted, it's very hard to hit (you need split page locks *and* kmalloc(sizeof(spinlock_t), GFP_KERNEL) failing), but the code is obviously bogus. Introduced by commit 09ef4939850a ("x86: add missed pgtable_pmd_page_ctor/dtor calls for preallocated pmds") Signed-off-by: Al Viro <viro@zeniv.linux.org.uk> Cc: Kirill A. Shutemov <kirill.shutemov@linux.intel.com> Cc: Ingo Molnar <mingo@kernel.org> Cc: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2013-11-21powerpc: Wrong DWARF CFI in the kernel vdso for little-endian / ELFv2Ulrich Weigand1-1/+15
I've finally tracked down why my CR signal-unwind test case still fails on little-endian. The problem turned to be that the kernel installs a signal trampoline in the vDSO, and provides a DWARF CFI record for that trampoline. This CFI describes the save location for CR: rsave (70, 38*RSIZE + (RSIZE - CRSIZE)) which is correct for big-endian, but points to the wrong word on little-endian. This is wrong no matter which ABI. In addition, for the ELFv2 ABI, we should not only provide a CFI record for register 70 (cr2), but for all CR fields separately. Strictly speaking, I guess this would mean providing two separate vDSO images, one for ELFv1 processes and one for ELFv2 processes (or maybe playing some tricks with conditional DWARF expressions). However, having CFI records for the other CR fields in ELFv1 is not actually wrong, they just will be ignored. So it seems the simplest fix would be just to always provide CFI for all the fields. Signed-off-by: Ulrich Weigand <Ulrich.Weigand@de.ibm.com> Signed-off-by: Anton Blanchard <anton@samba.org> Signed-off-by: Benjamin Herrenschmidt <benh@kernel.crashing.org>
2013-11-21powerpc: Add pseries_le_defconfigAnton Blanchard1-0/+352
Signed-off-by: Anton Blanchard <anton@samba.org> Signed-off-by: Benjamin Herrenschmidt <benh@kernel.crashing.org>
2013-11-21powerpc: Add CONFIG_CPU_LITTLE_ENDIAN kernel config option.Anton Blanchard1-0/+11
With the little endian support merged, we can add the CONFIG_CPU_LITTLE_ENDIAN kernel config option. Signed-off-by: Anton Blanchard <anton@samba.org> Signed-off-by: Benjamin Herrenschmidt <benh@kernel.crashing.org>
2013-11-21powerpc: Don't use ELFv2 ABI to build the kernelAlistair Popple1-0/+1
The kernel doesn't build correctly using the ELFv2 ABI. This patch ensures that the ELFv1 ABI is used when building a kernel with an ELFv2 enabled compiler. Signed-off-by: Alistair Popple <alistair@popple.id.au> Signed-off-by: Anton Blanchard <anton@samba.org> Signed-off-by: Benjamin Herrenschmidt <benh@kernel.crashing.org>
2013-11-21powerpc: ELF2 binaries signal handlingRusty Russell1-9/+16
For the ELFv2 ABI, the hander is the entry point, not a function descriptor. We also need to set up r12, and fortunately the fast_exception_return exit path restores r12 for us so nothing else is required. Signed-off-by: Rusty Russell <rusty@rustcorp.com.au> Signed-off-by: Anton Blanchard <anton@samba.org> Signed-off-by: Benjamin Herrenschmidt <benh@kernel.crashing.org>
2013-11-21powerpc: ELF2 binaries launched directly.Rusty Russell1-15/+35
No function descriptor, but we set r12 up and set TIF_RESTOREALL as it normally isn't restored on return from syscall. Signed-off-by: Rusty Russell <rusty@rustcorp.com.au> Signed-off-by: Anton Blanchard <anton@samba.org> Signed-off-by: Benjamin Herrenschmidt <benh@kernel.crashing.org>
2013-11-21powerpc: Set eflags correctly for ELF ABIv2 core dumps.Rusty Russell1-0/+2
We leave it at zero (though it could be 1) for old tasks. Signed-off-by: Rusty Russell <rusty@rustcorp.com.au> Signed-off-by: Anton Blanchard <anton@samba.org> Signed-off-by: Benjamin Herrenschmidt <benh@kernel.crashing.org>
2013-11-21powerpc: Add TIF_ELF2ABI flag.Rusty Russell2-0/+11
Little endian ppc64 is getting an exciting new ABI. This is reflected by the bottom two bits of e_flags in the ELF header: 0 == legacy binaries (v1 ABI) 1 == binaries using the old ABI (compiled with a new toolchain) 2 == binaries using the new ABI. We store this in a thread flag, because we need to set it in core dumps and for signal delivery. Our chief concern is that it doesn't use function descriptors. Signed-off-by: Rusty Russell <rusty@rustcorp.com.au> Signed-off-by: Anton Blanchard <anton@samba.org> Signed-off-by: Benjamin Herrenschmidt <benh@kernel.crashing.org>
2013-11-21pseries: Add H_SET_MODE to change exception endiannessAnton Blanchard4-0/+87
On little endian builds call H_SET_MODE so exceptions have the correct endianness. We need to reset the endian during kexec so do that in the MMU hashtable clear callback. Signed-off-by: Anton Blanchard <anton@samba.org> Signed-off-by: Benjamin Herrenschmidt <benh@kernel.crashing.org>
2013-11-21powerpc/pseries: Fix endian issues in pseries EEH codeAnton Blanchard1-9/+12
Signed-off-by: Anton Blanchard <anton@samba.org> Signed-off-by: Benjamin Herrenschmidt <benh@kernel.crashing.org>
2013-11-20Merge tag 'pm+acpi-2-3.13-rc1' of ↵Linus Torvalds58-328/+374
git://git.kernel.org/pub/scm/linux/kernel/git/rafael/linux-pm Pull more ACPI and power management updates from Rafael Wysocki: - ACPI-based device hotplug fixes for issues introduced recently and a fix for an older error code path bug in the ACPI PCI host bridge driver - Fix for recently broken OMAP cpufreq build from Viresh Kumar - Fix for a recent hibernation regression related to s2disk - Fix for a locking-related regression in the ACPI EC driver from Puneet Kumar - System suspend error code path fix related to runtime PM and runtime PM documentation update from Ulf Hansson - cpufreq's conservative governor fix from Xiaoguang Chen - New processor IDs for intel_idle and turbostat and removal of an obsolete Kconfig option from Len Brown - New device IDs for the ACPI LPSS (Low-Power Subsystem) driver and ACPI-based PCI hotplug (ACPIPHP) cleanup from Mika Westerberg - Removal of several ACPI video DMI blacklist entries that are not necessary any more from Aaron Lu - Rework of the ACPI companion representation in struct device and code cleanup related to that change from Rafael J Wysocki, Lan Tianyu and Jarkko Nikula - Fixes for assigning names to ACPI-enumerated I2C and SPI devices from Jarkko Nikula * tag 'pm+acpi-2-3.13-rc1' of git://git.kernel.org/pub/scm/linux/kernel/git/rafael/linux-pm: (24 commits) PCI / hotplug / ACPI: Drop unused acpiphp_debug declaration ACPI / scan: Set flags.match_driver in acpi_bus_scan_fixed() ACPI / PCI root: Clear driver_data before failing enumeration ACPI / hotplug: Fix PCI host bridge hot removal ACPI / hotplug: Fix acpi_bus_get_device() return value check cpufreq: governor: Remove fossil comment in the cpufreq_governor_dbs() ACPI / video: clean up DMI table for initial black screen problem ACPI / EC: Ensure lock is acquired before accessing ec struct members PM / Hibernate: Do not crash kernel in free_basic_memory_bitmaps() ACPI / AC: Remove struct acpi_device pointer from struct acpi_ac spi: Use stable dev_name for ACPI enumerated SPI slaves i2c: Use stable dev_name for ACPI enumerated I2C slaves ACPI: Provide acpi_dev_name accessor for struct acpi_device device name ACPI / bind: Use (put|get)_device() on ACPI device objects too ACPI: Eliminate the DEVICE_ACPI_HANDLE() macro ACPI / driver core: Store an ACPI device pointer in struct acpi_dev_node cpufreq: OMAP: Fix compilation error 'r & ret undeclared' PM / Runtime: Fix error path for prepare PM / Runtime: Update documentation around probe|remove|suspend cpufreq: conservative: set requested_freq to policy max when it is over policy max ...
2013-11-20Merge branch 'next' of git://git.infradead.org/users/vkoul/slave-dmaLinus Torvalds71-2272/+1979
Pull slave-dmaengine changes from Vinod Koul: "This brings for slave dmaengine: - Change dma notification flag to DMA_COMPLETE from DMA_SUCCESS as dmaengine can only transfer and not verify validaty of dma transfers - Bunch of fixes across drivers: - cppi41 driver fixes from Daniel - 8 channel freescale dma engine support and updated bindings from Hongbo - msx-dma fixes and cleanup by Markus - DMAengine updates from Dan: - Bartlomiej and Dan finalized a rework of the dma address unmap implementation. - In the course of testing 1/ a collection of enhancements to dmatest fell out. Notably basic performance statistics, and fixed / enhanced test control through new module parameters 'run', 'wait', 'noverify', and 'verbose'. Thanks to Andriy and Linus [Walleij] for their review. - Testing the raid related corner cases of 1/ triggered bugs in the recently added 16-source operation support in the ioatdma driver. - Some minor fixes / cleanups to mv_xor and ioatdma" * 'next' of git://git.infradead.org/users/vkoul/slave-dma: (99 commits) dma: mv_xor: Fix mis-usage of mmio 'base' and 'high_base' registers dma: mv_xor: Remove unneeded NULL address check ioat: fix ioat3_irq_reinit ioat: kill msix_single_vector support raid6test: add new corner case for ioatdma driver ioatdma: clean up sed pool kmem_cache ioatdma: fix selection of 16 vs 8 source path ioatdma: fix sed pool selection ioatdma: Fix bug in selftest after removal of DMA_MEMSET. dmatest: verbose mode dmatest: convert to dmaengine_unmap_data dmatest: add a 'wait' parameter dmatest: add basic performance metrics dmatest: add support for skipping verification and random data setup dmatest: use pseudo random numbers dmatest: support xor-only, or pq-only channels in tests dmatest: restore ability to start test at module load and init dmatest: cleanup redundant "dmatest: " prefixes dmatest: replace stored results mechanism, with uniform messages Revert "dmatest: append verify result to results" ...
2013-11-20Merge branch 'for-linus' of git://git.kernel.dk/linux-blockLinus Torvalds4-7/+17
Pull block IO fixes from Jens Axboe: "Normally I'd defer my initial for-linus pull request until after the merge window, but a race was uncovered in the virtio-blk conversion to blk-mq that could cause hangs. So here's a small collection of fixes for you to pull: - The fix for the virtio-blk IO hang reported by Dave Chinner, from Shaohua and myself. - Add the Insert blktrace event for blk-mq. This makes 'btt' happy when it is doing it's state transition analysis. - Ensure that blk-mq has disk/partition stats enabled by default, instead of making it opt-in. - A fix for __bio_add_page() and large sector counts" * 'for-linus' of git://git.kernel.dk/linux-block: blk-mq: add blktrace insert event trace virtio-blk: virtqueue_kick() must be ordered with other virtqueue operations blk-mq: ensure that we set REQ_IO_STAT so diskstats work bio: fix argument of __bio_add_page() for max_sectors > 0xffff
2013-11-20Merge tag 'md/3.13' of git://neil.brown.name/mdLinus Torvalds8-186/+592
Pull md update from Neil Brown: "Mostly optimisations and obscure bug fixes. - raid5 gets less lock contention - raid1 gets less contention between normal-io and resync-io during resync" * tag 'md/3.13' of git://neil.brown.name/md: md/raid5: Use conf->device_lock protect changing of multi-thread resources. md/raid5: Before freeing old multi-thread worker, it should flush them. md/raid5: For stripe with R5_ReadNoMerge, we replace REQ_FLUSH with REQ_NOMERGE. UAPI: include <asm/byteorder.h> in linux/raid/md_p.h raid1: Rewrite the implementation of iobarrier. raid1: Add some macros to make code clearly. raid1: Replace raise_barrier/lower_barrier with freeze_array/unfreeze_array when reconfiguring the array. raid1: Add a field array_frozen to indicate whether raid in freeze state. md: Convert use of typedef ctl_table to struct ctl_table md/raid5: avoid deadlock when raid5 array has unack badblocks during md_stop_writes. md: use MD_RECOVERY_INTR instead of kthread_should_stop in resync thread. md: fix some places where mddev_lock return value is not checked. raid5: Retry R5_ReadNoMerge flag when hit a read error. raid5: relieve lock contention in get_active_stripe() raid5: relieve lock contention in get_active_stripe() wait: add wait_event_cmd() md/raid5.c: add proper locking to error path of raid5_start_reshape. md: fix calculation of stacking limits on level change. raid5: Use slow_path to release stripe when mddev->thread is null
2013-11-20avr32: uapi: be sure of "_UAPI" prefix for all guard macrosChen Gang31-102/+56
For all uapi headers, need use "_UAPI" prefix for its guard macro (which will be stripped by "scripts/headers_installer.sh"). Also remove redundant files (bitsperlong.h, errno.h, fcntl.h, ioctl.h, ioctls.h, ipcbuf.h, kvm_para.h, mman.h, poll.h, resource.h, siginfo.h, statfs.h, and unistd.h) which are already in Kbuild. Also be sure that all "#endif" only have one empty line above, and each file has guard macro. Signed-off-by: Chen Gang <gang.chen@asianux.com> Signed-off-by: Hans-Christian Egtvedt <hegtvedt@cisco.com>
2013-11-20avr32: add kprobe_ctlblk memory structEirik Aanonsen1-0/+14
This re-enables kprobes on AVR32 architecture. Signed-off-by: Eirik Aanonsen <eaa@wprmedical.com> Signed-off-by: Hans-Christian Egtvedt <egtvedt@samfundet.no>
2013-11-20avr32: fix out-of-range jump in large kernelsAndreas Bießmann2-2/+6
This patch fixes following error (for big kernels): ---8<--- arch/avr32/boot/u-boot/head.o: In function `no_tag_table': (.init.text+0x44): relocation truncated to fit: R_AVR32_22H_PCREL against symbol `panic' defined in .text.unlikely section in kernel/built-in.o arch/avr32/kernel/built-in.o: In function `bad_return': (.ex.text+0x236): relocation truncated to fit: R_AVR32_22H_PCREL against symbol `panic' defined in .text.unlikely section in kernel/built-in.o --->8--- It comes up when the kernel increases and 'panic()' is too far away to fit in the +/- 2MiB range. Which in turn issues from the 21-bit displacement in 'br{cond4}' mnemonic which is one of the two ways to do jumps (rjmp has just 10-bit displacement and therefore a way smaller range). This fact was stated before in 8d29b7b9f81d6b83d869ff054e6c189d6da73f1f. One solution to solve this is to add a local storage for the symbol address and just load the $pc with that value. Signed-off-by: Andreas Bießmann <andreas@biessmann.de> Acked-by: Hans-Christian Egtvedt <egtvedt@samfundet.no> Cc: Haavard Skinnemoen <hskinnemoen@gmail.com> Cc: stable@vger.kernel.org
2013-11-20avr32: setup crt for early panic()Andreas Bießmann2-25/+25
Before the CRT was (fully) set up in kernel_entry (bss cleared before in _start, but also not before jump to panic() in no_tag_table case). This patch fixes this up to have a fully working CRT when branching to panic() in no_tag_table. Signed-off-by: Andreas Bießmann <andreas@biessmann.de> Acked-by: Hans-Christian Egtvedt <egtvedt@samfundet.no> Cc: Haavard Skinnemoen <hskinnemoen@gmail.com> Cc: stable@vger.kernel.org
2013-11-20Squashfs: Check stream is not NULL in decompressor_multi.cPhillip Lougher1-4/+3
Fix static checker complaint that stream is not checked in squashfs_decompressor_destroy(). Reported-by: Dan Carpenter <dan.carpenter@oracle.com> Signed-off-by: Phillip Lougher <phillip@squashfs.org.uk> Reviewed-by: Minchan Kim <minchan@kernel.org>
2013-11-20Squashfs: Directly decompress into the page cache for file dataPhillip Lougher5-1/+336
This introduces an implementation of squashfs_readpage_block() that directly decompresses into the page cache. This uses the previously added page handler abstraction to push down the necessary kmap_atomic/kunmap_atomic operations on the page cache buffers into the decompressors. This enables direct copying into the page cache without using the slow kmap/kunmap calls. The code detects when multiple threads are racing in squashfs_readpage() to decompress the same block, and avoids this regression by falling back to using an intermediate buffer. This patch enhances the performance of Squashfs significantly when multiple processes are accessing the filesystem simultaneously because it not only reduces memcopying, but it more importantly eliminates the lock contention on the intermediate buffer. Using single-thread decompression. dd if=file1 of=/dev/null bs=4096 & dd if=file2 of=/dev/null bs=4096 & dd if=file3 of=/dev/null bs=4096 & dd if=file4 of=/dev/null bs=4096 Before: 629145600 bytes (629 MB) copied, 45.8046 s, 13.7 MB/s After: 629145600 bytes (629 MB) copied, 9.29414 s, 67.7 MB/s Signed-off-by: Phillip Lougher <phillip@squashfs.org.uk> Reviewed-by: Minchan Kim <minchan@kernel.org>
2013-11-20Squashfs: Restructure squashfs_readpage()Phillip Lougher4-71/+118
Restructure squashfs_readpage() splitting it into separate functions for datablocks, fragments and sparse blocks. Move the memcpying (from squashfs cache entry) implementation of squashfs_readpage_block into file_cache.c This allows different implementations to be supported. Signed-off-by: Phillip Lougher <phillip@squashfs.org.uk> Reviewed-by: Minchan Kim <minchan@kernel.org>
2013-11-20Squashfs: Generalise paging handling in the decompressorsPhillip Lougher13-67/+163
Further generalise the decompressors by adding a page handler abstraction. This adds helpers to allow the decompressors to access and process the output buffers in an implementation independant manner. This allows different types of output buffer to be passed to the decompressors, with the implementation specific aspects handled at decompression time, but without the knowledge being held in the decompressor wrapper code. This will allow the decompressors to handle Squashfs cache buffers, and page cache pages. This patch adds the abstraction and an implementation for the caches. Signed-off-by: Phillip Lougher <phillip@squashfs.org.uk> Reviewed-by: Minchan Kim <minchan@kernel.org>
2013-11-20Squashfs: add multi-threaded decompression using percpu variablePhillip Lougher3-20/+145
Add a multi-threaded decompression implementation which uses percpu variables. Using percpu variables has advantages and disadvantages over implementations which do not use percpu variables. Advantages: * the nature of percpu variables ensures decompression is load-balanced across the multiple cores. * simplicity. Disadvantages: it limits decompression to one thread per core. Signed-off-by: Phillip Lougher <phillip@squashfs.org.uk>
2013-11-20squashfs: Enhance parallel I/OMinchan Kim3-1/+221
Now squashfs have used for only one stream buffer for decompression so it hurts parallel read performance so this patch supports multiple decompressor to enhance performance parallel I/O. Four 1G file dd read on KVM machine which has 2 CPU and 4G memory. dd if=test/test1.dat of=/dev/null & dd if=test/test2.dat of=/dev/null & dd if=test/test3.dat of=/dev/null & dd if=test/test4.dat of=/dev/null & old : 1m39s -> new : 9s * From v1 * Change comp_strm with decomp_strm - Phillip * Change/add comments - Phillip Signed-off-by: Minchan Kim <minchan@kernel.org> Signed-off-by: Phillip Lougher <phillip@squashfs.org.uk>
2013-11-20Squashfs: Refactor decompressor interface and codePhillip Lougher11-136/+216
The decompressor interface and code was written from the point of view of single-threaded operation. In doing so it mixed a lot of single-threaded implementation specific aspects into the decompressor code and elsewhere which makes it difficult to seamlessly support multiple different decompressor implementations. This patch does the following: 1. It removes compressor_options parsing from the decompressor init() function. This allows the decompressor init() function to be dynamically called to instantiate multiple decompressors, without the compressor options needing to be read and parsed each time. 2. It moves threading and all sleeping operations out of the decompressors. In doing so, it makes the decompressors non-blocking wrappers which only deal with interfacing with the decompressor implementation. 3. It splits decompressor.[ch] into decompressor generic functions in decompressor.[ch], and moves the single threaded decompressor implementation into decompressor_single.c. The result of this patch is Squashfs should now be able to support multiple decompressors by adding new decompressor_xxx.c files with specialised implementations of the functions in decompressor_single.c Signed-off-by: Phillip Lougher <phillip@squashfs.org.uk> Reviewed-by: Minchan Kim <minchan@kernel.org>
2013-11-19blk-mq: add blktrace insert event traceJens Axboe1-0/+2
We need it to make 'btt' from blktrace happy, otherwise we are missing one state transition. Signed-off-by: Jens Axboe <axboe@kernel.dk>