summaryrefslogtreecommitdiffstats
path: root/arch/powerpc/mm/slice.c
AgeCommit message (Collapse)AuthorFilesLines
2017-11-20powerpc/64s/slice: Use addr limit when computing slice maskAneesh Kumar K.V1-12/+22
While computing slice mask for the free area we need make sure we only search in the addr limit applicable for this mmap. We update the slb_addr_limit after we request for a mmap above 128TB. But the following mmap request with hint addr below 128TB should still limit its search to below 128TB. ie. we should not use slb_addr_limit to compute slice mask in this case. Instead, we should derive high addr limit based on the mmap hint addr value. Fixes: f4ea6dcb08ea ("powerpc/mm: Enable mappings above 128TB") Cc: stable@vger.kernel.org # v4.12+ Signed-off-by: Aneesh Kumar K.V <aneesh.kumar@linux.vnet.ibm.com> Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
2017-11-13powerpc/64s: mm_context.addr_limit is only used on hashNicholas Piggin1-11/+11
Radix keeps no meaningful state in addr_limit, so remove it from radix code and rename to slb_addr_limit to make it clear it applies to hash only. Signed-off-by: Nicholas Piggin <npiggin@gmail.com> Reviewed-by: Aneesh Kumar K.V <aneesh.kumar@linux.vnet.ibm.com> Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
2017-11-13powerpc/64s/hash: Allow MAP_FIXED allocations to cross 128TB boundaryNicholas Piggin1-1/+1
While mapping hints with a length that cross 128TB are disallowed, MAP_FIXED allocations that cross 128TB are allowed. These are failing on hash (on radix they succeed). Add an additional case for fixed mappings to expand the addr_limit when crossing 128TB. Fixes: f4ea6dcb08ea ("powerpc/mm: Enable mappings above 128TB") Cc: stable@vger.kernel.org # v4.12+ Signed-off-by: Nicholas Piggin <npiggin@gmail.com> Reviewed-by: Aneesh Kumar K.V <aneesh.kumar@linux.vnet.ibm.com> Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
2017-11-13powerpc/64s/hash: Fix 128TB-512TB virtual address boundary case allocationNicholas Piggin1-26/+24
When allocating VA space with a hint that crosses 128TB, the SLB addr_limit variable is not expanded if addr is not > 128TB, but the slice allocation looks at task_size, which is 512TB. This results in slice_check_fit() incorrectly succeeding because the slice_count truncates off bit 128 of the requested mask, so the comparison to the available mask succeeds. Fix this by using mm->context.addr_limit instead of mm->task_size for testing allocation limits. This causes such allocations to fail. Fixes: f4ea6dcb08ea ("powerpc/mm: Enable mappings above 128TB") Cc: stable@vger.kernel.org # v4.12+ Reported-by: Florian Weimer <fweimer@redhat.com> Signed-off-by: Nicholas Piggin <npiggin@gmail.com> Reviewed-by: Aneesh Kumar K.V <aneesh.kumar@linux.vnet.ibm.com> Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
2017-11-13powerpc/64s/hash: Fix 512T hint detection to use >= 128TMichael Ellerman1-2/+2
Currently userspace is able to request mmap() search between 128T-512T by specifying a hint address that is greater than 128T. But that means a hint of 128T exactly will return an address below 128T, which is confusing and wrong. So fix the logic to check the hint is greater than *or equal* to 128T. Fixes: f4ea6dcb08ea ("powerpc/mm: Enable mappings above 128TB") Cc: stable@vger.kernel.org # v4.12+ Suggested-by: Aneesh Kumar K.V <aneesh.kumar@linux.vnet.ibm.com> Suggested-by: Nicholas Piggin <npiggin@gmail.com> [mpe: Split out of Nick's bigger patch] Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
2017-06-19mm: larger stack guard gap, between vmasHugh Dickins1-1/+1
Stack guard page is a useful feature to reduce a risk of stack smashing into a different mapping. We have been using a single page gap which is sufficient to prevent having stack adjacent to a different mapping. But this seems to be insufficient in the light of the stack usage in userspace. E.g. glibc uses as large as 64kB alloca() in many commonly used functions. Others use constructs liks gid_t buffer[NGROUPS_MAX] which is 256kB or stack strings with MAX_ARG_STRLEN. This will become especially dangerous for suid binaries and the default no limit for the stack size limit because those applications can be tricked to consume a large portion of the stack and a single glibc call could jump over the guard page. These attacks are not theoretical, unfortunatelly. Make those attacks less probable by increasing the stack guard gap to 1MB (on systems with 4k pages; but make it depend on the page size because systems with larger base pages might cap stack allocations in the PAGE_SIZE units) which should cover larger alloca() and VLA stack allocations. It is obviously not a full fix because the problem is somehow inherent, but it should reduce attack space a lot. One could argue that the gap size should be configurable from userspace, but that can be done later when somebody finds that the new 1MB is wrong for some special case applications. For now, add a kernel command line option (stack_guard_gap) to specify the stack gap size (in page units). Implementation wise, first delete all the old code for stack guard page: because although we could get away with accounting one extra page in a stack vma, accounting a larger gap can break userspace - case in point, a program run with "ulimit -S -v 20000" failed when the 1MB gap was counted for RLIMIT_AS; similar problems could come with RLIMIT_MLOCK and strict non-overcommit mode. Instead of keeping gap inside the stack vma, maintain the stack guard gap as a gap between vmas: using vm_start_gap() in place of vm_start (or vm_end_gap() in place of vm_end if VM_GROWSUP) in just those few places which need to respect the gap - mainly arch_get_unmapped_area(), and and the vma tree's subtree_gap support for that. Original-patch-by: Oleg Nesterov <oleg@redhat.com> Original-patch-by: Michal Hocko <mhocko@suse.com> Signed-off-by: Hugh Dickins <hughd@google.com> Acked-by: Michal Hocko <mhocko@suse.com> Tested-by: Helge Deller <deller@gmx.de> # parisc Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2017-04-19powerpc/mmap: Any hint > 128TB searches the full VA spaceAneesh Kumar K.V1-1/+2
As part of the new large address space support, processes start out life with a 128TB virtual address space. However when calling mmap() a process can pass a hint address, and if that hint is > 128TB the kernel will use the full 512TB address space to try and satisfy the mmap() request. Currently we have a check that the hint is > 128TB and < 512TB (TASK_SIZE), which was added as an optimisation to avoid updating addr_limit unnecessarily and also to avoid calling slice_flush_segments() on all CPUs more than necessary. However this has the user-visible side effect that an mmap() hint above 512TB does not search the full address space unless a preceding mmap() used a hint value > 128TB && < 512TB. So fix it to treat any hint above 128TB as a hint to search the full address space, instead of checking the hint against TASK_SIZE, we instead check if the addr_limit is already == TASK_SIZE. This also brings the ABI in-line with what is proposed on x86. ie, that a hint address above 128TB up to and including (2^64)-1 is an indication to search the full address space. Fixes: f4ea6dcb08ea2c (powerpc/mm: Enable mappings above 128TB) Signed-off-by: Aneesh Kumar K.V <aneesh.kumar@linux.vnet.ibm.com> Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
2017-04-19powerpc/mm/radix: Use mm->task_size for boundary checking instead of addr_limitAneesh Kumar K.V1-2/+2
We don't init addr_limit correctly for 32 bit applications. So default to using mm->task_size for boundary condition checking. We use addr_limit to only control free space search. This makes sure that we do the right thing with 32 bit applications. We should consolidate the usage of TASK_SIZE/mm->task_size and mm->context.addr_limit later. This partially reverts commit fbfef9027c2a7ad (powerpc/mm: Switch some TASK_SIZE checks to use mm_context addr_limit). Fixes: fbfef9027c2a ("powerpc/mm: Switch some TASK_SIZE checks to use mm_context addr_limit") Signed-off-by: Aneesh Kumar K.V <aneesh.kumar@linux.vnet.ibm.com> Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
2017-04-01powerpc/mm: Enable mappings above 128TBAneesh Kumar K.V1-12/+42
Not all user space application is ready to handle wide addresses. It's known that at least some JIT compilers use higher bits in pointers to encode their information. It collides with valid pointers with 512TB addresses and leads to crashes. To mitigate this, we are not going to allocate virtual address space above 128TB by default. But userspace can ask for allocation from full address space by specifying hint address (with or without MAP_FIXED) above 128TB. If hint address set above 128TB, but MAP_FIXED is not specified, we try to look for unmapped area by specified address. If it's already occupied, we look for unmapped area in *full* address space, rather than from 128TB window. This approach helps to easily make application's memory allocator aware about large address space without manually tracking allocated virtual address space. This is going to be a per mmap decision. ie, we can have some mmaps with larger addresses and other that do not. A sample memory layout looks like: 10000000-10010000 r-xp 00000000 fc:00 9057045 /home/max_addr_512TB 10010000-10020000 r--p 00000000 fc:00 9057045 /home/max_addr_512TB 10020000-10030000 rw-p 00010000 fc:00 9057045 /home/max_addr_512TB 10029630000-10029660000 rw-p 00000000 00:00 0 [heap] 7fff834a0000-7fff834b0000 rw-p 00000000 00:00 0 7fff834b0000-7fff83670000 r-xp 00000000 fc:00 9177190 /lib/powerpc64le-linux-gnu/libc-2.23.so 7fff83670000-7fff83680000 r--p 001b0000 fc:00 9177190 /lib/powerpc64le-linux-gnu/libc-2.23.so 7fff83680000-7fff83690000 rw-p 001c0000 fc:00 9177190 /lib/powerpc64le-linux-gnu/libc-2.23.so 7fff83690000-7fff836a0000 rw-p 00000000 00:00 0 7fff836a0000-7fff836c0000 r-xp 00000000 00:00 0 [vdso] 7fff836c0000-7fff83700000 r-xp 00000000 fc:00 9177193 /lib/powerpc64le-linux-gnu/ld-2.23.so 7fff83700000-7fff83710000 r--p 00030000 fc:00 9177193 /lib/powerpc64le-linux-gnu/ld-2.23.so 7fff83710000-7fff83720000 rw-p 00040000 fc:00 9177193 /lib/powerpc64le-linux-gnu/ld-2.23.so 7fffdccf0000-7fffdcd20000 rw-p 00000000 00:00 0 [stack] 1000000000000-1000000010000 rw-p 00000000 00:00 0 1ffff83710000-1ffff83720000 rw-p 00000000 00:00 0 Signed-off-by: Aneesh Kumar K.V <aneesh.kumar@linux.vnet.ibm.com> Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
2017-04-01powerpc/mm: Switch some TASK_SIZE checks to use mm_context addr_limitAneesh Kumar K.V1-3/+3
Signed-off-by: Aneesh Kumar K.V <aneesh.kumar@linux.vnet.ibm.com> Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
2017-04-01powerpc/mm: Add addr_limit to mm_context and use it to derive max slice indexAneesh Kumar K.V1-9/+11
In the followup patch, we will increase the slice array size to handle 512TB range, but will limit the max addr to 128TB. Avoid doing unnecessary computation and avoid doing slice mask related operation above address limit. Signed-off-by: Aneesh Kumar K.V <aneesh.kumar@linux.vnet.ibm.com> Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
2017-03-31powerpc/mm/slice: Update slice mask printing to use bitmap printing.Aneesh Kumar K.V1-23/+7
We now get output like below which is much better. [ 0.935306] good_mask low_slice: 0-15 [ 0.935360] good_mask high_slice: 0-511 Compared to [ 0.953414] good_mask:1111111111111111 - 1111111111111......... I also fixed an error with slice_dbg printing. Signed-off-by: Aneesh Kumar K.V <aneesh.kumar@linux.vnet.ibm.com> Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
2017-03-31powerpc/mm/slice: Move slice_mask struct definition to slice.cAneesh Kumar K.V1-1/+9
This structure definition need not be in a header since this is used only by slice.c file. So move it to slice.c. This also allow us to use SLICE_NUM_HIGH instead of 64. I also switch the low_slices type to u64 from u16. This doesn't have an impact on size of struct due to padding added with u16 type. This helps in using bitmap printing function for printing slice mask. Signed-off-by: Aneesh Kumar K.V <aneesh.kumar@linux.vnet.ibm.com> Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
2017-03-31powerpc/mm: Move copy_mm_to_paca to paca.cAneesh Kumar K.V1-1/+1
We also update the function arg to struct mm_struct. Move this so that function finds the definition of struct mm_struct. No functional change in this patch. Signed-off-by: Aneesh Kumar K.V <aneesh.kumar@linux.vnet.ibm.com> Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
2017-03-31powerpc/mm/slice: Update the function prototypeAneesh Kumar K.V1-34/+28
This avoid copying the slice_mask struct as function return value Signed-off-by: Aneesh Kumar K.V <aneesh.kumar@linux.vnet.ibm.com> Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
2017-03-31powerpc/mm/slice: Convert slice_mask high slice to a bitmapAneesh Kumar K.V1-37/+75
In followup patch we want to increase the va range which will result in us requiring high_slices to have more than 64 bits. To enable this convert high_slices to bitmap. We keep the number bits same in this patch and later change that to higher value Signed-off-by: Aneesh Kumar K.V <aneesh.kumar@linux.vnet.ibm.com> [mpe: Fold in fix to use bitmap_empty()] Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
2017-03-31powerpc/mm/slice: Fix off-by-1 error when computing slice maskAneesh Kumar K.V1-3/+2
For low slice, max addr should be less than 4G. Without limiting this correctly we will end up with a low slice mask which has 17th bit set. This is not a problem with the current code because our low slice mask is of type u16. But in later patch I am switching low slice mask to u64 type and having the 17bit set result in wrong slice mask which in turn results in mmap failures. Reviewed-by: Paul Mackerras <paulus@ozlabs.org> Signed-off-by: Aneesh Kumar K.V <aneesh.kumar@linux.vnet.ibm.com> Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
2016-05-11powerpc/mm/radix: Add checks in slice code to catch radix usageAneesh Kumar K.V1-0/+16
Radix doesn't need slice support. Catch incorrect usage of slice code when radix is enabled. Signed-off-by: Aneesh Kumar K.V <aneesh.kumar@linux.vnet.ibm.com> Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
2016-05-01powerpc/mm: Make page table size a variableAneesh Kumar K.V1-2/+2
Radix and hash MMU models support different page table sizes. Make the #defines a variable so that existing code can work with variable sizes. Slice related code is only used by hash, so use hash constants there. We will replicate some of the boundary conditions with resepct to TASK_SIZE using radix values too. Right now we do boundary condition check using hash constants. Swapper pgdir size is initialized in asm code. We select the max pgd size to keep it simple. For now we select hash pgdir. When adding radix we will switch that to radix pgdir which is 64K. BUILD_BUG_ON check which is removed is already done in hugepage_init() using MAYBE_BUILD_BUG_ON(). Signed-off-by: Aneesh Kumar K.V <aneesh.kumar@linux.vnet.ibm.com> Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
2015-12-19powerpc: Add function to copy mm_context_t to the pacaMichael Neuling1-2/+1
This adds a function to copy the mm->context to the paca. This is only a basic conversion for now but will be used more extensively in the next patch. This also adds #ifdef CONFIG_PPC_BOOK3S around this code since it's not used elsewhere. Signed-off-by: Michael Neuling <mikey@neuling.org> Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
2015-01-28powerpc: Remove some unused functionsMichael Ellerman1-29/+0
Remove slice_set_psize() which is not used. It was added in 3a8247cc2c85 "powerpc: Only demote individual slices rather than whole process" but was never used. Remove vsx_assist_exception() which is not used. It was added in ce48b2100785 "powerpc: Add VSX context save/restore, ptrace and signal support" but was never used. Remove generic_mach_cpu_die() which is not used. Its last caller was removed in 375f561a4131 "powerpc/powernv: Always go into nap mode when CPU is offline". Remove mpc7448_hpc2_power_off() and mpc7448_hpc2_halt() which are unused. These were introduced in c5d56332fd6c "[POWERPC] Add general support for mpc7448hpc2 (Taiga) platform" but were never used. This was partially found by using a static code analysis program called cppcheck. Signed-off-by: Rickard Strandqvist <rickard_strandqvist@spectrumdigital.se> [mpe: Update changelog with details on when/why they are unused] Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
2014-10-22powerpc/mm: Fix build error with hugetlfs disabledAneesh Kumar K.V1-1/+2
arch/powerpc/mm/slice.c:704:5: error: expected identifier or ‘(’ before numeric constant int is_hugepage_only_range(struct mm_struct *mm, unsigned long addr, ^ make[1]: *** [arch/powerpc/mm/slice.o] Error 1 make: *** [arch/powerpc/mm/slice.o] Error 2 This got introduced via 1217d34b531c76362217057ca70a8ce8950574e0 "powerpc: Ensure global functions include their prototype". We started including linux/hugetlb.h with that patch and now we have #define is_hugepage_only_range(mm, addr, len) 0 with hugetlbfs disabled. Fixes: 1217d34b531c ("powerpc: Ensure global functions include their prototype") Signed-off-by: Aneesh Kumar K.V <aneesh.kumar@linux.vnet.ibm.com> Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
2014-10-08powerpc/cell: Make spu_flush_all_slbs() genericIan Munsie1-7/+3
This moves spu_flush_all_slbs() into a generic call copro_flush_all_slbs(). This will be useful when we add cxl which also needs a similar SLB flush call. Signed-off-by: Ian Munsie <imunsie@au1.ibm.com> Signed-off-by: Michael Neuling <mikey@neuling.org> Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
2014-09-25powerpc: Ensure global functions include their prototypeAnton Blanchard1-0/+2
Fix a number of places where global functions were not including their prototype. This ensures the prototype and the function match. Signed-off-by: Anton Blanchard <anton@samba.org> Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
2014-01-29powerpc/mm: Fix mmap errno when MAP_FIXED is set and mapping exceeds the ↵jmarchan@redhat.com1-1/+1
allowed address space According to Posix, if MAP_FIXED is specified mmap shall set ENOMEM if the requested mapping exceeds the allowed range for address space of the process. The generic code set it right, but the specific powerpc slice_get_unmapped_area() function currently returns -EINVAL in that case. This patch corrects it. Signed-off-by: Jerome Marchand <jmarchan@redhat.com> Signed-off-by: Benjamin Herrenschmidt <benh@kernel.crashing.org>
2013-11-21powerpc: ppc64 address space capped at 32TB, mmap randomisation disabledAnton Blanchard1-1/+1
Commit fba2369e6ceb (mm: use vm_unmapped_area() on powerpc architecture) has a bug in slice_scan_available() where we compare an unsigned long (high_slices) against a shifted int. As a result, comparisons against the top 32 bits of high_slices (representing the top 32TB) always returns 0 and the top of our mmap region is clamped at 32TB This also breaks mmap randomisation since the randomised address is always up near the top of the address space and it gets clamped down to 32TB. Cc: stable@vger.kernel.org # v3.10+ Signed-off-by: Anton Blanchard <anton@samba.org> Acked-by: Michel Lespinasse <walken@google.com> Signed-off-by: Benjamin Herrenschmidt <benh@kernel.crashing.org>
2013-04-30mm: use vm_unmapped_area() on powerpc architectureMichel Lespinasse1-45/+78
Update the powerpc slice_get_unmapped_area function to make use of vm_unmapped_area() instead of implementing a brute force search. Signed-off-by: Michel Lespinasse <walken@google.com> Cc: Benjamin Herrenschmidt <benh@kernel.crashing.org> Cc: Paul Mackerras <paulus@samba.org> Tested-by: "Aneesh Kumar K.V" <aneesh.kumar@linux.vnet.ibm.com> Acked-by: David Gibson <david@gibson.dropbear.id.au> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Benjamin Herrenschmidt <benh@kernel.crashing.org>
2013-04-30mm: remove free_area_cache use in powerpc architectureMichel Lespinasse1-89/+19
As all other architectures have been converted to use vm_unmapped_area(), we are about to retire the free_area_cache. This change simply removes the use of that cache in slice_get_unmapped_area(), which will most certainly have a performance cost. Next one will convert that function to use the vm_unmapped_area() infrastructure and regain the performance. Signed-off-by: Michel Lespinasse <walken@google.com> Acked-by: Rik van Riel <riel@redhat.com> Cc: Benjamin Herrenschmidt <benh@kernel.crashing.org> Cc: Paul Mackerras <paulus@samba.org> Cc: "Aneesh Kumar K.V" <aneesh.kumar@linux.vnet.ibm.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Benjamin Herrenschmidt <benh@kernel.crashing.org>
2012-11-19Fix misspellings of "whether" in comments.Adam Buchbinder1-1/+1
"Whether" is misspelled in various comments across the tree; this fixes them. No code changes. Signed-off-by: Adam Buchbinder <adam.buchbinder@gmail.com> Signed-off-by: Jiri Kosina <jkosina@suse.cz>
2012-09-17powerpc/mm: Make some of the PGTABLE_RANGE dependency explicitAneesh Kumar K.V1-0/+5
slice array size and slice mask size depend on PGTABLE_RANGE. Reviewed-by: Paul Mackerras <paulus@samba.org> Signed-off-by: Aneesh Kumar K.V <aneesh.kumar@linux.vnet.ibm.com> Signed-off-by: Benjamin Herrenschmidt <benh@kernel.crashing.org>
2012-09-17powerpc/mm: Increase the slice range to 64TBAneesh Kumar K.V1-39/+68
This patch makes the high psizes mask as an unsigned char array so that we can have more than 16TB. Currently we support upto 64TB Reviewed-by: Paul Mackerras <paulus@samba.org> Signed-off-by: Aneesh Kumar K.V <aneesh.kumar@linux.vnet.ibm.com> Signed-off-by: Benjamin Herrenschmidt <benh@kernel.crashing.org>
2011-10-31powerpc: various straight conversions from module.h --> export.hPaul Gortmaker1-1/+1
All these files were including module.h just for the basic EXPORT_SYMBOL infrastructure. We can shift them off to the export.h header which is a way smaller footprint and thus realize some compile time gains. Signed-off-by: Paul Gortmaker <paul.gortmaker@windriver.com>
2009-01-16powerpc: is_hugepage_only_range() must account for both 4kB and 64kB slicesDave Kleikamp1-1/+10
powerpc: is_hugepage_only_range() must account for both 4kB and 64kB slices The subpage_prot syscall fails on second and subsequent calls for a given region, because is_hugepage_only_range() is mis-identifying the 4 kB slices when the process has a 64 kB page size. Signed-off-by: Dave Kleikamp <shaggy@linux.vnet.ibm.com> Signed-off-by: Benjamin Herrenschmidt <benh@kernel.crashing.org>
2008-07-16Merge commit 'origin/master'Benjamin Herrenschmidt1-1/+1
Manual merge of: arch/powerpc/Kconfig arch/powerpc/kernel/stacktrace.c arch/powerpc/mm/slice.c arch/ppc/kernel/smp.c
2008-07-01powerpc: Only demote individual slices rather than whole processPaul Mackerras1-44/+133
At present, if we have a kernel with a 64kB page size, and some process maps something that has to be mapped with 4kB pages (such as a cache-inhibited mapping on POWER5+, or the eHCA infiniband queue-pair pages), we change the process to use 4kB pages everywhere. This hurts the performance of HPC programs that access eHCA from userspace. With this patch, the kernel will only demote the slice(s) containing the eHCA or cache-inhibited mappings, leaving the remaining slices able to use 64kB hardware pages. This also changes the slice_get_unmapped_area code so that it is willing to place a 64k-page mapping into (or across) a 4k-page slice if there is no better alternative, i.e. if the program specified MAP_FIXED or if there is not sufficient space available in slices that are either empty or already have 64k-page mappings in them. Signed-off-by: Paul Mackerras <paulus@samba.org> Acked-by: Benjamin Herrenschmidt <benh@kernel.crashing.org>
2008-06-26on_each_cpu(): kill unused 'retry' parameterJens Axboe1-1/+1
It's not even passed on to smp_call_function() anymore, since that was removed. So kill it. Acked-by: Jeremy Fitzhardinge <jeremy.fitzhardinge@citrix.com> Reviewed-by: Paul E. McKenney <paulmck@linux.vnet.ibm.com> Signed-off-by: Jens Axboe <jens.axboe@oracle.com>
2007-10-17spin_lock_unlocked cleanupsRoel Kluin1-1/+1
Replace some SPIN_LOCK_UNLOCKED with DEFINE_SPINLOCK Signed-off-by: Roel Kluin <12o3l@tiscali.nl> Acked-by: Thomas Gleixner <tglx@linutronix.de> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2007-08-17[POWERPC] Fix non HUGETLB_PAGE build warningStephen Rothwell1-0/+1
arch/powerpc/mm/mmu_context_64.c: In function 'init_new_context': arch/powerpc/mm/mmu_context_64.c:31: warning: unused variable 'new_context' Signed-off-by: Stephen Rothwell <sfr@canb.auug.org.au> Signed-off-by: Paul Mackerras <paulus@samba.org>
2007-08-10[POWERPC] Fix size check for hugetlbfsBenjamin Herrenschmidt1-0/+2
My "slices" address space management code that was added in the 2.6.22 implementation of get_unmapped_area() doesn't properly check that the size is a multiple of the requested page size. This allows userland to create VMAs that aren't a multiple of the huge page size with hugetlbfs (since hugetlbfs entirely relies on get_unmapped_area() to do that checking) which leads to a kernel BUG() when such areas are torn down. Signed-off-by: Benjamin Herrenschmidt <benh@kernel.crashing.org> Signed-off-by: Paul Mackerras <paulus@samba.org>
2007-05-09[POWERPC] Introduce address space "slices"Benjamin Herrenschmidt1-0/+633
The basic issue is to be able to do what hugetlbfs does but with different page sizes for some other special filesystems; more specifically, my need is: - Huge pages - SPE local store mappings using 64K pages on a 4K base page size kernel on Cell - Some special 4K segments in 64K-page kernels for mapping a dodgy type of powerpc-specific infiniband hardware that requires 4K MMU mappings for various reasons I won't explain here. The main issues are: - To maintain/keep track of the page size per "segment" (as we can only have one page size per segment on powerpc, which are 256MB divisions of the address space). - To make sure special mappings stay within their allotted "segments" (including MAP_FIXED crap) - To make sure everybody else doesn't mmap/brk/grow_stack into a "segment" that is used for a special mapping Some of the necessary mechanisms to handle that were present in the hugetlbfs code, but mostly in ways not suitable for anything else. The patch relies on some changes to the generic get_unmapped_area() that just got merged. It still hijacks hugetlb callbacks here or there as the generic code hasn't been entirely cleaned up yet but that shouldn't be a problem. So what is a slice ? Well, I re-used the mechanism used formerly by our hugetlbfs implementation which divides the address space in "meta-segments" which I called "slices". The division is done using 256MB slices below 4G, and 1T slices above. Thus the address space is divided currently into 16 "low" slices and 16 "high" slices. (Special case: high slice 0 is the area between 4G and 1T). Doing so simplifies significantly the tracking of segments and avoids having to keep track of all the 256MB segments in the address space. While I used the "concepts" of hugetlbfs, I mostly re-implemented everything in a more generic way and "ported" hugetlbfs to it. Slices can have an associated page size, which is encoded in the mmu context and used by the SLB miss handler to set the segment sizes. The hash code currently doesn't care, it has a specific check for hugepages, though I might add a mechanism to provide per-slice hash mapping functions in the future. The slice code provide a pair of "generic" get_unmapped_area() (bottomup and topdown) functions that should work with any slice size. There is some trickiness here so I would appreciate people to have a look at the implementation of these and let me know if I got something wrong. Signed-off-by: Benjamin Herrenschmidt <benh@kernel.crashing.org> Signed-off-by: Paul Mackerras <paulus@samba.org>