linux - Linux Kernel (branches are rebased on master from time to time)

Age	Commit message (Collapse)	Author	Files	Lines
2020-10-07	ida: Free allocated bitmap in error path	Matthew Wilcox (Oracle)	1	-0/+29
	If a bitmap needs to be allocated, and then by the time the thread is scheduled to be run again all the indices which would satisfy the allocation have been allocated then we would leak the allocation. Almost impossible to hit in practice, but a trivial fix. Found by Coverity. Fixes: f32f004cddf8 ("ida: Convert to XArray") Reported-by: coverity-bot <keescook+coverity-bot@chromium.org> Reviewed-by: Kees Cook <keescook@chromium.org> Signed-off-by: Matthew Wilcox (Oracle) <willy@infradead.org>
2020-10-07	radix tree test suite: Fix compilation	Matthew Wilcox (Oracle)	3	-4/+9
	Introducing local_lock broke compilation; fix it all up. Signed-off-by: Matthew Wilcox (Oracle) <willy@infradead.org>
2020-04-03	Merge tag 'spdx-5.7-rc1' of ↵	Linus Torvalds	1	-0/+1
	git://git.kernel.org/pub/scm/linux/kernel/git/gregkh/spdx Pull SPDX updates from Greg KH: "Here are three SPDX patches for 5.7-rc1. One fixes up the SPDX tag for a single driver, while the other two go through the tree and add SPDX tags for all of the .gitignore files as needed. Nothing too complex, but you will get a merge conflict with your current tree, that should be trivial to handle (one file modified by two things, one file deleted.) All three of these have been in linux-next for a while, with no reported issues other than the merge conflict" * tag 'spdx-5.7-rc1' of git://git.kernel.org/pub/scm/linux/kernel/git/gregkh/spdx: ASoC: MT6660: make spdxcheck.py happy .gitignore: add SPDX License Identifier .gitignore: remove too obvious comments
2020-03-25	.gitignore: add SPDX License Identifier	Masahiro Yamada	1	-0/+1
	Add SPDX License Identifier to all .gitignore files. Signed-off-by: Masahiro Yamada <masahiroy@kernel.org> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2020-03-12	xarray: Fix early termination of xas_for_each_marked	Matthew Wilcox (Oracle)	4	-2/+91
	xas_for_each_marked() is using entry == NULL as a termination condition of the iteration. When xas_for_each_marked() is used protected only by RCU, this can however race with xas_store(xas, NULL) in the following way: TASK1 TASK2 page_cache_delete() find_get_pages_range_tag() xas_for_each_marked() xas_find_marked() off = xas_find_chunk() xas_store(&xas, NULL) xas_init_marks(&xas); ... rcu_assign_pointer(*slot, NULL); entry = xa_entry(off); And thus xas_for_each_marked() terminates prematurely possibly leading to missed entries in the iteration (translating to missing writeback of some pages or a similar problem). If we find a NULL entry that has been marked, skip it (unless we're trying to allocate an entry). Reported-by: Jan Kara <jack@suse.cz> CC: stable@vger.kernel.org Fixes: ef8e5717db01 ("page cache: Convert delete_batch to XArray") Signed-off-by: Matthew Wilcox (Oracle) <willy@infradead.org>
2020-02-27	radix tree test suite: Support kmem_cache alignment	Matthew Wilcox (Oracle)	2	-15/+23
	The radix tree doesn't use alignment, so the argument was ignored. The maple tree needs its nodes to be aligned, so we need to pay attention to the alignment argument. Also change the types of 'size' and 'align' to unsigned int to match commit f4957d5bd0916. Signed-off-by: Matthew Wilcox (Oracle) <willy@infradead.org>
2019-07-08	Merge branch 'core-rcu-for-linus' of ↵	Linus Torvalds	1	-1/+1
	git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip Pull RCU updates from Ingo Molnar: "The changes in this cycle are: - RCU flavor consolidation cleanups and optmizations - Documentation updates - Miscellaneous fixes - SRCU updates - RCU-sync flavor consolidation - Torture-test updates - Linux-kernel memory-consistency-model updates, most notably the addition of plain C-language accesses" * 'core-rcu-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip: (61 commits) tools/memory-model: Improve data-race detection tools/memory-model: Change definition of rcu-fence tools/memory-model: Expand definition of barrier tools/memory-model: Do not use "herd" to refer to "herd7" tools/memory-model: Fix comment in MP+poonceonces.litmus Documentation: atomic_t.txt: Explain ordering provided by smp_mb__{before,after}_atomic() rcu: Don't return a value from rcu_assign_pointer() rcu: Force inlining of rcu_read_lock() rcu: Fix irritating whitespace error in rcu_assign_pointer() rcu: Upgrade sync_exp_work_done() to smp_mb() rcutorture: Upper case solves the case of the vanishing NULL pointer torture: Suppress propagating trace_printk() warning rcutorture: Dump trace buffer for callback pipe drain failures torture: Add --trust-make to suppress "make clean" torture: Make --cpus override idleness calculations torture: Run kernel build in source directory torture: Add function graph-tracing cheat sheet torture: Capture qemu output rcutorture: Tweak kvm options rcutorture: Add trivial RCU implementation ...
2019-06-29	Merge tag 'xarray-5.2-rc6' of git://git.infradead.org/users/willy/linux-dax	Linus Torvalds	1	-0/+46
	Pull XArray fixes from Matthew Wilcox: - Account XArray nodes for the page cache to the appropriate cgroup (Johannes Weiner) - Fix idr_get_next() when called under the RCU lock (Matthew Wilcox) - Add a test for xa_insert() (Matthew Wilcox) * tag 'xarray-5.2-rc6' of git://git.infradead.org/users/willy/linux-dax: XArray tests: Add check_insert idr: Fix idr_get_next race with idr_remove mm: fix page cache convergence regression
2019-06-28	Merge branch 'for-mingo' of ↵	Ingo Molnar	1	-1/+1
	git://git.kernel.org/pub/scm/linux/kernel/git/paulmck/linux-rcu into core/rcu Pull rcu/next + tools/memory-model changes from Paul E. McKenney: - RCU flavor consolidation cleanups and optmizations - Documentation updates - Miscellaneous fixes - SRCU updates - RCU-sync flavor consolidation - Torture-test updates - Linux-kernel memory-consistency-model updates, most notably the addition of plain C-language accesses Signed-off-by: Ingo Molnar <mingo@kernel.org>
2019-06-13	rcu: Don't return a value from rcu_assign_pointer()	Andrea Parri	1	-1/+1
	Quoting Paul [1]: "Given that a quick (and perhaps error-prone) search of the uses of rcu_assign_pointer() in v5.1 didn't find a single use of the return value, let's please instead change the documentation and implementation to eliminate the return value." [1] https://lkml.kernel.org/r/20190523135013.GL28207@linux.ibm.com Signed-off-by: Andrea Parri <andrea.parri@amarulasolutions.com> Cc: "Paul E. McKenney" <paulmck@linux.ibm.com> Cc: Josh Triplett <josh@joshtriplett.org> Cc: Steven Rostedt <rostedt@goodmis.org> Cc: Mathieu Desnoyers <mathieu.desnoyers@efficios.com> Cc: Lai Jiangshan <jiangshanlai@gmail.com> Cc: Joel Fernandes <joel@joelfernandes.org> Cc: rcu@vger.kernel.org Cc: Peter Zijlstra <peterz@infradead.org> Cc: Will Deacon <will.deacon@arm.com> Cc: Mark Rutland <mark.rutland@arm.com> Cc: Matthew Wilcox <willy@infradead.org> Cc: Sasha Levin <sashal@kernel.org> Signed-off-by: Paul E. McKenney <paulmck@linux.ibm.com>
2019-06-05	treewide: Replace GPLv2 boilerplate/reference with SPDX - rule 288	Thomas Gleixner	4	-36/+4
	Based on 1 normalized pattern(s): this program is free software you can redistribute it and or modify it under the terms and conditions of the gnu general public license version 2 as published by the free software foundation this program is distributed in the hope it will be useful but without any warranty without even the implied warranty of merchantability or fitness for a particular purpose see the gnu general public license for more details extracted by the scancode license scanner the SPDX license identifier GPL-2.0-only has been chosen to replace the boilerplate/reference in 263 file(s). Signed-off-by: Thomas Gleixner <tglx@linutronix.de> Reviewed-by: Allison Randal <allison@lohutok.net> Reviewed-by: Alexios Zavras <alexios.zavras@intel.com> Cc: linux-spdx@vger.kernel.org Link: https://lkml.kernel.org/r/20190529141901.208660670@linutronix.de Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2019-06-02	idr: Fix idr_get_next race with idr_remove	Matthew Wilcox (Oracle)	1	-0/+46
	If the entry is deleted from the IDR between the call to radix_tree_iter_find() and rcu_dereference_raw(), idr_get_next() will return NULL, which will end the iteration prematurely. We should instead continue to the next entry in the IDR. This only happens if the iteration is protected by the RCU lock. Most IDR users use a spinlock or semaphore to exclude simultaneous modifications. It was noticed once the PID allocator was converted to use the IDR, as it uses the RCU lock, but there may be other users elsewhere in the kernel. We can't use the normal pattern of calling radix_tree_deref_retry() (which catches both a retry entry in a leaf node and a node entry in the root) as the IDR supports storing entries which are unaligned, which will trigger an infinite loop if they are encountered. Instead, we have to explicitly check whether the entry is a retry entry. Fixes: 0a835c4f090a ("Reimplement IDR and IDA using the radix tree") Reported-by: Brendan Gregg <bgregg@netflix.com> Tested-by: Brendan Gregg <bgregg@netflix.com> Signed-off-by: Matthew Wilcox (Oracle) <willy@infradead.org>
2018-12-06	radix tree: Don't return retry entries from lookup	Matthew Wilcox	4	-0/+82
	Commit 66ee620f06f9 ("idr: Permit any valid kernel pointer to be stored") changed the radix tree lookup so that it stops when reaching the bottom of the tree. However, the condition was added in the wrong place, making it possible to return retry entries to the caller. Reorder the tests to check for the retry entry before checking whether we're at the bottom of the tree. The retry entry should never be found in the tree root, so it's safe to defer the check until the end of the loop. Add a regression test to the test-suite to be sure this doesn't come back. Fixes: 66ee620f06f9 ("idr: Permit any valid kernel pointer to be stored") Reported-by: Greg Kurz <groug@kaod.org> Signed-off-by: Matthew Wilcox <willy@infradead.org>
2018-10-21	radix tree: Remove multiorder support	Matthew Wilcox	1	-1/+0
	All users have now been converted to the XArray. Removing the support reduces code size and ensures new users will use the XArray instead. Signed-off-by: Matthew Wilcox <willy@infradead.org>
2018-10-21	radix tree test: Convert multiorder tests to XArray	Matthew Wilcox	1	-56/+49
	This is the last remaining user of the multiorder functionality of the radix tree. Test the XArray instead. Signed-off-by: Matthew Wilcox <willy@infradead.org>
2018-10-21	radix tree tests: Convert item_delete_rcu to XArray	Matthew Wilcox	2	-3/+3
	In preparation for the removal of the multiorder radix tree code, convert item_delete_rcu() to use the XArray so it can still be called for XArrays containing multi-index entries. Signed-off-by: Matthew Wilcox <willy@infradead.org>
2018-10-21	radix tree tests: Convert item_kill_tree to XArray	Matthew Wilcox	1	-21/+9
	In preparation for the removal of the multiorder radix tree code, convert item_kill_tree() to use the XArray so it can still be called for XArrays containing multi-index entries. Signed-off-by: Matthew Wilcox <willy@infradead.org>
2018-10-21	radix tree tests: Move item_insert_order	Matthew Wilcox	3	-11/+22
	The remaining tests are not suitable for moving in-kernel, so move item_insert_order() into multiorder.c, make it static and make it use the XArray. Signed-off-by: Matthew Wilcox <willy@infradead.org>
2018-10-21	radix tree test suite: Remove multiorder benchmarking	Matthew Wilcox	1	-29/+21
	The multiorder radix tree code is being removed, so remove the benchmarking of its performance. Signed-off-by: Matthew Wilcox <willy@infradead.org>
2018-10-21	radix tree test suite: Remove __item_insert	Matthew Wilcox	2	-7/+1
	Inline it into its one caller Signed-off-by: Matthew Wilcox <willy@infradead.org>
2018-10-21	xarray: Move multiorder_check to in-kernel tests	Matthew Wilcox	1	-48/+0
	This version is a little less thorough in order to be a little quicker, but tests the important edge cases. Also test adding a multiorder entry at a non-canonical index, and erasing it. Signed-off-by: Matthew Wilcox <willy@infradead.org>
2018-10-21	xarray: Move multiorder_shrink to kernel tests	Matthew Wilcox	1	-173/+0
	Test this functionality inside the kernel as well as in userspace. Also remove insert_bug() as there's no comparable thing to test in the XArray code. Signed-off-by: Matthew Wilcox <willy@infradead.org>
2018-10-21	xarray: Move multiorder account test in-kernel	Matthew Wilcox	1	-24/+0
	Move this test to the in-kernel test suite, and enhance it to test several different orders. Signed-off-by: Matthew Wilcox <willy@infradead.org>
2018-10-21	radix tree test suite: Convert iteration test to XArray	Matthew Wilcox	3	-58/+63
	With no code left in the kernel using the multiorder radix tree, convert the iteration test from the radix tree to the XArray. It's unlikely to suffer the same bug as the radix tree, but this test will prevent that bug from ever creeping into the XArray implementation. Signed-off-by: Matthew Wilcox <willy@infradead.org>
2018-10-21	radix tree test suite: Convert tag_tagged_items to XArray	Matthew Wilcox	8	-57/+46
	The tag_tagged_items() function is supposed to test the page-writeback tagging code. Since that has been converted to the XArray, there's not much point in testing the radix tree's tagging code. This requires using the pthread mutex embedded in the xarray instead of an external lock, so remove the pthread mutexes which protect xarrays/radix trees. Also remove radix_tree_iter_tag_set() as this was the last user. Signed-off-by: Matthew Wilcox <willy@infradead.org>
2018-10-21	radix tree: Remove radix_tree_clear_tags	Matthew Wilcox	1	-29/+0
	The page cache was the only user of this interface and it has now been converted to the XArray. Transform the test into a test of xas_init_marks(). Signed-off-by: Matthew Wilcox <willy@infradead.org>
2018-10-21	radix tree: Remove split/join code	Matthew Wilcox	2	-338/+0
	radix_tree_split and radix_tree_join were never used upstream. Remove them; if they're needed in future they will be replaced by XArray equivalents. Signed-off-by: Matthew Wilcox <willy@infradead.org>
2018-10-21	radix tree: Remove radix_tree_update_node_t	Matthew Wilcox	1	-1/+1
	The only user of this functionality was the workingset code, and it's now been converted to the XArray. Remove __radix_tree_delete_node() entirely as it was also only used by the workingset code. Signed-off-by: Matthew Wilcox <willy@infradead.org>
2018-10-21	shmem: Convert find_swap_entry to XArray	Matthew Wilcox	3	-84/+0
	This is a 1:1 conversion. The major part of this patch is converting the test framework from userspace to kernel space and mirroring the algorithm now used in find_swap_entry(). Signed-off-by: Matthew Wilcox <willy@infradead.org>
2018-10-21	radix tree test suite: Convert regression1 to XArray	Matthew Wilcox	1	-39/+19
	Now the page cache lookup is using the XArray, let's convert this regression test from the radix tree API to the XArray so it's testing roughly the same thing it was testing before. Signed-off-by: Matthew Wilcox <willy@infradead.org>
2018-10-21	page cache: Convert find_get_pages_contig to XArray	Matthew Wilcox	1	-23/+0
	There's no direct replacement for radix_tree_for_each_contig() in the XArray API as it's an unusual thing to do. Instead, open-code a loop using xas_next(). This removes the only user of radix_tree_for_each_contig() so delete the iterator from the API and the test suite code for it. Signed-off-by: Matthew Wilcox <willy@infradead.org>
2018-10-21	ida: Convert to XArray	Matthew Wilcox	1	-5/+3
	Use the XA_TRACK_FREE ability to track which entries have a free bit, similarly to how it uses the radix tree's IDR_FREE tag. This eliminates the per-cpu ida_bitmap preload, and fixes the memory consumption regression I introduced when making the IDR able to store any pointer. Signed-off-by: Matthew Wilcox <willy@infradead.org>
2018-10-21	xarray: Add XArray unconditional store operations	Matthew Wilcox	4	-1/+39
	xa_store() differs from radix_tree_insert() in that it will overwrite an existing element in the array rather than returning an error. This is the behaviour which most users want, and those that want more complex behaviour generally want to use the xas family of routines anyway. For memory allocation, xa_store() will first attempt to request memory from the slab allocator; if memory is not immediately available, it will drop the xa_lock and allocate memory, keeping a pointer in the xa_state. It does not use the per-CPU cache, although those will continue to exist until all radix tree users are converted to the xarray. This patch also includes xa_erase() and __xa_erase() for a streamlined way to store NULL. Since there is no need to allocate memory in order to store a NULL in the XArray, we do not need to trouble the user with deciding what memory allocation flags to use. Signed-off-by: Matthew Wilcox <willy@infradead.org>
2018-10-21	xarray: Add XArray load operation	Matthew Wilcox	7	-2/+38
	The xa_load function brings with it a lot of infrastructure; xa_empty(), xa_is_err(), and large chunks of the XArray advanced API that are used to implement xa_load. As the test-suite demonstrates, it is possible to use the XArray functions on a radix tree. The radix tree functions depend on the GFP flags being stored in the root of the tree, so it's not possible to use the radix tree functions on an XArray. Signed-off-by: Matthew Wilcox <willy@infradead.org>
2018-10-21	xarray: Define struct xa_node	Matthew Wilcox	1	-15/+15
	This is a direct replacement for struct radix_tree_node. A couple of struct members have changed name, so convert those. Use a #define so that radix tree users continue to work without change. Signed-off-by: Matthew Wilcox <willy@infradead.org> Reviewed-by: Josef Bacik <jbacik@fb.com>
2018-10-21	xarray: Add definition of struct xarray	Matthew Wilcox	6	-7/+19
	This is a direct replacement for struct radix_tree_root. Some of the struct members have changed name; convert those, and use a #define so that radix_tree users continue to work without change. Signed-off-by: Matthew Wilcox <willy@infradead.org> Reviewed-by: Josef Bacik <jbacik@fb.com>
2018-09-29	xarray: Change definition of sibling entries	Matthew Wilcox	2	-1/+2
	Instead of storing a pointer to the slot containing the canonical entry, store the offset of the slot. Produces slightly more efficient code (~300 bytes) and simplifies the implementation. Signed-off-by: Matthew Wilcox <willy@infradead.org> Reviewed-by: Josef Bacik <jbacik@fb.com>
2018-09-29	xarray: Replace exceptional entries	Matthew Wilcox	4	-29/+27
	Introduce xarray value entries and tagged pointers to replace radix tree exceptional entries. This is a slight change in encoding to allow the use of an extra bit (we can now store BITS_PER_LONG - 1 bits in a value entry). It is also a change in emphasis; exceptional entries are intimidating and different. As the comment explains, you can choose to store values or pointers in the xarray and they are both first-class citizens. Signed-off-by: Matthew Wilcox <willy@infradead.org> Reviewed-by: Josef Bacik <jbacik@fb.com>
2018-09-29	idr: Permit any valid kernel pointer to be stored	Matthew Wilcox	1	-0/+63
	An upcoming change to the encoding of internal entries will set the bottom two bits to 0b10. Unfortunately, m68k only aligns some data structures to 2 bytes, so the IDR will interpret them as internal entries and things will go badly wrong. Change the radix tree so that it stops either when the node indicates that it's the bottom of the tree (shift == 0) or when the entry is not an internal entry. This means we cannot insert an arbitrary kernel pointer as a multiorder entry, but the IDR does not permit multiorder entries. Annoyingly, this means the IDR can no longer take advantage of the radix tree's ability to store a single entry at offset 0 without allocating memory. A pointer which is 2-byte aligned cannot be stored directly in the root as it would be indistinguishable from a node, so we must allocate a node in order to store a 2-byte pointer at index 0. The idr_replace() function does not take a GFP flags argument, so cannot allocate memory. If a user inserts a 4-byte aligned pointer at index 0 and then replaces it with a 2-byte aligned pointer, we must be able to store it. Arbitrary pointer values are still not permitted; pointers of the form 2 + (i * 4) for values of i between 0 and 1023 are reserved for the implementation. These are not valid kernel pointers as they would point into the zero page. This change does cause a runtime memory consumption regression for the IDA. I will recover that later. Signed-off-by: Matthew Wilcox <willy@infradead.org> Tested-by: Guenter Roeck <linux@roeck-us.net>
2018-08-21	test_ida: check_ida_destroy and check_ida_alloc	Matthew Wilcox	1	-66/+4
	Move these tests from the userspace test-suite to the kernel test-suite. Also convert check_ida_random to the new API. Signed-off-by: Matthew Wilcox <willy@infradead.org>
2018-08-21	test_ida: Convert check_ida_conv to new API	Matthew Wilcox	1	-46/+10
	Move as much as possible to kernel space; leave the parts in user space that rely on checking memory allocation failures to detect the transition between an exceptional entry and a bitmap. Signed-off-by: Matthew Wilcox <willy@infradead.org>
2018-08-21	test_ida: Move ida_check_max	Matthew Wilcox	1	-28/+0
	Convert to new API and move to kernel space. Signed-off-by: Matthew Wilcox <willy@infradead.org>
2018-08-21	test_ida: Move ida_check_leaf	Matthew Wilcox	1	-27/+0
	Convert to new API and move to kernel space. Take the opportunity to test the situation a little more thoroughly (ie at different offsets). Signed-off-by: Matthew Wilcox <willy@infradead.org>
2018-08-21	idr-test: Convert ida_check_nomem to new API	Matthew Wilcox	1	-6/+7
	We can't move this test to kernel space because there's no way to force kmalloc to fail. But we can use the new API and check this works when the test is in userspace. Signed-off-by: Matthew Wilcox <willy@infradead.org>
2018-08-21	ida: Start new test_ida module	Matthew Wilcox	4	-7/+22
	Start transitioning the IDA tests into kernel space. Framework heavily cribbed from test_xarray.c. Signed-off-by: Matthew Wilcox <willy@infradead.org>
2018-08-21	radix tree test suite: Enable ubsan	Matthew Wilcox	2	-11/+14
	Add support for the undefined behaviour sanitizer and fix the bugs that ubsan pointed out. Nothing major, and all in the test suite, not the code. Signed-off-by: Matthew Wilcox <willy@infradead.org>
2018-08-21	radix tree test suite: Fix compilation	Matthew Wilcox	1	-0/+2
	An include of xarray.h was added to lib/idr.c without updating the test suite. Signed-off-by: Matthew Wilcox <willy@infradead.org>
2018-05-25	idr: fix invalid ptr dereference on item delete	Matthew Wilcox	1	-0/+7
	If the radix tree underlying the IDR happens to be full and we attempt to remove an id which is larger than any id in the IDR, we will call __radix_tree_delete() with an uninitialised 'slot' pointer, at which point anything could happen. This was easiest to hit with a single entry at id 0 and attempting to remove a non-0 id, but it could have happened with 64 entries and attempting to remove an id >= 64. Roman said: The syzcaller test boils down to opening /dev/kvm, creating an eventfd, and calling a couple of KVM ioctls. None of this requires superuser. And the result is dereferencing an uninitialized pointer which is likely a crash. The specific path caught by syzbot is via KVM_HYPERV_EVENTD ioctl which is new in 4.17. But I guess there are other user-triggerable paths, so cc:stable is probably justified. Matthew added: We have around 250 calls to idr_remove() in the kernel today. Many of them pass an ID which is embedded in the object they're removing, so they're safe. Picking a few likely candidates: drivers/firewire/core-cdev.c looks unsafe; the ID comes from an ioctl. drivers/gpu/drm/amd/amdgpu/amdgpu_ctx.c is similar drivers/atm/nicstar.c could be taken down by a handcrafted packet Link: http://lkml.kernel.org/r/20180518175025.GD6361@bombadil.infradead.org Fixes: 0a835c4f090a ("Reimplement IDR and IDA using the radix tree") Reported-by: <syzbot+35666cba7f0a337e2e79@syzkaller.appspotmail.com> Debugged-by: Roman Kagan <rkagan@virtuozzo.com> Signed-off-by: Matthew Wilcox <mawilcox@microsoft.com> Cc: <stable@vger.kernel.org> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2018-05-18	radix tree test suite: multi-order iteration race	Ross Zwisler	2	-0/+64
	Add a test which shows a race in the multi-order iteration code. This test reliably hits the race in under a second on my machine, and is the result of a real bug report against kernel a production v4.15 based kernel (4.15.6-300.fc27.x86_64). With a real kernel this issue is hit when using order 9 PMD DAX radix tree entries. The race has to do with how we tear down multi-order sibling entries when we are removing an item from the tree. Remember that an order 2 entry looks like this: struct radix_tree_node.slots[] = [entry][sibling][sibling][sibling] where 'entry' is in some slot in the struct radix_tree_node, and the three slots following 'entry' contain sibling pointers which point back to 'entry.' When we delete 'entry' from the tree, we call : radix_tree_delete() radix_tree_delete_item() __radix_tree_delete() replace_slot() replace_slot() first removes the siblings in order from the first to the last, then at then replaces 'entry' with NULL. This means that for a brief period of time we end up with one or more of the siblings removed, so: struct radix_tree_node.slots[] = [entry][NULL][sibling][sibling] This causes an issue if you have a reader iterating over the slots in the tree via radix_tree_for_each_slot() while only under rcu_read_lock()/rcu_read_unlock() protection. This is a common case in mm/filemap.c. The issue is that when __radix_tree_next_slot() => skip_siblings() tries to skip over the sibling entries in the slots, it currently does so with an exact match on the slot directly preceding our current slot. Normally this works: V preceding slot struct radix_tree_node.slots[] = [entry][sibling][sibling][sibling] ^ current slot This lets you find the first sibling, and you skip them all in order. But in the case where one of the siblings is NULL, that slot is skipped and then our sibling detection is interrupted: V preceding slot struct radix_tree_node.slots[] = [entry][NULL][sibling][sibling] ^ current slot This means that the sibling pointers aren't recognized since they point all the way back to 'entry', so we think that they are normal internal radix tree pointers. This causes us to think we need to walk down to a struct radix_tree_node starting at the address of 'entry'. In a real running kernel this will crash the thread with a GP fault when you try and dereference the slots in your broken node starting at 'entry'. In the radix tree test suite this will be caught by the address sanitizer: ==27063==ERROR: AddressSanitizer: heap-buffer-overflow on address 0x60c0008ae400 at pc 0x00000040ce4f bp 0x7fa89b8fcad0 sp 0x7fa89b8fcac0 READ of size 8 at 0x60c0008ae400 thread T3 #0 0x40ce4e in __radix_tree_next_slot /home/rzwisler/project/linux/tools/testing/radix-tree/radix-tree.c:1660 #1 0x4022cc in radix_tree_next_slot linux/../../../../include/linux/radix-tree.h:567 #2 0x4022cc in iterator_func /home/rzwisler/project/linux/tools/testing/radix-tree/multiorder.c:655 #3 0x7fa8a088d50a in start_thread (/lib64/libpthread.so.0+0x750a) #4 0x7fa8a03bd16e in clone (/lib64/libc.so.6+0xf516e) Link: http://lkml.kernel.org/r/20180503192430.7582-5-ross.zwisler@linux.intel.com Signed-off-by: Ross Zwisler <ross.zwisler@linux.intel.com> Cc: Christoph Hellwig <hch@lst.de> Cc: CR, Sapthagirish <sapthagirish.cr@intel.com> Cc: Dan Williams <dan.j.williams@intel.com> Cc: Dave Chinner <david@fromorbit.com> Cc: Jan Kara <jack@suse.cz> Cc: Matthew Wilcox <willy@infradead.org> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2018-05-18	radix tree test suite: add item_delete_rcu()	Ross Zwisler	2	-0/+21
	Currently the lifetime of "struct item" entries in the radix tree are not controlled by RCU, but are instead deleted inline as they are removed from the tree. In the following patches we add a test which has threads iterating over items pulled from the tree and verifying them in an rcu_read_lock()/rcu_read_unlock() section. This means that though an item has been removed from the tree it could still be being worked on by other threads until the RCU grace period expires. So, we need to actually free the "struct item" structures at the end of the grace period, just as we do with "struct radix_tree_node" items. Link: http://lkml.kernel.org/r/20180503192430.7582-4-ross.zwisler@linux.intel.com Signed-off-by: Ross Zwisler <ross.zwisler@linux.intel.com> Cc: Christoph Hellwig <hch@lst.de> Cc: CR, Sapthagirish <sapthagirish.cr@intel.com> Cc: Dan Williams <dan.j.williams@intel.com> Cc: Dave Chinner <david@fromorbit.com> Cc: Jan Kara <jack@suse.cz> Cc: Matthew Wilcox <willy@infradead.org> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>