<feed xmlns='http://www.w3.org/2005/Atom'>
<title>linux/fs/dax.c, branch v4.10-rc4</title>
<subtitle>Linux Kernel (branches are rebased on master from time to time)</subtitle>
<id>https://sre.ring0.de/linux/atom?h=v4.10-rc4</id>
<link rel='self' href='https://sre.ring0.de/linux/atom?h=v4.10-rc4'/>
<link rel='alternate' type='text/html' href='https://sre.ring0.de/linux/'/>
<updated>2017-01-11T02:31:54Z</updated>
<entry>
<title>dax: wrprotect pmd_t in dax_mapping_entry_mkclean</title>
<updated>2017-01-11T02:31:54Z</updated>
<author>
<name>Ross Zwisler</name>
<email>ross.zwisler@linux.intel.com</email>
</author>
<published>2017-01-11T00:57:24Z</published>
<link rel='alternate' type='text/html' href='https://sre.ring0.de/linux/commit/?id=f729c8c9b24f0540a6e6b86e68f3888ba90ef7e7'/>
<id>urn:sha1:f729c8c9b24f0540a6e6b86e68f3888ba90ef7e7</id>
<content type='text'>
Currently dax_mapping_entry_mkclean() fails to clean and write protect
the pmd_t of a DAX PMD entry during an *sync operation.  This can result
in data loss in the following sequence:

1) mmap write to DAX PMD, dirtying PMD radix tree entry and making the
   pmd_t dirty and writeable
2) fsync, flushing out PMD data and cleaning the radix tree entry. We
   currently fail to mark the pmd_t as clean and write protected.
3) more mmap writes to the PMD.  These don't cause any page faults since
   the pmd_t is dirty and writeable.  The radix tree entry remains clean.
4) fsync, which fails to flush the dirty PMD data because the radix tree
   entry was clean.
5) crash - dirty data that should have been fsync'd as part of 4) could
   still have been in the processor cache, and is lost.

Fix this by marking the pmd_t clean and write protected in
dax_mapping_entry_mkclean(), which is called as part of the fsync
operation 2).  This will cause the writes in step 3) above to generate
page faults where we'll re-dirty the PMD radix tree entry, resulting in
flushes in the fsync that happens in step 4).

Fixes: 4b4bb46d00b3 ("dax: clear dirty entry tags on cache flush")
Link: http://lkml.kernel.org/r/1482272586-21177-3-git-send-email-ross.zwisler@linux.intel.com
Signed-off-by: Ross Zwisler &lt;ross.zwisler@linux.intel.com&gt;
Reviewed-by: Jan Kara &lt;jack@suse.cz&gt;
Cc: Alexander Viro &lt;viro@zeniv.linux.org.uk&gt;
Cc: Christoph Hellwig &lt;hch@lst.de&gt;
Cc: Dan Williams &lt;dan.j.williams@intel.com&gt;
Cc: Dave Chinner &lt;david@fromorbit.com&gt;
Cc: Jan Kara &lt;jack@suse.cz&gt;
Cc: Matthew Wilcox &lt;mawilcox@microsoft.com&gt;
Cc: Dave Hansen &lt;dave.hansen@intel.com&gt;
Signed-off-by: Andrew Morton &lt;akpm@linux-foundation.org&gt;
Signed-off-by: Linus Torvalds &lt;torvalds@linux-foundation.org&gt;
</content>
</entry>
<entry>
<title>dax: Call -&gt;iomap_begin without entry lock during dax fault</title>
<updated>2016-12-27T04:29:25Z</updated>
<author>
<name>Jan Kara</name>
<email>jack@suse.cz</email>
</author>
<published>2016-10-19T12:34:31Z</published>
<link rel='alternate' type='text/html' href='https://sre.ring0.de/linux/commit/?id=9f141d6ef6258a3a37a045842d9ba7e68f368956'/>
<id>urn:sha1:9f141d6ef6258a3a37a045842d9ba7e68f368956</id>
<content type='text'>
Currently -&gt;iomap_begin() handler is called with entry lock held. If the
filesystem held any locks between -&gt;iomap_begin() and -&gt;iomap_end()
(such as ext4 which will want to hold transaction open), this would cause
lock inversion with the iomap_apply() from standard IO path which first
calls -&gt;iomap_begin() and only then calls -&gt;actor() callback which grabs
entry locks for DAX (if it faults when copying from/to user provided
buffers).

Fix the problem by nesting grabbing of entry lock inside -&gt;iomap_begin()
- -&gt;iomap_end() pair.

Reviewed-by: Ross Zwisler &lt;ross.zwisler@linux.intel.com&gt;
Signed-off-by: Jan Kara &lt;jack@suse.cz&gt;
Signed-off-by: Dan Williams &lt;dan.j.williams@intel.com&gt;
</content>
</entry>
<entry>
<title>dax: Finish fault completely when loading holes</title>
<updated>2016-12-27T04:29:25Z</updated>
<author>
<name>Jan Kara</name>
<email>jack@suse.cz</email>
</author>
<published>2016-10-19T12:48:38Z</published>
<link rel='alternate' type='text/html' href='https://sre.ring0.de/linux/commit/?id=f449b936f1aff7696b24a338f493d5cee8d48d55'/>
<id>urn:sha1:f449b936f1aff7696b24a338f493d5cee8d48d55</id>
<content type='text'>
The only case when we do not finish the page fault completely is when we
are loading hole pages into a radix tree. Avoid this special case and
finish the fault in that case as well inside the DAX fault handler. It
will allow us for easier iomap handling.

Reviewed-by: Ross Zwisler &lt;ross.zwisler@linux.intel.com&gt;
Signed-off-by: Jan Kara &lt;jack@suse.cz&gt;
Signed-off-by: Dan Williams &lt;dan.j.williams@intel.com&gt;
</content>
</entry>
<entry>
<title>dax: Avoid page invalidation races and unnecessary radix tree traversals</title>
<updated>2016-12-27T04:29:24Z</updated>
<author>
<name>Jan Kara</name>
<email>jack@suse.cz</email>
</author>
<published>2016-08-10T15:10:28Z</published>
<link rel='alternate' type='text/html' href='https://sre.ring0.de/linux/commit/?id=e3fce68cdbed297d927e993b3ea7b8b1cee545da'/>
<id>urn:sha1:e3fce68cdbed297d927e993b3ea7b8b1cee545da</id>
<content type='text'>
Currently dax_iomap_rw() takes care of invalidating page tables and
evicting hole pages from the radix tree when write(2) to the file
happens. This invalidation is only necessary when there is some block
allocation resulting from write(2). Furthermore in current place the
invalidation is racy wrt page fault instantiating a hole page just after
we have invalidated it.

So perform the page invalidation inside dax_iomap_actor() where we can
do it only when really necessary and after blocks have been allocated so
nobody will be instantiating new hole pages anymore.

Reviewed-by: Christoph Hellwig &lt;hch@lst.de&gt;
Reviewed-by: Ross Zwisler &lt;ross.zwisler@linux.intel.com&gt;
Signed-off-by: Jan Kara &lt;jack@suse.cz&gt;
Signed-off-by: Dan Williams &lt;dan.j.williams@intel.com&gt;
</content>
</entry>
<entry>
<title>mm: Invalidate DAX radix tree entries only if appropriate</title>
<updated>2016-12-27T04:29:24Z</updated>
<author>
<name>Jan Kara</name>
<email>jack@suse.cz</email>
</author>
<published>2016-08-10T15:22:44Z</published>
<link rel='alternate' type='text/html' href='https://sre.ring0.de/linux/commit/?id=c6dcf52c23d2d3fb5235cec42d7dd3f786b87d55'/>
<id>urn:sha1:c6dcf52c23d2d3fb5235cec42d7dd3f786b87d55</id>
<content type='text'>
Currently invalidate_inode_pages2_range() and invalidate_mapping_pages()
just delete all exceptional radix tree entries they find. For DAX this
is not desirable as we track cache dirtiness in these entries and when
they are evicted, we may not flush caches although it is necessary. This
can for example manifest when we write to the same block both via mmap
and via write(2) (to different offsets) and fsync(2) then does not
properly flush CPU caches when modification via write(2) was the last
one.

Create appropriate DAX functions to handle invalidation of DAX entries
for invalidate_inode_pages2_range() and invalidate_mapping_pages() and
wire them up into the corresponding mm functions.

Acked-by: Johannes Weiner &lt;hannes@cmpxchg.org&gt;
Reviewed-by: Ross Zwisler &lt;ross.zwisler@linux.intel.com&gt;
Signed-off-by: Jan Kara &lt;jack@suse.cz&gt;
Signed-off-by: Dan Williams &lt;dan.j.williams@intel.com&gt;
</content>
</entry>
<entry>
<title>dax: clear dirty entry tags on cache flush</title>
<updated>2016-12-15T00:04:09Z</updated>
<author>
<name>Jan Kara</name>
<email>jack@suse.cz</email>
</author>
<published>2016-12-14T23:07:53Z</published>
<link rel='alternate' type='text/html' href='https://sre.ring0.de/linux/commit/?id=4b4bb46d00b386e1c972890dc5785a7966eaa9c0'/>
<id>urn:sha1:4b4bb46d00b386e1c972890dc5785a7966eaa9c0</id>
<content type='text'>
Currently we never clear dirty tags in DAX mappings and thus address
ranges to flush accumulate.  Now that we have locking of radix tree
entries, we have all the locking necessary to reliably clear the radix
tree dirty tag when flushing caches for corresponding address range.
Similarly to page_mkclean() we also have to write-protect pages to get a
page fault when the page is next written to so that we can mark the
entry dirty again.

Link: http://lkml.kernel.org/r/1479460644-25076-21-git-send-email-jack@suse.cz
Signed-off-by: Jan Kara &lt;jack@suse.cz&gt;
Reviewed-by: Ross Zwisler &lt;ross.zwisler@linux.intel.com&gt;
Cc: Kirill A. Shutemov &lt;kirill.shutemov@linux.intel.com&gt;
Cc: Dan Williams &lt;dan.j.williams@intel.com&gt;
Signed-off-by: Andrew Morton &lt;akpm@linux-foundation.org&gt;
Signed-off-by: Linus Torvalds &lt;torvalds@linux-foundation.org&gt;
</content>
</entry>
<entry>
<title>dax: protect PTE modification on WP fault by radix tree entry lock</title>
<updated>2016-12-15T00:04:09Z</updated>
<author>
<name>Jan Kara</name>
<email>jack@suse.cz</email>
</author>
<published>2016-12-14T23:07:50Z</published>
<link rel='alternate' type='text/html' href='https://sre.ring0.de/linux/commit/?id=2f89dc12a25ddf995b9acd7b6543fe892e3473d6'/>
<id>urn:sha1:2f89dc12a25ddf995b9acd7b6543fe892e3473d6</id>
<content type='text'>
Currently PTE gets updated in wp_pfn_shared() after dax_pfn_mkwrite()
has released corresponding radix tree entry lock.  When we want to
writeprotect PTE on cache flush, we need PTE modification to happen
under radix tree entry lock to ensure consistent updates of PTE and
radix tree (standard faults use page lock to ensure this consistency).
So move update of PTE bit into dax_pfn_mkwrite().

Link: http://lkml.kernel.org/r/1479460644-25076-20-git-send-email-jack@suse.cz
Signed-off-by: Jan Kara &lt;jack@suse.cz&gt;
Reviewed-by: Ross Zwisler &lt;ross.zwisler@linux.intel.com&gt;
Cc: Kirill A. Shutemov &lt;kirill.shutemov@linux.intel.com&gt;
Cc: Dan Williams &lt;dan.j.williams@intel.com&gt;
Signed-off-by: Andrew Morton &lt;akpm@linux-foundation.org&gt;
Signed-off-by: Linus Torvalds &lt;torvalds@linux-foundation.org&gt;
</content>
</entry>
<entry>
<title>dax: make cache flushing protected by entry lock</title>
<updated>2016-12-15T00:04:09Z</updated>
<author>
<name>Jan Kara</name>
<email>jack@suse.cz</email>
</author>
<published>2016-12-14T23:07:47Z</published>
<link rel='alternate' type='text/html' href='https://sre.ring0.de/linux/commit/?id=a6abc2c0e77b16480f4d2c1eb7925e5287ae1526'/>
<id>urn:sha1:a6abc2c0e77b16480f4d2c1eb7925e5287ae1526</id>
<content type='text'>
Currently, flushing of caches for DAX mappings was ignoring entry lock.
So far this was ok (modulo a bug that a difference in entry lock could
cause cache flushing to be mistakenly skipped) but in the following
patches we will write-protect PTEs on cache flushing and clear dirty
tags.  For that we will need more exclusion.  So do cache flushing under
an entry lock.  This allows us to remove one lock-unlock pair of
mapping-&gt;tree_lock as a bonus.

Link: http://lkml.kernel.org/r/1479460644-25076-19-git-send-email-jack@suse.cz
Signed-off-by: Jan Kara &lt;jack@suse.cz&gt;
Reviewed-by: Ross Zwisler &lt;ross.zwisler@linux.intel.com&gt;
Cc: Kirill A. Shutemov &lt;kirill.shutemov@linux.intel.com&gt;
Cc: Dan Williams &lt;dan.j.williams@intel.com&gt;
Signed-off-by: Andrew Morton &lt;akpm@linux-foundation.org&gt;
Signed-off-by: Linus Torvalds &lt;torvalds@linux-foundation.org&gt;
</content>
</entry>
<entry>
<title>mm: move handling of COW faults into DAX code</title>
<updated>2016-12-15T00:04:09Z</updated>
<author>
<name>Jan Kara</name>
<email>jack@suse.cz</email>
</author>
<published>2016-12-14T23:07:24Z</published>
<link rel='alternate' type='text/html' href='https://sre.ring0.de/linux/commit/?id=b1aa812b21084285e9f6098639be9cd5bf9e05d7'/>
<id>urn:sha1:b1aa812b21084285e9f6098639be9cd5bf9e05d7</id>
<content type='text'>
Move final handling of COW faults from generic code into DAX fault
handler.  That way generic code doesn't have to be aware of
peculiarities of DAX locking so remove that knowledge and make locking
functions private to fs/dax.c.

Link: http://lkml.kernel.org/r/1479460644-25076-11-git-send-email-jack@suse.cz
Signed-off-by: Jan Kara &lt;jack@suse.cz&gt;
Acked-by: Kirill A. Shutemov &lt;kirill.shutemov@linux.intel.com&gt;
Reviewed-by: Ross Zwisler &lt;ross.zwisler@linux.intel.com&gt;
Cc: Dan Williams &lt;dan.j.williams@intel.com&gt;
Signed-off-by: Andrew Morton &lt;akpm@linux-foundation.org&gt;
Signed-off-by: Linus Torvalds &lt;torvalds@linux-foundation.org&gt;
</content>
</entry>
<entry>
<title>mm: use vmf-&gt;address instead of of vmf-&gt;virtual_address</title>
<updated>2016-12-15T00:04:09Z</updated>
<author>
<name>Jan Kara</name>
<email>jack@suse.cz</email>
</author>
<published>2016-12-14T23:07:01Z</published>
<link rel='alternate' type='text/html' href='https://sre.ring0.de/linux/commit/?id=1a29d85eb0f19b7d8271923d8917d7b4f5540b3e'/>
<id>urn:sha1:1a29d85eb0f19b7d8271923d8917d7b4f5540b3e</id>
<content type='text'>
Every single user of vmf-&gt;virtual_address typed that entry to unsigned
long before doing anything with it so the type of virtual_address does
not really provide us any additional safety.  Just use masked
vmf-&gt;address which already has the appropriate type.

Link: http://lkml.kernel.org/r/1479460644-25076-3-git-send-email-jack@suse.cz
Signed-off-by: Jan Kara &lt;jack@suse.cz&gt;
Acked-by: Kirill A. Shutemov &lt;kirill.shutemov@linux.intel.com&gt;
Cc: Dan Williams &lt;dan.j.williams@intel.com&gt;
Cc: Ross Zwisler &lt;ross.zwisler@linux.intel.com&gt;
Signed-off-by: Andrew Morton &lt;akpm@linux-foundation.org&gt;
Signed-off-by: Linus Torvalds &lt;torvalds@linux-foundation.org&gt;
</content>
</entry>
</feed>
