summaryrefslogtreecommitdiffstats
path: root/fs
AgeCommit message (Collapse)AuthorFilesLines
2017-06-21xfs: don't allow bmap on rt filesDarrick J. Wong1-2/+5
bmap returns a dumb LBA address but not the block device that goes with that LBA. Swapfiles don't care about this and will blindly assume that the data volume is the correct blockdev, which is totally bogus for files on the rt subvolume. This results in the swap code doing IOs to arbitrary locations on the data device(!) if the passed in mapping is a realtime file, so just turn off bmap for rt files. Signed-off-by: Darrick J. Wong <darrick.wong@oracle.com> Reviewed-by: Christoph Hellwig <hch@lst.de>
2017-06-08xfs: fix spurious spin_is_locked() assert failures on non-smp kernelsBrian Foster2-4/+3
The 0-day kernel test robot reports assertion failures on !CONFIG_SMP kernels due to failed spin_is_locked() checks. As it turns out, spin_is_locked() is hardcoded to return zero on !CONFIG_SMP kernels and so this function cannot be relied on to verify spinlock state in this configuration. To avoid this problem, replace the associated asserts with lockdep variants that do the right thing regardless of kernel configuration. Drop the one assert that checks for an unlocked lock as there is no suitable lockdep variant for that case. This moves the spinlock checks from XFS debug code to lockdep, but generally provides the same level of protection. Reported-by: kbuild test robot <fengguang.wu@intel.com> Signed-off-by: Brian Foster <bfoster@redhat.com> Reviewed-by: Christoph Hellwig <hch@lst.de> Reviewed-by: Darrick J. Wong <darrick.wong@oracle.com> Signed-off-by: Darrick J. Wong <darrick.wong@oracle.com>
2017-05-31xfs: use ->b_state to fix buffer I/O accounting release raceBrian Foster2-15/+28
We've had user reports of unmount hangs in xfs_wait_buftarg() that analysis shows is due to btp->bt_io_count == -1. bt_io_count represents the count of in-flight asynchronous buffers and thus should always be >= 0. xfs_wait_buftarg() waits for this value to stabilize to zero in order to ensure that all untracked (with respect to the lru) buffers have completed I/O processing before unmount proceeds to tear down in-core data structures. The value of -1 implies an I/O accounting decrement race. Indeed, the fact that xfs_buf_ioacct_dec() is called from xfs_buf_rele() (where the buffer lock is no longer held) means that bp->b_flags can be updated from an unsafe context. While a user-level reproducer is currently not available, some intrusive hacks to run racing buffer lookups/ioacct/releases from multiple threads was used to successfully manufacture this problem. Existing callers do not expect to acquire the buffer lock from xfs_buf_rele(). Therefore, we can not safely update ->b_flags from this context. It turns out that we already have separate buffer state bits and associated serialization for dealing with buffer LRU state in the form of ->b_state and ->b_lock. Therefore, replace the _XBF_IN_FLIGHT flag with a ->b_state variant, update the I/O accounting wrappers appropriately and make sure they are used with the correct locking. This ensures that buffer in-flight state can be modified at buffer release time without racing with modifications from a buffer lock holder. Fixes: 9c7504aa72b6 ("xfs: track and serialize in-flight async buffers against unmount") Cc: <stable@vger.kernel.org> # v4.8+ Signed-off-by: Brian Foster <bfoster@redhat.com> Reviewed-by: Nikolay Borisov <nborisov@suse.com> Tested-by: Libor Pechacek <lpechacek@suse.com> Reviewed-by: Darrick J. Wong <darrick.wong@oracle.com> Signed-off-by: Darrick J. Wong <darrick.wong@oracle.com>
2017-05-25xfs: Move handling of missing page into one place in ↵Jan Kara1-30/+8
xfs_find_get_desired_pgoff() Currently several places in xfs_find_get_desired_pgoff() handle the case of a missing page. Make them all handled in one place after the loop has terminated. Signed-off-by: Jan Kara <jack@suse.cz> Reviewed-by: Brian Foster <bfoster@redhat.com> Reviewed-by: Darrick J. Wong <darrick.wong@oracle.com> Signed-off-by: Darrick J. Wong <darrick.wong@oracle.com>
2017-05-25xfs: Fix off-by-in in loop termination in xfs_find_get_desired_pgoff()Jan Kara1-1/+1
There is an off-by-one error in loop termination conditions in xfs_find_get_desired_pgoff() since 'end' may index a page beyond end of desired range if 'endoff' is page aligned. It doesn't have any visible effects but still it is good to fix it. Signed-off-by: Jan Kara <jack@suse.cz> Reviewed-by: Darrick J. Wong <darrick.wong@oracle.com> Signed-off-by: Darrick J. Wong <darrick.wong@oracle.com>
2017-05-25xfs: Fix missed holes in SEEK_HOLE implementationJan Kara1-20/+9
XFS SEEK_HOLE implementation could miss a hole in an unwritten extent as can be seen by the following command: xfs_io -c "falloc 0 256k" -c "pwrite 0 56k" -c "pwrite 128k 8k" -c "seek -h 0" file wrote 57344/57344 bytes at offset 0 56 KiB, 14 ops; 0.0000 sec (49.312 MiB/sec and 12623.9856 ops/sec) wrote 8192/8192 bytes at offset 131072 8 KiB, 2 ops; 0.0000 sec (70.383 MiB/sec and 18018.0180 ops/sec) Whence Result HOLE 139264 Where we can see that hole at offset 56k was just ignored by SEEK_HOLE implementation. The bug is in xfs_find_get_desired_pgoff() which does not properly detect the case when pages are not contiguous. Fix the problem by properly detecting when found page has larger offset than expected. CC: stable@vger.kernel.org Fixes: d126d43f631f996daeee5006714fed914be32368 Signed-off-by: Jan Kara <jack@suse.cz> Reviewed-by: Brian Foster <bfoster@redhat.com> Reviewed-by: Darrick J. Wong <darrick.wong@oracle.com> Signed-off-by: Darrick J. Wong <darrick.wong@oracle.com>
2017-05-25xfs: fix off-by-one on max nr_pages in xfs_find_get_desired_pgoff()Eryu Guan1-1/+1
xfs_find_get_desired_pgoff() is used to search for offset of hole or data in page range [index, end] (both inclusive), and the max number of pages to search should be at least one, if end == index. Otherwise the only page is missed and no hole or data is found, which is not correct. When block size is smaller than page size, this can be demonstrated by preallocating a file with size smaller than page size and writing data to the last block. E.g. run this xfs_io command on a 1k block size XFS on x86_64 host. # xfs_io -fc "falloc 0 3k" -c "pwrite 2k 1k" \ -c "seek -d 0" /mnt/xfs/testfile wrote 1024/1024 bytes at offset 2048 1 KiB, 1 ops; 0.0000 sec (33.675 MiB/sec and 34482.7586 ops/sec) Whence Result DATA EOF Data at offset 2k was missed, and lseek(2) returned ENXIO. This is uncovered by generic/285 subtest 07 and 08 on ppc64 host, where pagesize is 64k. Because a recent change to generic/285 reduced the preallocated file size to smaller than 64k. Cc: stable@vger.kernel.org # v3.7+ Signed-off-by: Eryu Guan <eguan@redhat.com> Reviewed-by: Jan Kara <jack@suse.cz> Reviewed-by: Darrick J. Wong <darrick.wong@oracle.com> Signed-off-by: Darrick J. Wong <darrick.wong@oracle.com>
2017-05-25xfs: fix unaligned access in xfs_btree_visit_blocksEric Sandeen1-1/+1
This structure copy was throwing unaligned access warnings on sparc64: Kernel unaligned access at TPC[1043c088] xfs_btree_visit_blocks+0x88/0xe0 [xfs] xfs_btree_copy_ptrs does a memcpy, which avoids it. Signed-off-by: Eric Sandeen <sandeen@redhat.com> Reviewed-by: Darrick J. Wong <darrick.wong@oracle.com> Signed-off-by: Darrick J. Wong <darrick.wong@oracle.com>
2017-05-19xfs: avoid mount-time deadlock in CoW extent recoveryDarrick J. Wong1-12/+31
If a malicious user corrupts the refcount btree to cause a cycle between different levels of the tree, the next mount attempt will deadlock in the CoW recovery routine while grabbing buffer locks. We can use the ability to re-grab a buffer that was previous locked to a transaction to avoid deadlocks, so do that here. Signed-off-by: Darrick J. Wong <darrick.wong@oracle.com> Reviewed-by: Brian Foster <bfoster@redhat.com>
2017-05-16xfs: only return detailed fsmap info if the caller has CAP_SYS_ADMINDarrick J. Wong1-1/+4
There were a number of handwaving complaints that one could "possibly" use inode numbers and extent maps to fingerprint a filesystem hosting multiple containers and somehow use the information to guess at the contents of other containers and attack them. Despite the total lack of any demonstration that this is actually possible, it's easier to restrict access now and broaden it later, so use the rmapbt fsmap backends only if the caller has CAP_SYS_ADMIN. Unprivileged users will just have to make do with only getting the free space and static metadata placement information. Signed-off-by: Darrick J. Wong <darrick.wong@oracle.com> Reviewed-by: Carlos Maiolino <cmaiolino@redhat.com>
2017-05-16xfs: bad assertion for delalloc an extent that start at i_sizeZorro Lang1-1/+1
By run fsstress long enough time enough in RHEL-7, I find an assertion failure (harder to reproduce on linux-4.11, but problem is still there): XFS: Assertion failed: (iflags & BMV_IF_DELALLOC) != 0, file: fs/xfs/xfs_bmap_util.c The assertion is in xfs_getbmap() funciton: if (map[i].br_startblock == DELAYSTARTBLOCK && --> map[i].br_startoff <= XFS_B_TO_FSB(mp, XFS_ISIZE(ip))) ASSERT((iflags & BMV_IF_DELALLOC) != 0); When map[i].br_startoff == XFS_B_TO_FSB(mp, XFS_ISIZE(ip)), the startoff is just at EOF. But we only need to make sure delalloc extents that are within EOF, not include EOF. Signed-off-by: Zorro Lang <zlang@redhat.com> Reviewed-by: Brian Foster <bfoster@redhat.com> Reviewed-by: Darrick J. Wong <darrick.wong@oracle.com> Signed-off-by: Darrick J. Wong <darrick.wong@oracle.com>
2017-05-16xfs: fix warnings about unused stack variablesDarrick J. Wong1-2/+0
Reduce stack usage and get rid of compiler warnings by eliminating unused variables. Signed-off-by: Darrick J. Wong <darrick.wong@oracle.com> Reviewed-by: Carlos Maiolino <cmaiolino@redhat.com>
2017-05-16xfs: BMAPX shouldn't barf on inline-format directoriesDarrick J. Wong1-2/+6
When we're fulfilling a BMAPX request, jump out early if the data fork is in local format. This prevents us from hitting a debugging check in bmapi_read and barfing errors back to userspace. The on-disk extent count check later isn't sufficient for IF_DELALLOC mode because da extents are in memory and not on disk. Signed-off-by: Darrick J. Wong <darrick.wong@oracle.com> Reviewed-by: Brian Foster <bfoster@redhat.com> Reviewed-by: Christoph Hellwig <hch@lst.de>
2017-05-16xfs: fix indlen accounting error on partial delalloc conversionBrian Foster1-3/+4
The delalloc -> real block conversion path uses an incorrect calculation in the case where the middle part of a delalloc extent is being converted. This is documented as a rare situation because XFS generally attempts to maximize contiguity by converting as much of a delalloc extent as possible. If this situation does occur, the indlen reservation for the two new delalloc extents left behind by the conversion of the middle range is calculated and compared with the original reservation. If more blocks are required, the delta is allocated from the global block pool. This delta value can be characterized as the difference between the new total requirement (temp + temp2) and the currently available reservation minus those blocks that have already been allocated (startblockval(PREV.br_startblock) - allocated). The problem is that the current code does not account for previously allocated blocks correctly. It subtracts the current allocation count from the (new - old) delta rather than the old indlen reservation. This means that more indlen blocks than have been allocated end up stashed in the remaining extents and free space accounting is broken as a result. Fix up the calculation to subtract the allocated block count from the original extent indlen and thus correctly allocate the reservation delta based on the difference between the new total requirement and the unused blocks from the original reservation. Also remove a bogus assert that contradicts the fact that the new indlen reservation can be larger than the original indlen reservation. Signed-off-by: Brian Foster <bfoster@redhat.com> Reviewed-by: Darrick J. Wong <darrick.wong@oracle.com> Signed-off-by: Darrick J. Wong <darrick.wong@oracle.com>
2017-05-13Merge tag 'upstream-4.12-rc1' of git://git.infradead.org/linux-ubifsLinus Torvalds6-8/+36
Pull UBI/UBIFS updates from Richard Weinberger: - new config option CONFIG_UBIFS_FS_SECURITY - minor improvements - random fixes * tag 'upstream-4.12-rc1' of git://git.infradead.org/linux-ubifs: ubi: Add debugfs file for tracking PEB state ubifs: Fix a typo in comment of ioctl2ubifs & ubifs2ioctl ubifs: Remove unnecessary assignment ubifs: Fix cut and paste error on sb type comparisons ubi: fastmap: Fix slab corruption ubifs: Add CONFIG_UBIFS_FS_SECURITY to disable/enable security labels ubi: Make mtd parameter readable ubi: Fix section mismatch
2017-05-13Merge branch 'akpm' (patches from Andrew)Linus Torvalds3-66/+50
Merge misc fixes from Andrew Morton: "15 fixes" * emailed patches from Andrew Morton <akpm@linux-foundation.org>: mm, docs: update memory.stat description with workingset* entries mm: vmscan: scan until it finds eligible pages mm, thp: copying user pages must schedule on collapse dax: fix PMD data corruption when fault races with write dax: fix data corruption when fault races with write ext4: return to starting transaction in ext4_dax_huge_fault() mm: fix data corruption due to stale mmap reads dax: prevent invalidation of mapped DAX entries Tigran has moved mm, vmalloc: fix vmalloc users tracking properly mm/khugepaged: add missed tracepoint for collapse_huge_page_swapin gcov: support GCC 7.1 mm, vmstat: Remove spurious WARN() during zoneinfo print time: delete current_fs_time() hwpoison, memcg: forcibly uncharge LRU pages
2017-05-12dax: fix PMD data corruption when fault races with writeRoss Zwisler1-14/+14
This is based on a patch from Jan Kara that fixed the equivalent race in the DAX PTE fault path. Currently DAX PMD read fault can race with write(2) in the following way: CPU1 - write(2) CPU2 - read fault dax_iomap_pmd_fault() ->iomap_begin() - sees hole dax_iomap_rw() iomap_apply() ->iomap_begin - allocates blocks dax_iomap_actor() invalidate_inode_pages2_range() - there's nothing to invalidate grab_mapping_entry() - we add huge zero page to the radix tree and map it to page tables The result is that hole page is mapped into page tables (and thus zeros are seen in mmap) while file has data written in that place. Fix the problem by locking exception entry before mapping blocks for the fault. That way we are sure invalidate_inode_pages2_range() call for racing write will either block on entry lock waiting for the fault to finish (and unmap stale page tables after that) or read fault will see already allocated blocks by write(2). Fixes: 9f141d6ef6258 ("dax: Call ->iomap_begin without entry lock during dax fault") Link: http://lkml.kernel.org/r/20170510172700.18991-1-ross.zwisler@linux.intel.com Signed-off-by: Ross Zwisler <ross.zwisler@linux.intel.com> Reviewed-by: Jan Kara <jack@suse.cz> Cc: Dan Williams <dan.j.williams@intel.com> Cc: <stable@vger.kernel.org> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2017-05-12dax: fix data corruption when fault races with writeJan Kara1-16/+16
Currently DAX read fault can race with write(2) in the following way: CPU1 - write(2) CPU2 - read fault dax_iomap_pte_fault() ->iomap_begin() - sees hole dax_iomap_rw() iomap_apply() ->iomap_begin - allocates blocks dax_iomap_actor() invalidate_inode_pages2_range() - there's nothing to invalidate grab_mapping_entry() - we add zero page in the radix tree and map it to page tables The result is that hole page is mapped into page tables (and thus zeros are seen in mmap) while file has data written in that place. Fix the problem by locking exception entry before mapping blocks for the fault. That way we are sure invalidate_inode_pages2_range() call for racing write will either block on entry lock waiting for the fault to finish (and unmap stale page tables after that) or read fault will see already allocated blocks by write(2). Fixes: 9f141d6ef6258a3a37a045842d9ba7e68f368956 Link: http://lkml.kernel.org/r/20170510085419.27601-5-jack@suse.cz Signed-off-by: Jan Kara <jack@suse.cz> Reviewed-by: Ross Zwisler <ross.zwisler@linux.intel.com> Cc: Dan Williams <dan.j.williams@intel.com> Cc: <stable@vger.kernel.org> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2017-05-12ext4: return to starting transaction in ext4_dax_huge_fault()Jan Kara1-4/+17
DAX will return to locking exceptional entry before mapping blocks for a page fault to fix possible races with concurrent writes. To avoid lock inversion between exceptional entry lock and transaction start, start the transaction already in ext4_dax_huge_fault(). Fixes: 9f141d6ef6258a3a37a045842d9ba7e68f368956 Link: http://lkml.kernel.org/r/20170510085419.27601-4-jack@suse.cz Signed-off-by: Jan Kara <jack@suse.cz> Cc: Ross Zwisler <ross.zwisler@linux.intel.com> Cc: Dan Williams <dan.j.williams@intel.com> Cc: <stable@vger.kernel.org> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2017-05-12mm: fix data corruption due to stale mmap readsJan Kara1-1/+1
Currently, we didn't invalidate page tables during invalidate_inode_pages2() for DAX. That could result in e.g. 2MiB zero page being mapped into page tables while there were already underlying blocks allocated and thus data seen through mmap were different from data seen by read(2). The following sequence reproduces the problem: - open an mmap over a 2MiB hole - read from a 2MiB hole, faulting in a 2MiB zero page - write to the hole with write(3p). The write succeeds but we incorrectly leave the 2MiB zero page mapping intact. - via the mmap, read the data that was just written. Since the zero page mapping is still intact we read back zeroes instead of the new data. Fix the problem by unconditionally calling invalidate_inode_pages2_range() in dax_iomap_actor() for new block allocations and by properly invalidating page tables in invalidate_inode_pages2_range() for DAX mappings. Fixes: c6dcf52c23d2d3fb5235cec42d7dd3f786b87d55 Link: http://lkml.kernel.org/r/20170510085419.27601-3-jack@suse.cz Signed-off-by: Jan Kara <jack@suse.cz> Signed-off-by: Ross Zwisler <ross.zwisler@linux.intel.com> Cc: Dan Williams <dan.j.williams@intel.com> Cc: <stable@vger.kernel.org> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2017-05-12dax: prevent invalidation of mapped DAX entriesRoss Zwisler1-29/+0
Patch series "mm,dax: Fix data corruption due to mmap inconsistency", v4. This series fixes data corruption that can happen for DAX mounts when page faults race with write(2) and as a result page tables get out of sync with block mappings in the filesystem and thus data seen through mmap is different from data seen through read(2). The series passes testing with t_mmap_stale test program from Ross and also other mmap related tests on DAX filesystem. This patch (of 4): dax_invalidate_mapping_entry() currently removes DAX exceptional entries only if they are clean and unlocked. This is done via: invalidate_mapping_pages() invalidate_exceptional_entry() dax_invalidate_mapping_entry() However, for page cache pages removed in invalidate_mapping_pages() there is an additional criteria which is that the page must not be mapped. This is noted in the comments above invalidate_mapping_pages() and is checked in invalidate_inode_page(). For DAX entries this means that we can can end up in a situation where a DAX exceptional entry, either a huge zero page or a regular DAX entry, could end up mapped but without an associated radix tree entry. This is inconsistent with the rest of the DAX code and with what happens in the page cache case. We aren't able to unmap the DAX exceptional entry because according to its comments invalidate_mapping_pages() isn't allowed to block, and unmap_mapping_range() takes a write lock on the mapping->i_mmap_rwsem. Since we essentially never have unmapped DAX entries to evict from the radix tree, just remove dax_invalidate_mapping_entry(). Fixes: c6dcf52c23d2 ("mm: Invalidate DAX radix tree entries only if appropriate") Link: http://lkml.kernel.org/r/20170510085419.27601-2-jack@suse.cz Signed-off-by: Ross Zwisler <ross.zwisler@linux.intel.com> Signed-off-by: Jan Kara <jack@suse.cz> Reported-by: Jan Kara <jack@suse.cz> Cc: Dan Williams <dan.j.williams@intel.com> Cc: <stable@vger.kernel.org> [4.10+] Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2017-05-12Tigran has movedAndrew Morton1-2/+2
Cc: Tigran Aivazian <aivazian.tigran@gmail.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2017-05-12Merge branch 'libnvdimm-fixes' of ↵Linus Torvalds6-68/+6
git://git.kernel.org/pub/scm/linux/kernel/git/nvdimm/nvdimm Pull libnvdimm fixes from Dan Williams: "Incremental fixes and a small feature addition on top of the main libnvdimm 4.12 pull request: - Geert noticed that tinyconfig was bloated by BLOCK selecting DAX. The size regression is fixed by moving all dax helpers into the dax-core and only specifying "select DAX" for FS_DAX and dax-capable drivers. He also asked for clarification of the NR_DEV_DAX config option which, on closer look, does not need to be a config option at all. Mike also throws in a DEV_DAX_PMEM fixup for good measure. - Ben's attention to detail on -stable patch submissions caught a case where the recent fixes to arch_copy_from_iter_pmem() missed a condition where we strand dirty data in the cache. This is tagged for -stable and will also be included in the rework of the pmem api to a proposed {memcpy,copy_user}_flushcache() interface for 4.13. - Vishal adds a feature that missed the initial pull due to pending review feedback. It allows the kernel to clear media errors when initializing a BTT (atomic sector update driver) instance on a pmem namespace. - Ross noticed that the dax_device + dax_operations conversion broke __dax_zero_page_range(). The nvdimm unit tests fail to check this path, but xfstests immediately trips over it. No excuse for missing this before submitting the 4.12 pull request. These all pass the nvdimm unit tests and an xfstests spot check. The set has received a build success notification from the kbuild robot" * 'libnvdimm-fixes' of git://git.kernel.org/pub/scm/linux/kernel/git/nvdimm/nvdimm: filesystem-dax: fix broken __dax_zero_page_range() conversion libnvdimm, btt: ensure that initializing metadata clears poison libnvdimm: add an atomic vs process context flag to rw_bytes x86, pmem: Fix cache flushing for iovec write < 8 bytes device-dax: kill NR_DEV_DAX block, dax: move "select DAX" from BLOCK to FS_DAX device-dax: Tell kbuild DEV_DAX_PMEM depends on DEV_DAX
2017-05-12Merge branch 'work.sane_pwd' of ↵Linus Torvalds3-21/+54
git://git.kernel.org/pub/scm/linux/kernel/git/viro/vfs Pull misc vfs updates from Al Viro: "Making sure that something like a referral point won't end up as pwd or root. The main part is the last commit (fixing mntns_install()); that one fixes a hard-to-hit race. The fchdir() commit is making fchdir(2) a bit more robust - it should be impossible to get opened files (even O_PATH ones) for referral points in the first place, so the existing checks are OK, but checking the same thing as in chdir(2) is just as cheap. The path_init() commit removes a redundant check that shouldn't have been there in the first place" * 'work.sane_pwd' of git://git.kernel.org/pub/scm/linux/kernel/git/viro/vfs: make sure that mntns_install() doesn't end up with referral for root path_init(): don't bother with checking MAY_EXEC for LOOKUP_ROOT make sure that fchdir() won't accept referral points, etc.
2017-05-11Merge tag 'for-linus-20170510' of git://git.infradead.org/linux-mtdLinus Torvalds1-1/+1
Pull MTD updates from Brian Norris: "NAND, from Boris: - some minor fixes/improvements on existing drivers (fsmc, gpio, ifc, davinci, brcmnand, omap) - a huge cleanup/rework of the denali driver accompanied with core fixes/improvements to simplify the driver code - a complete rewrite of the atmel driver to support new DT bindings make future evolution easier - the addition of per-vendor detection/initialization steps to avoid extending the nand_ids table with more extended-id entries SPI NOR, from Cyrille: - fixes in the hisi, intel and Mediatek SPI controller drivers - fixes to some SPI flash memories not supporting the Chip Erase command. - add support to some new memory parts (Winbond, Macronix, Micron, ESMT). - add new driver for the STM32 QSPI controller And a few fixes for Gemini and Versatile platforms on physmap-of" * tag 'for-linus-20170510' of git://git.infradead.org/linux-mtd: (100 commits) MAINTAINERS: Update NAND subsystem git repositories mtd: nand: gpio: update binding mtd: nand: add ooblayout for old hamming layout mtd: oxnas_nand: Allocating more than necessary in probe() dt-bindings: mtd: Document the STM32 QSPI bindings mtd: mtk-nor: set controller's address width according to nor flash mtd: spi-nor: add driver for STM32 quad spi flash controller mtd: nand: brcmnand: Check flash #WP pin status before nand erase/program mtd: nand: davinci: add comment on NAND subpage write status on keystone mtd: nand: omap2: Fix partition creation via cmdline mtdparts mtd: nand: NULL terminate a of_device_id table mtd: nand: Fix a couple error codes mtd: nand: allow drivers to request minimum alignment for passed buffer mtd: nand: allocate aligned buffers if NAND_OWN_BUFFERS is unset mtd: nand: denali: allow to override revision number mtd: nand: denali_dt: use pdev instead of ofdev for platform_device mtd: nand: denali_dt: remove dma-mask DT property mtd: nand: denali: support 64bit capable DMA engine mtd: nand: denali_dt: enable HW_ECC_FIXUP for Altera SOCFPGA variant mtd: nand: denali: support HW_ECC_FIXUP capability ...
2017-05-10filesystem-dax: fix broken __dax_zero_page_range() conversionDan Williams1-2/+2
The conversion of __dax_zero_page_range() to 'struct dax_operations' caused it to frequently fail. The mistake was treating the @size parameter as a dax mapping length rather than just a length of the clear_pmem() operation. The dax mapping length is assumed to be hard coded as PAGE_SIZE. Without this fix any page unaligned zeroing request will trigger a -EINVAL return from bdev_dax_pgoff(). Cc: Jan Kara <jack@suse.com> Cc: Christoph Hellwig <hch@lst.de> Reported-by: Ross Zwisler <ross.zwisler@linux.intel.com> Tested-by: Ross Zwisler <ross.zwisler@linux.intel.com> Fixes: cccbce671582 ("filesystem-dax: convert to dax_direct_access()") Signed-off-by: Dan Williams <dan.j.williams@intel.com>
2017-05-10Merge tag 'hwparam-20170420' of ↵Linus Torvalds1-1/+1
git://git.kernel.org/pub/scm/linux/kernel/git/dhowells/linux-fs Pull hw lockdown support from David Howells: "Annotation of module parameters that configure hardware resources including ioports, iomem addresses, irq lines and dma channels. This allows a future patch to prohibit the use of such module parameters to prevent that hardware from being abused to gain access to the running kernel image as part of locking the kernel down under UEFI secure boot conditions. Annotations are made by changing: module_param(n, t, p) module_param_named(n, v, t, p) module_param_array(n, t, m, p) to: module_param_hw(n, t, hwtype, p) module_param_hw_named(n, v, t, hwtype, p) module_param_hw_array(n, t, hwtype, m, p) where the module parameter refers to a hardware setting hwtype specifies the type of the resource being configured. This can be one of: ioport Module parameter configures an I/O port iomem Module parameter configures an I/O mem address ioport_or_iomem Module parameter could be either (runtime set) irq Module parameter configures an I/O port dma Module parameter configures a DMA channel dma_addr Module parameter configures a DMA buffer address other Module parameter configures some other value Note that the hwtype is compile checked, but not currently stored (the lockdown code probably won't require it). It is, however, there for future use. A bonus is that the hwtype can also be used for grepping. The intention is for the kernel to ignore or reject attempts to set annotated module parameters if lockdown is enabled. This applies to options passed on the boot command line, passed to insmod/modprobe or direct twiddling in /sys/module/ parameter files. The module initialisation then needs to handle the parameter not being set, by (1) giving an error, (2) probing for a value or (3) using a reasonable default. What I can't do is just reject a module out of hand because it may take a hardware setting in the module parameters. Some important modules, some ipmi stuff for instance, both probe for hardware and allow hardware to be manually specified; if the driver is aborts with any error, you don't get any ipmi hardware. Further, trying to do this entirely in the module initialisation code doesn't protect against sysfs twiddling. [!] Note that in and of itself, this series of patches should have no effect on the the size of the kernel or code execution - that is left to a patch in the next series to effect. It does mark annotated kernel parameters with a KERNEL_PARAM_FL_HWPARAM flag in an already existing field" * tag 'hwparam-20170420' of git://git.kernel.org/pub/scm/linux/kernel/git/dhowells/linux-fs: (38 commits) Annotate hardware config module parameters in sound/pci/ Annotate hardware config module parameters in sound/oss/ Annotate hardware config module parameters in sound/isa/ Annotate hardware config module parameters in sound/drivers/ Annotate hardware config module parameters in fs/pstore/ Annotate hardware config module parameters in drivers/watchdog/ Annotate hardware config module parameters in drivers/video/ Annotate hardware config module parameters in drivers/tty/ Annotate hardware config module parameters in drivers/staging/vme/ Annotate hardware config module parameters in drivers/staging/speakup/ Annotate hardware config module parameters in drivers/staging/media/ Annotate hardware config module parameters in drivers/scsi/ Annotate hardware config module parameters in drivers/pcmcia/ Annotate hardware config module parameters in drivers/pci/hotplug/ Annotate hardware config module parameters in drivers/parport/ Annotate hardware config module parameters in drivers/net/wireless/ Annotate hardware config module parameters in drivers/net/wan/ Annotate hardware config module parameters in drivers/net/irda/ Annotate hardware config module parameters in drivers/net/hamradio/ Annotate hardware config module parameters in drivers/net/ethernet/ ...
2017-05-10Merge tag 'nfsd-4.12' of git://linux-nfs.org/~bfields/linuxLinus Torvalds9-60/+97
Pull nfsd updates from Bruce Fields: "Another RDMA update from Chuck Lever, and a bunch of miscellaneous bugfixes" * tag 'nfsd-4.12' of git://linux-nfs.org/~bfields/linux: (26 commits) nfsd: Fix up the "supattr_exclcreat" attributes nfsd: encoders mustn't use unitialized values in error cases nfsd: fix undefined behavior in nfsd4_layout_verify lockd: fix lockd shutdown race NFSv4: Fix callback server shutdown SUNRPC: Refactor svc_set_num_threads() NFSv4.x/callback: Create the callback service through svc_create_pooled lockd: remove redundant check on block svcrdma: Clean out old XDR encoders svcrdma: Remove the req_map cache svcrdma: Remove unused RDMA Write completion handler svcrdma: Reduce size of sge array in struct svc_rdma_op_ctxt svcrdma: Clean up RPC-over-RDMA backchannel reply processing svcrdma: Report Write/Reply chunk overruns svcrdma: Clean up RDMA_ERROR path svcrdma: Use rdma_rw API in RPC reply path svcrdma: Introduce local rdma_rw API helpers svcrdma: Clean up svc_rdma_get_inv_rkey() svcrdma: Add helper to save pages under I/O svcrdma: Eliminate RPCRDMA_SQ_DEPTH_MULT ...
2017-05-10Merge tag 'nfs-for-4.12-1' of git://git.linux-nfs.org/projects/trondmy/linux-nfsLinus Torvalds39-2764/+618
Pull NFS client updates from Trond Myklebust: "Highlights include: Stable bugfixes: - Fix use after free in write error path - Use GFP_NOIO for two allocations in writeback - Fix a hang in OPEN related to server reboot - Check the result of nfs4_pnfs_ds_connect - Fix an rcu lock leak Features: - Removal of the unmaintained and unused OSD pNFS layout - Cleanup and removal of lots of unnecessary dprintk()s - Cleanup and removal of some memory failure paths now that GFP_NOFS is guaranteed to never fail. - Remove the v3-only data server limitation on pNFS/flexfiles Bugfixes: - RPC/RDMA connection handling bugfixes - Copy offload: fixes to ensure the copied data is COMMITed to disk. - Readdir: switch back to using the ->iterate VFS interface - File locking fixes from Ben Coddington - Various use-after-free and deadlock issues in pNFS - Write path bugfixes" * tag 'nfs-for-4.12-1' of git://git.linux-nfs.org/projects/trondmy/linux-nfs: (89 commits) pNFS/flexfiles: Always attempt to call layoutstats when flexfiles is enabled NFSv4.1: Work around a Linux server bug... NFS append COMMIT after synchronous COPY NFSv4: Fix exclusive create attributes encoding NFSv4: Fix an rcu lock leak nfs: use kmap/kunmap directly NFS: always treat the invocation of nfs_getattr as cache hit when noac is on Fix nfs_client refcounting if kmalloc fails in nfs4_proc_exchange_id and nfs4_proc_async_renew NFSv4.1: RECLAIM_COMPLETE must handle NFS4ERR_CONN_NOT_BOUND_TO_SESSION pNFS: Fix NULL dereference in pnfs_generic_alloc_ds_commits pNFS: Fix a typo in pnfs_generic_alloc_ds_commits pNFS: Fix a deadlock when coalescing writes and returning the layout pNFS: Don't clear the layout return info if there are segments to return pNFS: Ensure we commit the layout if it has been invalidated pNFS: Don't send COMMITs to the DSes if the server invalidated our layout pNFS/flexfiles: Fix up the ff_layout_write_pagelist failure path pNFS: Ensure we check layout validity before marking it for return NFS4.1 handle interrupted slot reuse from ERR_DELAY NFSv4: check return value of xdr_inline_decode nfs/filelayout: fix NULL pointer dereference in fl_pnfs_update_layout() ...
2017-05-10nfsd: Fix up the "supattr_exclcreat" attributesTrond Myklebust1-3/+8
If an NFSv4 client asks us for the supattr_exclcreat, then we must not return attributes that are unsupported by this minor version. Signed-off-by: Trond Myklebust <trond.myklebust@primarydata.com> Fixes: 75976de6556f ("NFSD: Return word2 bitmask if setting security..,") Cc: stable@vger.kernel.org Signed-off-by: J. Bruce Fields <bfields@redhat.com>
2017-05-10nfsd: encoders mustn't use unitialized values in error casesJ. Bruce Fields1-4/+4
In error cases, lgp->lg_layout_type may be out of bounds; so we shouldn't be using it until after the check of nfserr. This was seen to crash nfsd threads when the server receives a LAYOUTGET request with a large layout type. GETDEVICEINFO has the same problem. Reported-by: Ari Kauppi <Ari.Kauppi@synopsys.com> Reviewed-by: Christoph Hellwig <hch@lst.de> Cc: stable@vger.kernel.org Signed-off-by: J. Bruce Fields <bfields@redhat.com>
2017-05-10Merge branch 'core-rcu-for-linus' of ↵Linus Torvalds2-2/+2
git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip Pull RCU updates from Ingo Molnar: "The main changes are: - Debloat RCU headers - Parallelize SRCU callback handling (plus overlapping patches) - Improve the performance of Tree SRCU on a CPU-hotplug stress test - Documentation updates - Miscellaneous fixes" * 'core-rcu-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip: (74 commits) rcu: Open-code the rcu_cblist_n_lazy_cbs() function rcu: Open-code the rcu_cblist_n_cbs() function rcu: Open-code the rcu_cblist_empty() function rcu: Separately compile large rcu_segcblist functions srcu: Debloat the <linux/rcu_segcblist.h> header srcu: Adjust default auto-expediting holdoff srcu: Specify auto-expedite holdoff time srcu: Expedite first synchronize_srcu() when idle srcu: Expedited grace periods with reduced memory contention srcu: Make rcutorture writer stalls print SRCU GP state srcu: Exact tracking of srcu_data structures containing callbacks srcu: Make SRCU be built by default srcu: Fix Kconfig botch when SRCU not selected rcu: Make non-preemptive schedule be Tasks RCU quiescent state srcu: Expedite srcu_schedule_cbs_snp() callback invocation srcu: Parallelize callback handling kvm: Move srcu_struct fields to end of struct kvm rcu: Fix typo in PER_RCU_NODE_PERIOD header comment rcu: Use true/false in assignment to bool rcu: Use bool value directly ...
2017-05-10Merge branch 'overlayfs-linus' of ↵Linus Torvalds9-51/+417
git://git.kernel.org/pub/scm/linux/kernel/git/mszeredi/vfs Pull overlayfs update from Miklos Szeredi: "The biggest part of this is making st_dev/st_ino on the overlay behave like a normal filesystem (i.e. st_ino doesn't change on copy up, st_dev is the same for all files and directories). Currently this only works if all layers are on the same filesystem, but future work will move the general case towards more sane behavior. There are also miscellaneous fixes, including fixes to handling append-only files. There's a small change in the VFS, but that only has an effect on overlayfs, since otherwise file->f_path.dentry->inode and file_inode(file) are always the same" * 'overlayfs-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/mszeredi/vfs: ovl: update documentation w.r.t. constant inode numbers ovl: persistent inode numbers for upper hardlinks ovl: merge getattr for dir and nondir ovl: constant st_ino/st_dev across copy up ovl: persistent inode number for directories ovl: set the ORIGIN type flag ovl: lookup non-dir copy-up-origin by file handle ovl: use an auxiliary var for overlay root entry ovl: store file handle of lower inode on copy up ovl: check if all layers are on the same fs ovl: do not set overlay.opaque on non-dir create ovl: check IS_APPEND() on real upper inode vfs: ftruncate check IS_APPEND() on real upper inode ovl: Use designated initializers ovl: lockdep annotate of nested stacked overlayfs inode lock
2017-05-10Merge branch 'for-linus' of ↵Linus Torvalds4-24/+50
git://git.kernel.org/pub/scm/linux/kernel/git/mszeredi/fuse Pull fuse updates from Miklos Szeredi: "Support for pid namespaces from Seth and refcount_t work from Elena" * 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/mszeredi/fuse: fuse: Add support for pid namespaces fuse: convert fuse_conn.count from atomic_t to refcount_t fuse: convert fuse_req.count from atomic_t to refcount_t fuse: convert fuse_file.count from atomic_t to refcount_t
2017-05-10Merge tag 'ceph-for-4.12-rc1' of git://github.com/ceph/ceph-clientLinus Torvalds13-86/+255
Pull ceph updates from Ilya Dryomov: "The two main items are support for disabling automatic rbd exclusive lock transfers from myself and the long awaited -ENOSPC handling series from Jeff. The former will allow rbd users to take advantage of exclusive lock's built-in blacklist/break-lock functionality while staying in control of who owns the lock. With the latter in place, we will abort filesystem writes on -ENOSPC instead of having them block indefinitely. Beyond that we've got the usual pile of filesystem fixes from Zheng, some refcount_t conversion patches from Elena and a patch for an ancient open() flags handling bug from Alexander" * tag 'ceph-for-4.12-rc1' of git://github.com/ceph/ceph-client: (31 commits) ceph: fix memory leak in __ceph_setxattr() ceph: fix file open flags on ppc64 ceph: choose readdir frag based on previous readdir reply rbd: exclusive map option rbd: return ResponseMessage result from rbd_handle_request_lock() rbd: kill rbd_is_lock_supported() rbd: support updating the lock cookie without releasing the lock rbd: store lock cookie rbd: ignore unlock errors rbd: fix error handling around rbd_init_disk() rbd: move rbd_unregister_watch() call into rbd_dev_image_release() rbd: move rbd_dev_destroy() call out of rbd_dev_image_release() ceph: when seeing write errors on an inode, switch to sync writes Revert "ceph: SetPageError() for writeback pages if writepages fails" ceph: handle epoch barriers in cap messages libceph: add an epoch_barrier field to struct ceph_osd_client libceph: abort already submitted but abortable requests when map or pool goes full libceph: allow requests to return immediately on full conditions if caller wishes libceph: remove req->r_replay_version ceph: make seeky readdir more efficient ...
2017-05-10Merge branch 'for-linus-4.12' of ↵Linus Torvalds37-827/+1437
git://git.kernel.org/pub/scm/linux/kernel/git/mason/linux-btrfs Pull btrfs updates from Chris Mason: "This has fixes and cleanups Dave Sterba collected for the merge window. The biggest functional fixes are between btrfs raid5/6 and scrub, and raid5/6 and device replacement. Some of our pending qgroup fixes are included as well while I bash on the rest in testing. We also have the usual set of cleanups, including one that makes __btrfs_map_block() much more maintainable, and conversions from atomic_t to refcount_t" * 'for-linus-4.12' of git://git.kernel.org/pub/scm/linux/kernel/git/mason/linux-btrfs: (71 commits) btrfs: fix the gfp_mask for the reada_zones radix tree Btrfs: fix reported number of inode blocks Btrfs: send, fix file hole not being preserved due to inline extent Btrfs: fix extent map leak during fallocate error path Btrfs: fix incorrect space accounting after failure to insert inline extent Btrfs: fix invalid attempt to free reserved space on failure to cow range btrfs: Handle delalloc error correctly to avoid ordered extent hang btrfs: Fix metadata underflow caused by btrfs_reloc_clone_csum error btrfs: check if the device is flush capable btrfs: delete unused member nobarriers btrfs: scrub: Fix RAID56 recovery race condition btrfs: scrub: Introduce full stripe lock for RAID56 btrfs: Use ktime_get_real_ts for root ctime Btrfs: handle only applicable errors returned by btrfs_get_extent btrfs: qgroup: Fix qgroup corruption caused by inode_cache mount option btrfs: use q which is already obtained from bdev_get_queue Btrfs: switch to div64_u64 if with a u64 divisor Btrfs: update scrub_parity to use u64 stripe_len Btrfs: enable repair during read for raid56 profile btrfs: use clear_page where appropriate ...
2017-05-09nfsd: fix undefined behavior in nfsd4_layout_verifyAri Kauppi1-1/+2
UBSAN: Undefined behaviour in fs/nfsd/nfs4proc.c:1262:34 shift exponent 128 is too large for 32-bit type 'int' Depending on compiler+architecture, this may cause the check for layout_type to succeed for overly large values (which seems to be the case with amd64). The large value will be later used in de-referencing nfsd4_layout_ops for function pointers. Reported-by: Jani Tuovila <tuovila@synopsys.com> Signed-off-by: Ari Kauppi <ari@synopsys.com> [colin.king@canonical.com: use LAYOUT_TYPE_MAX instead of 32] Cc: stable@vger.kernel.org Reviewed-by: Dan Carpenter <dan.carpenter@oracle.com> Reviewed-by: Christoph Hellwig <hch@lst.de> Signed-off-by: J. Bruce Fields <bfields@redhat.com>
2017-05-09pNFS/flexfiles: Always attempt to call layoutstats when flexfiles is enabledTrond Myklebust1-0/+11
Layoutstats is always desirable when using the flexfiles driver, so we should enable it if that driver is being loaded. It is safe to do so, because even when the mount specifies NFSv4.1, we will turn it off if the server tells us it is unsupported. Signed-off-by: Trond Myklebust <trond.myklebust@primarydata.com>
2017-05-09NFSv4.1: Work around a Linux server bug...Trond Myklebust1-0/+6
It turns out the Linux server has a bug in its implementation of supattr_exclcreat; it returns the set of all attributes, whether or not they are supported by minor version 1. In order to avoid a regression, we therefore apply the supported_attrs as a mask on top of whatever the server sent us. Reported-by: Anna Schumaker <Anna.Schumaker@Netapp.com> Signed-off-by: Trond Myklebust <trond.myklebust@primarydata.com>
2017-05-09Merge branch 'work.misc' of ↵Linus Torvalds19-58/+252
git://git.kernel.org/pub/scm/linux/kernel/git/viro/vfs Pull misc vfs updates from Al Viro: "Assorted bits and pieces from various people. No common topic in this pile, sorry" * 'work.misc' of git://git.kernel.org/pub/scm/linux/kernel/git/viro/vfs: fs/affs: add rename exchange fs/affs: add rename2 to prepare multiple methods Make stat/lstat/fstatat pass AT_NO_AUTOMOUNT to vfs_statx() fs: don't set *REFERENCED on single use objects fs: compat: Remove warning from COMPATIBLE_IOCTL remove pointless extern of atime_need_update_rcu() fs: completely ignore unknown open flags fs: add a VALID_OPEN_FLAGS fs: remove _submit_bh() fs: constify tree_descr arrays passed to simple_fill_super() fs: drop duplicate header percpu-rwsem.h fs/affs: bugfix: Write files greater than page size on OFS fs/affs: bugfix: enable writes on OFS disks fs/affs: remove node generation check fs/affs: import amigaffs.h fs/affs: bugfix: make symbolic links work again
2017-05-09proc: try to remove use of FOLL_FORCE entirelyLinus Torvalds1-4/+1
We fixed the bugs in it, but it's still an ugly interface, so let's see if anybody actually depends on it. It's entirely possible that nothing actually requires the whole "punch through read-only mappings" semantics. For example, gdb definitely uses the /proc/<pid>/mem interface, but it looks like it mainly does it for regular reads of the target (that don't need FOLL_FORCE), and looking at the gdb source code seems to fall back on the traditional ptrace(PTRACE_POKEDATA) interface if it needs to. If this breaks something, I do have a (more complex) version that only enables FOLL_FORCE when somebody has PTRACE_ATTACH'ed to the target, like the comment here used to say ("Maybe we should limit FOLL_FORCE to actual ptrace users?"). Cc: Kees Cook <keescook@chromium.org> Cc: Andy Lutomirski <luto@amacapital.net> Cc: Eric Biederman <ebiederm@xmission.com> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2017-05-08Merge branch 'akpm' (patches from Andrew)Linus Torvalds42-225/+176
Merge more updates from Andrew Morton: - the rest of MM - various misc things - procfs updates - lib/ updates - checkpatch updates - kdump/kexec updates - add kvmalloc helpers, use them - time helper updates for Y2038 issues. We're almost ready to remove current_fs_time() but that awaits a btrfs merge. - add tracepoints to DAX * emailed patches from Andrew Morton <akpm@linux-foundation.org>: (114 commits) drivers/staging/ccree/ssi_hash.c: fix build with gcc-4.4.4 selftests/vm: add a test for virtual address range mapping dax: add tracepoint to dax_insert_mapping() dax: add tracepoint to dax_writeback_one() dax: add tracepoints to dax_writeback_mapping_range() dax: add tracepoints to dax_load_hole() dax: add tracepoints to dax_pfn_mkwrite() dax: add tracepoints to dax_iomap_pte_fault() mtd: nand: nandsim: convert to memalloc_noreclaim_*() treewide: convert PF_MEMALLOC manipulations to new helpers mm: introduce memalloc_noreclaim_{save,restore} mm: prevent potential recursive reclaim due to clearing PF_MEMALLOC mm/huge_memory.c: deposit a pgtable for DAX PMD faults when required mm/huge_memory.c: use zap_deposited_table() more time: delete CURRENT_TIME_SEC and CURRENT_TIME gfs2: replace CURRENT_TIME with current_time apparmorfs: replace CURRENT_TIME with current_time() lustre: replace CURRENT_TIME macro fs: ubifs: replace CURRENT_TIME_SEC with current_time fs: ufs: use ktime_get_real_ts64() for birthtime ...
2017-05-08dax: add tracepoint to dax_insert_mapping()Ross Zwisler1-0/+1
Add a tracepoint to dax_insert_mapping(), following the same logging conventions as the rest of DAX. This tracepoint, along with the one in dax_load_hole(), lets us know how a DAX PTE fault was serviced. Here is an example DAX fault that inserts a PTE mapping: small-1126 [007] .... 145.451604: dax_pte_fault: dev 259:0 ino 0x1003 shared WRITE|ALLOW_RETRY|KILLABLE|USER address 0x10420000 pgoff 0x220 small-1126 [007] .... 145.452317: dax_insert_mapping: dev 259:0 ino 0x1003 shared write address 0x10420000 radix_entry 0x100006 small-1126 [007] .... 145.452399: dax_pte_fault_done: dev 259:0 ino 0x1003 shared WRITE|ALLOW_RETRY|KILLABLE|USER address 0x10420000 pgoff 0x220 MAJOR|NOPAGE Link: http://lkml.kernel.org/r/20170221195116.13278-7-ross.zwisler@linux.intel.com Signed-off-by: Ross Zwisler <ross.zwisler@linux.intel.com> Reviewed-by: Jan Kara <jack@suse.cz> Cc: Alexander Viro <viro@zeniv.linux.org.uk> Cc: Dan Williams <dan.j.williams@intel.com> Cc: Ingo Molnar <mingo@redhat.com> Cc: Matthew Wilcox <mawilcox@microsoft.com> Cc: Steven Rostedt <rostedt@goodmis.org> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2017-05-08dax: add tracepoint to dax_writeback_one()Ross Zwisler1-0/+1
Add a tracepoint to dax_writeback_one(), following the same logging conventions as the rest of DAX. Here is an example range writeback which ends up flushing one PMD and one PTE: test-1265 [003] .... 496.615250: dax_writeback_range: dev 259:0 ino 0x1003 pgoff 0x0-0x7ffffffffffff test-1265 [003] .... 496.616263: dax_writeback_one: dev 259:0 ino 0x1003 pgoff 0x0 pglen 0x200 test-1265 [003] .... 496.616270: dax_writeback_one: dev 259:0 ino 0x1003 pgoff 0x305 pglen 0x1 test-1265 [003] .... 496.616272: dax_writeback_range_done: dev 259:0 ino 0x1003 pgoff 0x0-0x7ffffffffffff [akpm@linux-foundation.org: struct blk_dax_ctl has disappeared] Link: http://lkml.kernel.org/r/20170221195116.13278-6-ross.zwisler@linux.intel.com Signed-off-by: Ross Zwisler <ross.zwisler@linux.intel.com> Reviewed-by: Jan Kara <jack@suse.cz> Cc: Alexander Viro <viro@zeniv.linux.org.uk> Cc: Dan Williams <dan.j.williams@intel.com> Cc: Ingo Molnar <mingo@redhat.com> Cc: Matthew Wilcox <mawilcox@microsoft.com> Cc: Steven Rostedt <rostedt@goodmis.org> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2017-05-08dax: add tracepoints to dax_writeback_mapping_range()Ross Zwisler1-5/+7
Add tracepoints to dax_writeback_mapping_range(), following the same logging conventions as the rest of DAX. Here is an example writeback call: msync-1085 [006] .... 200.902565: dax_writeback_range: dev 259:0 ino 0x1003 pgoff 0x200-0x2ff msync-1085 [006] .... 200.902579: dax_writeback_range_done: dev 259:0 ino 0x1003 pgoff 0x200-0x2ff [ross.zwisler@linux.intel.com: fix regression in dax_writeback_mapping_range()] Link: http://lkml.kernel.org/r/20170314215358.31451-1-ross.zwisler@linux.intel.com Link: http://lkml.kernel.org/r/20170221195116.13278-5-ross.zwisler@linux.intel.com Signed-off-by: Ross Zwisler <ross.zwisler@linux.intel.com> Reviewed-by: Jan Kara <jack@suse.cz> Cc: Alexander Viro <viro@zeniv.linux.org.uk> Cc: Dan Williams <dan.j.williams@intel.com> Cc: Ingo Molnar <mingo@redhat.com> Cc: Matthew Wilcox <mawilcox@microsoft.com> Cc: Steven Rostedt <rostedt@goodmis.org> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2017-05-08dax: add tracepoints to dax_load_hole()Ross Zwisler1-5/+11
Add tracepoints to dax_load_hole(), following the same logging conventions as the rest of DAX. Here is the logging generated by a PTE read from a hole: read-1075 [002] .... 62.362108: dax_pte_fault: dev 259:0 ino 0x1003 shared ALLOW_RETRY|KILLABLE|USER address 0x10480000 pgoff 0x280 read-1075 [002] .... 62.362140: dax_load_hole: dev 259:0 ino 0x1003 shared ALLOW_RETRY|KILLABLE|USER address 0x10480000 pgoff 0x280 NOPAGE read-1075 [002] .... 62.362141: dax_pte_fault_done: dev 259:0 ino 0x1003 shared ALLOW_RETRY|KILLABLE|USER address 0x10480000 pgoff 0x280 NOPAGE Link: http://lkml.kernel.org/r/20170221195116.13278-4-ross.zwisler@linux.intel.com Signed-off-by: Ross Zwisler <ross.zwisler@linux.intel.com> Reviewed-by: Jan Kara <jack@suse.cz> Cc: Alexander Viro <viro@zeniv.linux.org.uk> Cc: Dan Williams <dan.j.williams@intel.com> Cc: Ingo Molnar <mingo@redhat.com> Cc: Matthew Wilcox <mawilcox@microsoft.com> Cc: Steven Rostedt <rostedt@goodmis.org> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2017-05-08dax: add tracepoints to dax_pfn_mkwrite()Ross Zwisler1-0/+3
Add tracepoints to dax_pfn_mkwrite(), following the same logging conventions as the rest of DAX. Here is an example PTE fault followed by a pfn_mkwrite: small_aligned-1094 [002] .... 374.084998: dax_pte_fault: dev 259:0 ino 0x1003 shared WRITE|ALLOW_RETRY|KILLABLE|USER address 0x10400000 pgoff 0x200 small_aligned-1094 [002] .... 374.085145: dax_pte_fault_done: dev 259:0 ino 0x1003 shared WRITE|ALLOW_RETRY|KILLABLE|USER address 0x10400000 pgoff 0x200 MAJOR|NOPAGE small_aligned-1094 [002] .... 374.085165: dax_pfn_mkwrite: dev 259:0 ino 0x1003 shared WRITE|MKWRITE|ALLOW_RETRY|KILLABLE|USER address 0x10400000 pgoff 0x200 NOPAGE Link: http://lkml.kernel.org/r/20170221195116.13278-3-ross.zwisler@linux.intel.com Signed-off-by: Ross Zwisler <ross.zwisler@linux.intel.com> Reviewed-by: Jan Kara <jack@suse.cz> Cc: Alexander Viro <viro@zeniv.linux.org.uk> Cc: Dan Williams <dan.j.williams@intel.com> Cc: Ingo Molnar <mingo@redhat.com> Cc: Matthew Wilcox <mawilcox@microsoft.com> Cc: Steven Rostedt <rostedt@goodmis.org> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2017-05-08dax: add tracepoints to dax_iomap_pte_fault()Ross Zwisler1-4/+11
Patch series "second round of tracepoints for DAX". This second round of DAX tracepoint patches adds tracing to the PTE fault path (dax_iomap_pte_fault(), dax_pfn_mkwrite(), dax_load_hole(), dax_insert_mapping()) and to the writeback path (dax_writeback_mapping_range(), dax_writeback_one()). The purpose of this tracing is to give us a high level view of what DAX is doing, whether faults are being serviced by PMDs or PTEs, and by real storage or by zero pages covering holes. I do have some patches nearly ready which also add tracing to grab_mapping_entry() and dax_insert_mapping_entry(). These are more targeted at logging how we are interacting with the radix tree, how we use empty entries for locking, whether we "downgrade" huge zero pages to 4k PTE sized allocations, etc. In the end it seemed to me that this might be too detailed to have as constantly present tracepoints, but if anyone sees value in having tracepoints like this in the DAX code permanently (Jan?), please let me know and I'll add those last two patches. All these tracepoints were done to be consistent with the style of the XFS tracepoints and with the existing DAX PMD tracepoints. This patch (of 6): Add tracepoints to dax_iomap_pte_fault(), following the same logging conventions as the rest of DAX. Here is an example fault that initially tries to be serviced by the PMD fault handler but which falls back to PTEs because the VMA isn't large enough to hold a PMD: small-1086 [005] .... 71.140014: xfs_filemap_huge_fault: dev 259:0 ino 0x1003 small-1086 [005] .... 71.140027: dax_pmd_fault: dev 259:0 ino 0x1003 shared WRITE|ALLOW_RETRY|KILLABLE|USER address 0x10420000 vm_start 0x10200000 vm_end 0x10500000 pgoff 0x220 max_pgoff 0x1400 small-1086 [005] .... 71.140028: dax_pmd_fault_done: dev 259:0 ino 0x1003 shared WRITE|ALLOW_RETRY|KILLABLE|USER address 0x10420000 vm_start 0x10200000 vm_end 0x10500000 pgoff 0x220 max_pgoff 0x1400 FALLBACK small-1086 [005] .... 71.140035: dax_pte_fault: dev 259:0 ino 0x1003 shared WRITE|ALLOW_RETRY|KILLABLE|USER address 0x10420000 pgoff 0x220 small-1086 [005] .... 71.140396: dax_pte_fault_done: dev 259:0 ino 0x1003 shared WRITE|ALLOW_RETRY|KILLABLE|USER address 0x10420000 pgoff 0x220 MAJOR|NOPAGE Link: http://lkml.kernel.org/r/20170221195116.13278-2-ross.zwisler@linux.intel.com Signed-off-by: Ross Zwisler <ross.zwisler@linux.intel.com> Reviewed-by: Jan Kara <jack@suse.cz> Cc: Alexander Viro <viro@zeniv.linux.org.uk> Cc: Dan Williams <dan.j.williams@intel.com> Cc: Ingo Molnar <mingo@redhat.com> Cc: Matthew Wilcox <mawilcox@microsoft.com> Cc: Steven Rostedt <rostedt@goodmis.org> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2017-05-08gfs2: replace CURRENT_TIME with current_timeStephen Rothwell1-2/+2
Link: http://lkml.kernel.org/r/20170420161852.0492bc3f@canb.auug.org.au Signed-off-by: Stephen Rothwell <sfr@canb.auug.org.au> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2017-05-08fs: ubifs: replace CURRENT_TIME_SEC with current_timeDeepa Dinamani6-30/+26
CURRENT_TIME_SEC is not y2038 safe. current_time() will be transitioned to use 64 bit time along with vfs in a separate patch. There is no plan to transition CURRENT_TIME_SEC to use y2038 safe time interfaces. current_time() returns timestamps according to the granularities set in the inode's super_block. The granularity check to call current_fs_time() or CURRENT_TIME_SEC is not required. Use current_time() directly to update inode timestamp. Use timespec_trunc during file system creation, before the first inode is created. Link: http://lkml.kernel.org/r/1491613030-11599-9-git-send-email-deepa.kernel@gmail.com Signed-off-by: Deepa Dinamani <deepa.kernel@gmail.com> Reviewed-by: Arnd Bergmann <arnd@arndb.de> Cc: Richard Weinberger <richard@nod.at> Cc: Artem Bityutskiy <dedekind1@gmail.com> Cc: Adrian Hunter <adrian.hunter@intel.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>