summaryrefslogtreecommitdiffstats
path: root/fs/erofs
AgeCommit message (Collapse)AuthorFilesLines
2022-01-04erofs: use meta buffers for zmap operationsGao Xiang4-70/+28
Get rid of old erofs_get_meta_page() within zmap operations by using on-stack meta buffers in order to prepare subpage and folio features. Finally, erofs_get_meta_page() is useless. Get rid of it! Link: https://lore.kernel.org/r/20220102040017.51352-6-hsiangkao@linux.alibaba.com Reviewed-by: Yue Hu <huyue2@yulong.com> Reviewed-by: Liu Bo <bo.liu@linux.alibaba.com> Reviewed-by: Chao Yu <chao@kernel.org> Signed-off-by: Gao Xiang <hsiangkao@linux.alibaba.com>
2022-01-04erofs: use meta buffers for xattr operationsGao Xiang2-95/+41
Get rid of old erofs_get_meta_page() within xattr operations by using on-stack meta buffers in order to prepare subpage and folio features. Link: https://lore.kernel.org/r/20220102040017.51352-5-hsiangkao@linux.alibaba.com Reviewed-by: Liu Bo <bo.liu@linux.alibaba.com> Reviewed-by: Chao Yu <chao@kernel.org> Signed-off-by: Gao Xiang <hsiangkao@linux.alibaba.com>
2022-01-04erofs: use meta buffers for super operationsGao Xiang1-77/+27
Get rid of old erofs_get_meta_page() within super operations by using on-stack meta buffers in order to prepare subpage and folio features. Link: https://lore.kernel.org/r/20220102081317.109797-1-hsiangkao@linux.alibaba.com Reviewed-by: Yue Hu <huyue2@yulong.com> Reviewed-by: Liu Bo <bo.liu@linux.alibaba.com> Reviewed-by: Chao Yu <chao@kernel.org> Signed-off-by: Gao Xiang <hsiangkao@linux.alibaba.com>
2022-01-04erofs: use meta buffers for inode operationsGao Xiang2-36/+35
Get rid of old erofs_get_meta_page() within inode operations by using on-stack meta buffers in order to prepare subpage and folio features. Link: https://lore.kernel.org/r/20220102040017.51352-3-hsiangkao@linux.alibaba.com Reviewed-by: Yue Hu <huyue2@yulong.com> Reviewed-by: Liu Bo <bo.liu@linux.alibaba.com> Reviewed-by: Chao Yu <chao@kernel.org> Signed-off-by: Gao Xiang <hsiangkao@linux.alibaba.com>
2022-01-04erofs: introduce meta buffer operationsGao Xiang2-21/+90
In order to support subpage and folio for all uncompressed files, introduce meta buffer descriptors, which can be effectively stored on stack, in place of meta page operations. This converts the uncompressed data path to meta buffers. Link: https://lore.kernel.org/r/20220102040017.51352-2-hsiangkao@linux.alibaba.com Reviewed-by: Liu Bo <bo.liu@linux.alibaba.com> Reviewed-by: Chao Yu <chao@kernel.org> Signed-off-by: Gao Xiang <hsiangkao@linux.alibaba.com>
2021-12-31erofs: add on-disk compressed tail-packing inline supportYue Hu5-31/+103
Introduces erofs compressed tail-packing inline support. This approach adds a new field called `h_idata_size' in the per-file compression header to indicate the encoded size of each tail-packing pcluster. At runtime, it will find the start logical offset of the tail pcluster when initializing per-inode zmap and record such extent (headlcn, idataoff) information to the in-memory inode. Therefore, follow-on requests can directly recognize if one pcluster is a tail-packing inline pcluster or not. Link: https://lore.kernel.org/r/20211228054604.114518-6-hsiangkao@linux.alibaba.com Reviewed-by: Chao Yu <chao@kernel.org> Signed-off-by: Yue Hu <huyue2@yulong.com> Signed-off-by: Gao Xiang <hsiangkao@linux.alibaba.com>
2021-12-31erofs: support inline data decompressionYue Hu2-44/+119
Currently, we have already support tail-packing inline for uncompressed file, let's also implement this for compressed files to save I/Os and storage space. Different from normal pclusters, compressed data is available in advance because of other metadata I/Os. Therefore, they directly move into the bypass queue without extra I/O submission. It's the last compression feature before folio/subpage support. Link: https://lore.kernel.org/r/20211228232919.21413-1-xiang@kernel.org Reviewed-by: Chao Yu <chao@kernel.org> Signed-off-by: Yue Hu <huyue2@yulong.com> Signed-off-by: Gao Xiang <hsiangkao@linux.alibaba.com>
2021-12-31erofs: support unaligned data decompressionGao Xiang1-8/+9
Previously, compressed data was assumed as block-aligned. This should be changed due to in-block tail-packing inline data. Link: https://lore.kernel.org/r/20211228054604.114518-4-hsiangkao@linux.alibaba.com Reviewed-by: Chao Yu <chao@kernel.org> Signed-off-by: Yue Hu <huyue2@yulong.com> Signed-off-by: Gao Xiang <hsiangkao@linux.alibaba.com>
2021-12-29erofs: introduce z_erofs_fixup_insizeGao Xiang3-21/+36
To prepare for the upcoming ztailpacking feature, introduce z_erofs_fixup_insize() and pageofs_in to wrap up the process to get the exact compressed size via zero padding. Link: https://lore.kernel.org/r/20211228054604.114518-3-hsiangkao@linux.alibaba.com Reviewed-by: Chao Yu <chao@kernel.org> Signed-off-by: Gao Xiang <hsiangkao@linux.alibaba.com>
2021-12-29erofs: tidy up z_erofs_lz4_decompressGao Xiang1-39/+44
To prepare for the upcoming ztailpacking feature and further cleanups, introduce a unique z_erofs_lz4_decompress_ctx to keep the context, including inpages, outpages and oend, which are frequently used by the lz4 decompressor. No logic changes. Link: https://lore.kernel.org/r/20211228054604.114518-2-hsiangkao@linux.alibaba.com Reviewed-by: Chao Yu <chao@kernel.org> Signed-off-by: Gao Xiang <hsiangkao@linux.alibaba.com>
2021-12-09erofs: clean up erofs_map_blocks tracepointsGao Xiang1-22/+17
Since the new type of chunk-based files is introduced, there is no need to leave flatmode tracepoints. Rename to erofs_map_blocks instead. Link: https://lore.kernel.org/r/20211209012918.30337-1-hsiangkao@linux.alibaba.com Reviewed-by: Yue Hu <huyue2@yulong.com> Signed-off-by: Gao Xiang <hsiangkao@linux.alibaba.com>
2021-12-08erofs: Replace zero-length array with flexible-array memberGao Xiang1-2/+2
There is a regular need in the kernel to provide a way to declare having a dynamically sized set of trailing elements in a structure. Kernel code should always use `flexible array members' [1] for these cases. The older style of one-element or zero-length arrays should no longer be used [2]. [1] https://en.wikipedia.org/wiki/Flexible_array_member [2] https://www.kernel.org/doc/html/v5.15/process/deprecated.html#zero-length-and-one-element-arrays Link: https://lore.kernel.org/r/20211206121702.221331-1-hsiangkao@linux.alibaba.com Reported-by: Abaci Robot <abaci@linux.alibaba.com> Signed-off-by: Gao Xiang <hsiangkao@linux.alibaba.com>
2021-12-08erofs: add sysfs node to control sync decompression strategyHuang Jianan4-7/+46
Although readpage is a synchronous path, there will be no additional kworker scheduling overhead in non-atomic contexts together with dm-verity. Let's add a sysfs node to disable sync decompression as an option. Link: https://lore.kernel.org/r/20211206143552.8384-1-huangjianan@oppo.com Reviewed-by: Chao Yu <chao@kernel.org> Signed-off-by: Huang Jianan <huangjianan@oppo.com> Signed-off-by: Gao Xiang <hsiangkao@linux.alibaba.com>
2021-12-08erofs: add sysfs interfaceHuang Jianan4-1/+264
Add sysfs interface to configure erofs related parameters later. Link: https://lore.kernel.org/r/20211201145436.4357-1-huangjianan@oppo.com Reviewed-by: Chao Yu <chao@kernel.org> Reviewed-by: Gao Xiang <hsiangkao@linux.alibaba.com> Signed-off-by: Huang Jianan <huangjianan@oppo.com> Signed-off-by: Gao Xiang <hsiangkao@linux.alibaba.com>
2021-12-01erofs: rename lz4_0pading to zero_paddingHuang Jianan3-5/+5
Renaming lz4_0padding to zero_padding globally since LZMA and later algorithms also need that. Link: https://lore.kernel.org/r/20211112160935.19394-1-jnhuang95@gmail.com Reviewed-by: Chao Yu <chao@kernel.org> Signed-off-by: Huang Jianan <huangjianan@oppo.com> Signed-off-by: Gao Xiang <hsiangkao@linux.alibaba.com>
2021-11-23erofs: fix deadlock when shrink erofs slabHuang Jianan1-2/+6
We observed the following deadlock in the stress test under low memory scenario: Thread A Thread B - erofs_shrink_scan - erofs_try_to_release_workgroup - erofs_workgroup_try_to_freeze -- A - z_erofs_do_read_page - z_erofs_collection_begin - z_erofs_register_collection - erofs_insert_workgroup - xa_lock(&sbi->managed_pslots) -- B - erofs_workgroup_get - erofs_wait_on_workgroup_freezed -- A - xa_erase - xa_lock(&sbi->managed_pslots) -- B To fix this, it needs to hold xa_lock before freezing the workgroup since xarray will be touched then. So let's hold the lock before accessing each workgroup, just like what we did with the radix tree before. [ Gao Xiang: Jianhua Hao also reports this issue at https://lore.kernel.org/r/b10b85df30694bac8aadfe43537c897a@xiaomi.com ] Link: https://lore.kernel.org/r/20211118135844.3559-1-huangjianan@oppo.com Fixes: 64094a04414f ("erofs: convert workstn to XArray") Reviewed-by: Chao Yu <chao@kernel.org> Reviewed-by: Gao Xiang <hsiangkao@linux.alibaba.com> Signed-off-by: Huang Jianan <huangjianan@oppo.com> Reported-by: Jianhua Hao <haojianhua1@xiaomi.com> Signed-off-by: Gao Xiang <xiang@kernel.org>
2021-11-13Merge tag 'erofs-for-5.16-rc1-fixes' of ↵Linus Torvalds3-30/+17
git://git.kernel.org/pub/scm/linux/kernel/git/xiang/erofs Pull erofs fixes from Gao Xiang: - fix unsafe pagevec reuse which could cause unexpected behaviors - get rid of the unused DELAYEDALLOC strategy that has been replaced by TRYALLOC * tag 'erofs-for-5.16-rc1-fixes' of git://git.kernel.org/pub/scm/linux/kernel/git/xiang/erofs: erofs: remove useless cache strategy of DELAYEDALLOC erofs: fix unsafe pagevec reuse of hooked pclusters
2021-11-08erofs: remove useless cache strategy of DELAYEDALLOCYue Hu2-21/+0
After commit 1825c8d7ce93 ("erofs: force inplace I/O under low memory scenario") and TRYALLOC is widely used, DELAYEDALLOC won't be used anymore. Remove related dead code. Also, remove the blank line at the end of zdata.h. Link: https://lore.kernel.org/r/20211106082315.25781-1-huyue2@yulong.com Reviewed-by: Gao Xiang <hsiangkao@linux.alibaba.com> Signed-off-by: Yue Hu <huyue2@yulong.com> Signed-off-by: Gao Xiang <xiang@kernel.org>
2021-11-08erofs: fix unsafe pagevec reuse of hooked pclustersGao Xiang2-9/+17
There are pclusters in runtime marked with Z_EROFS_PCLUSTER_TAIL before actual I/O submission. Thus, the decompression chain can be extended if the following pcluster chain hooks such tail pcluster. As the related comment mentioned, if some page is made of a hooked pcluster and another followed pcluster, it can be reused for in-place I/O (since I/O should be submitted anyway): _______________________________________________________________ | tail (partial) page | head (partial) page | |_____PRIMARY_HOOKED___|____________PRIMARY_FOLLOWED____________| However, it's by no means safe to reuse as pagevec since if such PRIMARY_HOOKED pclusters finally move into bypass chain without I/O submission. It's somewhat hard to reproduce with LZ4 and I just found it (general protection fault) by ro_fsstressing a LZMA image for long time. I'm going to actively clean up related code together with multi-page folio adaption in the next few months. Let's address it directly for easier backporting for now. Call trace for reference: z_erofs_decompress_pcluster+0x10a/0x8a0 [erofs] z_erofs_decompress_queue.isra.36+0x3c/0x60 [erofs] z_erofs_runqueue+0x5f3/0x840 [erofs] z_erofs_readahead+0x1e8/0x320 [erofs] read_pages+0x91/0x270 page_cache_ra_unbounded+0x18b/0x240 filemap_get_pages+0x10a/0x5f0 filemap_read+0xa9/0x330 new_sync_read+0x11b/0x1a0 vfs_read+0xf1/0x190 Link: https://lore.kernel.org/r/20211103182006.4040-1-xiang@kernel.org Fixes: 3883a79abd02 ("staging: erofs: introduce VLE decompression support") Cc: <stable@vger.kernel.org> # 4.19+ Reviewed-by: Chao Yu <chao@kernel.org> Signed-off-by: Gao Xiang <hsiangkao@linux.alibaba.com>
2021-11-02Merge tag 'gfs2-v5.15-rc5-mmap-fault' of ↵Linus Torvalds1-1/+1
git://git.kernel.org/pub/scm/linux/kernel/git/gfs2/linux-gfs2 Pull gfs2 mmap + page fault deadlocks fixes from Andreas Gruenbacher: "Functions gfs2_file_read_iter and gfs2_file_write_iter are both accessing the user buffer to write to or read from while holding the inode glock. In the most basic deadlock scenario, that buffer will not be resident and it will be mapped to the same file. Accessing the buffer will trigger a page fault, and gfs2 will deadlock trying to take the same inode glock again while trying to handle that fault. Fix that and similar, more complex scenarios by disabling page faults while accessing user buffers. To make this work, introduce a small amount of new infrastructure and fix some bugs that didn't trigger so far, with page faults enabled" * tag 'gfs2-v5.15-rc5-mmap-fault' of git://git.kernel.org/pub/scm/linux/kernel/git/gfs2/linux-gfs2: gfs2: Fix mmap + page fault deadlocks for direct I/O iov_iter: Introduce nofault flag to disable page faults gup: Introduce FOLL_NOFAULT flag to disable page faults iomap: Add done_before argument to iomap_dio_rw iomap: Support partial direct I/O on user copy failures iomap: Fix iomap_dio_rw return value for user copies gfs2: Fix mmap + page fault deadlocks for buffered I/O gfs2: Eliminate ip->i_gh gfs2: Move the inode glock locking to gfs2_file_buffered_write gfs2: Introduce flag for glock holder auto-demotion gfs2: Clean up function may_grant gfs2: Add wrapper for iomap_file_buffered_write iov_iter: Introduce fault_in_iov_iter_writeable iov_iter: Turn iov_iter_fault_in_readable into fault_in_iov_iter_readable gup: Turn fault_in_pages_{readable,writeable} into fault_in_{readable,writeable} powerpc/kvm: Fix kvm_use_magic_page iov_iter: Fix iov_iter_get_pages{,_alloc} page fault return value
2021-10-31erofs: don't trigger WARN() when decompression failsGao Xiang1-1/+0
syzbot reported a WARNING [1] due to corrupted compressed data. As Dmitry said, "If this is not a kernel bug, then the code should not use WARN. WARN if for kernel bugs and is recognized as such by all testing systems and humans." [1] https://lore.kernel.org/r/000000000000b3586105cf0ff45e@google.com Link: https://lore.kernel.org/r/20211025074311.130395-1-hsiangkao@linux.alibaba.com Cc: Dmitry Vyukov <dvyukov@google.com> Reviewed-by: Chao Yu <chao@kernel.org> Reported-by: syzbot+d8aaffc3719597e8cfb4@syzkaller.appspotmail.com Signed-off-by: Gao Xiang <hsiangkao@linux.alibaba.com>
2021-10-25erofs: get rid of ->lru usageGao Xiang7-44/+53
Currently, ->lru is a way to arrange non-LRU pages and has some in-kernel users. In order to minimize noticable issues of page reclaim and cache thrashing under high memory presure, limited temporary pages were all chained with ->lru and can be reused during the request. However, it seems that ->lru could be removed when folio is landing. Let's use page->private to chain temporary pages for now instead and transform EROFS formally after the topic of the folio / file page design is finalized. Link: https://lore.kernel.org/r/20211022090120.14675-1-hsiangkao@linux.alibaba.com Cc: Matthew Wilcox <willy@infradead.org> Reviewed-by: Kent Overstreet <kent.overstreet@gmail.com> Reviewed-by: Chao Yu <chao@kernel.org> Signed-off-by: Gao Xiang <hsiangkao@linux.alibaba.com>
2021-10-24iomap: Add done_before argument to iomap_dio_rwAndreas Gruenbacher1-1/+1
Add a done_before argument to iomap_dio_rw that indicates how much of the request has already been transferred. When the request succeeds, we report that done_before additional bytes were tranferred. This is useful for finishing a request asynchronously when part of the request has already been completed synchronously. We'll use that to allow iomap_dio_rw to be used with page faults disabled: when a page fault occurs while submitting a request, we synchronously complete the part of the request that has already been submitted. The caller can then take care of the page fault and call iomap_dio_rw again for the rest of the request, passing in the number of bytes already tranferred. Signed-off-by: Andreas Gruenbacher <agruenba@redhat.com> Reviewed-by: Darrick J. Wong <djwong@kernel.org>
2021-10-19erofs: lzma compression supportGao Xiang11-21/+383
Add MicroLZMA support in order to maximize compression ratios for specific scenarios. For example, it's useful for low-end embedded boards and as a secondary algorithm in a file for specific access patterns. MicroLZMA is a new container format for raw LZMA1, which was created by Lasse Collin aiming to minimize old LZMA headers and get rid of unnecessary EOPM (end of payload marker) as well as to enable fixed-sized output compression, especially for 4KiB pclusters. Similar to LZ4, inplace I/O approach is used to minimize runtime memory footprint when dealing with I/O. Overlapped decompression is handled with 1) bounced buffer for data under processing or 2) extra short-lived pages from the on-stack pagepool which will be shared in the same read request (128KiB for example). Link: https://lore.kernel.org/r/20211010213145.17462-8-xiang@kernel.org Acked-by: Chao Yu <chao@kernel.org> Signed-off-by: Gao Xiang <hsiangkao@linux.alibaba.com>
2021-10-19erofs: rename some generic methods in decompressorGao Xiang1-33/+30
Previously, some LZ4 methods were named with `generic'. However, while evaluating the effective LZMA approach, it seems they aren't quite generic at all (e.g. no need preparing dstpages for most LZMA cases.) Avoid such naming instead. Link: https://lore.kernel.org/r/20211010213145.17462-7-xiang@kernel.org Acked-by: Chao Yu <chao@kernel.org> Signed-off-by: Gao Xiang <hsiangkao@linux.alibaba.com>
2021-10-19erofs: introduce readmore decompression strategyGao Xiang2-19/+93
Previously, the readahead window was strictly followed by EROFS decompression strategy in order to minimize extra memory footprint. However, it could become inefficient if just reading the partial requested data for much big LZ4 pclusters and the upcoming LZMA implementation. Let's try to request the leading data in a pcluster without triggering memory reclaiming instead for the LZ4 approach first to boost up 100% randread of large big pclusters, and it has no real impact on low memory scenarios. It also introduces a way to expand read lengths in order to decompress the whole pcluster, which is useful for LZMA since the algorithm itself is relatively slow and causes CPU bound, but LZ4 is not. Link: https://lore.kernel.org/r/20211008200839.24541-4-xiang@kernel.org Reviewed-by: Chao Yu <chao@kernel.org> Signed-off-by: Gao Xiang <hsiangkao@linux.alibaba.com>
2021-10-19erofs: introduce the secondary compression headGao Xiang2-32/+48
Previously, for each HEAD lcluster, it can be either HEAD or PLAIN lcluster to indicate whether the whole pcluster is compressed or not. In this patch, a new HEAD2 head type is introduced to specify another compression algorithm other than the primary algorithm for each compressed file, which can be used for upcoming LZMA compression and LZ4 range dictionary compression for various data patterns. It has been stayed in the EROFS roadmap for years. Complete it now! Link: https://lore.kernel.org/r/20211017165721.2442-1-xiang@kernel.org Reviewed-by: Yue Hu <huyue2@yulong.com> Reviewed-by: Chao Yu <chao@kernel.org> Signed-off-by: Gao Xiang <hsiangkao@linux.alibaba.com>
2021-10-18erofs: get compression algorithms directly on mappingGao Xiang4-23/+25
Currently, z_erofs_map_blocks_iter() returns whether extents are compressed or not, and the decompression frontend gets the specific algorithms then. It works but not quite well in many aspests, for example: - The decompression frontend has to deal with whether extents are compressed or not again and lookup the algorithms if compressed. It's duplicated and too detailed about the on-disk mapping. - A new secondary compression head will be introduced later so that each file can have 2 compression algorithms at most for different type of data. It could increase the complexity of the decompression frontend if still handled in this way; - A new readmore decompression strategy will be introduced to get better performance for much bigger pcluster and lzma, which needs the specific algorithm in advance as well. Let's look up compression algorithms in z_erofs_map_blocks_iter() directly instead. Link: https://lore.kernel.org/r/20211008200839.24541-2-xiang@kernel.org Reviewed-by: Chao Yu <chao@kernel.org> Reviewed-by: Yue Hu <huyue2@yulong.com> Signed-off-by: Gao Xiang <hsiangkao@linux.alibaba.com>
2021-10-18erofs: add multiple device supportGao Xiang6-42/+288
In order to support multi-layer container images, add multiple device feature to EROFS. Two ways are available to use for now: - Devices can be mapped into 32-bit global block address space; - Device ID can be specified with the chunk indexes format. Note that it assumes no extent would cross device boundary and mkfs should take care of it seriously. In the future, a dedicated device manager could be introduced then thus extra devices can be automatically scanned by UUID as well. Link: https://lore.kernel.org/r/20211014081010.43485-1-hsiangkao@linux.alibaba.com Reviewed-by: Chao Yu <chao@kernel.org> Reviewed-by: Liu Bo <bo.liu@linux.alibaba.com> Signed-off-by: Gao Xiang <hsiangkao@linux.alibaba.com>
2021-10-17erofs: decouple basic mount options from fs_contextGao Xiang5-43/+45
Previously, EROFS mount options are all in the basic types, so erofs_fs_context can be directly copied with assignment. However, when the multiple device feature is introduced, it's hard to handle multiple device information like the other basic mount options. Let's separate basic mount option usage from fs_context, thus multiple device information can be handled gracefully then. No logic changes. Link: https://lore.kernel.org/r/20211007070224.12833-1-hsiangkao@linux.alibaba.com Reviewed-by: Chao Yu <chao@kernel.org> Reviewed-by: Liu Bo <bo.liu@linux.alibaba.com> Signed-off-by: Gao Xiang <hsiangkao@linux.alibaba.com>
2021-10-15erofs: remove the fast path of per-CPU buffer decompressionYue Hu1-55/+8
As Xiang mentioned, such path has no real impact to our current decompression strategy, remove it directly. Also, update the return value of z_erofs_lz4_decompress() to 0 if success to keep consistent with LZMA which will return 0 as well for that case. Link: https://lore.kernel.org/r/20211014065744.1787-1-zbestahu@gmail.com Reviewed-by: Gao Xiang <hsiangkao@linux.alibaba.com> Signed-off-by: Yue Hu <huyue2@yulong.com> Signed-off-by: Gao Xiang <hsiangkao@linux.alibaba.com>
2021-09-23erofs: clear compacted_2b if compacted_4b_initial > totalidxYue Hu1-1/+2
Currently, the whole indexes will only be compacted 4B if compacted_4b_initial > totalidx. So, the calculated compacted_2b is worthless for that case. It may waste CPU resources. No need to update compacted_4b_initial as mkfs since it's used to fulfill the alignment of the 1st compacted_2b pack and would handle the case above. We also need to clarify compacted_4b_end here. It's used for the last lclusters which aren't fitted in the previous compacted_2b packs. Some messages are from Xiang. Link: https://lore.kernel.org/r/20210914035915.1190-1-zbestahu@gmail.com Signed-off-by: Yue Hu <huyue2@yulong.com> Reviewed-by: Gao Xiang <hsiangkao@linux.alibaba.com> Reviewed-by: Chao Yu <chao@kernel.org> [ Gao Xiang: it's enough to use "compacted_4b_initial < totalidx". ] Signed-off-by: Gao Xiang <hsiangkao@linux.alibaba.com>
2021-09-23erofs: fix misbehavior of unsupported chunk format checkGao Xiang1-1/+1
Unsupported chunk format should be checked with "if (vi->chunkformat & ~EROFS_CHUNK_FORMAT_ALL)" Found when checking with 4k-byte blockmap (although currently mkfs uses inode chunk indexes format by default.) Link: https://lore.kernel.org/r/20210922095141.233938-1-hsiangkao@linux.alibaba.com Fixes: c5aa903a59db ("erofs: support reading chunk-based uncompressed files") Reviewed-by: Liu Bo <bo.liu@linux.alibaba.com> Reviewed-by: Chao Yu <chao@kernel.org> Signed-off-by: Gao Xiang <hsiangkao@linux.alibaba.com>
2021-09-09Merge tag 'libnvdimm-for-5.15' of ↵Linus Torvalds1-1/+1
git://git.kernel.org/pub/scm/linux/kernel/git/nvdimm/nvdimm Pull libnvdimm updates from Dan Williams: - Fix a race condition in the teardown path of raw mode pmem namespaces. - Cleanup the code that filesystems use to detect filesystem-dax capabilities of their underlying block device. * tag 'libnvdimm-for-5.15' of git://git.kernel.org/pub/scm/linux/kernel/git/nvdimm/nvdimm: dax: remove bdev_dax_supported xfs: factor out a xfs_buftarg_is_dax helper dax: stub out dax_supported for !CONFIG_FS_DAX dax: remove __generic_fsdax_supported dax: move the dax_read_lock() locking into dax_supported dax: mark dax_get_by_host static dm: use fs_dax_get_by_bdev instead of dax_get_by_host dax: stop using bdevname fsdax: improve the FS_DAX Kconfig description and help text libnvdimm/pmem: Fix crash triggered when I/O in-flight during unbind
2021-09-02Merge tag 'ovl-update-5.15' of ↵Linus Torvalds2-2/+5
git://git.kernel.org/pub/scm/linux/kernel/git/mszeredi/vfs Pull overlayfs update from Miklos Szeredi: - Copy up immutable/append/sync/noatime attributes (Amir Goldstein) - Improve performance by enabling RCU lookup. - Misc fixes and improvements The reason this touches so many files is that the ->get_acl() method now gets a "bool rcu" argument. The ->get_acl() API was updated based on comments from Al and Linus: Link: https://lore.kernel.org/linux-fsdevel/CAJfpeguQxpd6Wgc0Jd3ks77zcsAv_bn0q17L3VNnnmPKu11t8A@mail.gmail.com/ * tag 'ovl-update-5.15' of git://git.kernel.org/pub/scm/linux/kernel/git/mszeredi/vfs: ovl: enable RCU'd ->get_acl() vfs: add rcu argument to ->get_acl() callback ovl: fix BUG_ON() in may_delete() when called from ovl_cleanup() ovl: use kvalloc in xattr copy-up ovl: update ctime when changing fileattr ovl: skip checking lower file's i_writecount on truncate ovl: relax lookup error on mismatch origin ftype ovl: do not set overlay.opaque for new directories ovl: add ovl_allow_offline_changes() helper ovl: disable decoding null uuid with redirect_dir ovl: consistent behavior for immutable/append-only inodes ovl: copy up sync/noatime fileattr flags ovl: pass ovl_fs to ovl_check_setxattr() fs: add generic helper for filling statx attribute flags
2021-08-25erofs: fix double free of 'copied'Gao Xiang1-0/+1
Dan reported a new smatch warning [1] "fs/erofs/inode.c:210 erofs_read_inode() error: double free of 'copied'" Due to new chunk-based format handling logic, the error path can be called after kfree(copied). Set "copied = NULL" after kfree(copied) to fix this. [1] https://lore.kernel.org/r/202108251030.bELQozR7-lkp@intel.com Link: https://lore.kernel.org/r/20210825120757.11034-1-hsiangkao@linux.alibaba.com Fixes: c5aa903a59db ("erofs: support reading chunk-based uncompressed files") Reported-by: kernel test robot <lkp@intel.com> Reported-by: Dan Carpenter <dan.carpenter@oracle.com> Reviewed-by: Chao Yu <chao@kernel.org> Signed-off-by: Gao Xiang <hsiangkao@linux.alibaba.com>
2021-08-20erofs: support reading chunk-based uncompressed filesGao Xiang3-11/+102
Add runtime support for chunk-based uncompressed files described in the previous patch. Link: https://lore.kernel.org/r/20210820100019.208490-2-hsiangkao@linux.alibaba.com Reviewed-by: Liu Bo <bo.liu@linux.alibaba.com> Reviewed-by: Chao Yu <chao@kernel.org> Signed-off-by: Gao Xiang <hsiangkao@linux.alibaba.com>
2021-08-20erofs: introduce chunk-based file on-disk formatGao Xiang1-2/+45
Currently, uncompressed data except for tail-packing inline is consecutive on disk. In order to support chunk-based data deduplication, add a new corresponding inode data layout. In the future, the data source of chunks can be either (un)compressed. Link: https://lore.kernel.org/r/20210820100019.208490-1-hsiangkao@linux.alibaba.com Reviewed-by: Liu Bo <bo.liu@linux.alibaba.com> Reviewed-by: Chao Yu <chao@kernel.org> Signed-off-by: Gao Xiang <hsiangkao@linux.alibaba.com>
2021-08-18vfs: add rcu argument to ->get_acl() callbackMiklos Szeredi2-2/+5
Add a rcu argument to the ->get_acl() callback to allow get_cached_acl_rcu() to call the ->get_acl() method in the next patch. Signed-off-by: Miklos Szeredi <mszeredi@redhat.com>
2021-08-19erofs: add fiemap support with iomapGao Xiang5-1/+59
This adds fiemap support for both uncompressed files and compressed files by using iomap infrastructure. Link: https://lore.kernel.org/r/20210813052931.203280-3-hsiangkao@linux.alibaba.com Reviewed-by: Chao Yu <chao@kernel.org> Signed-off-by: Gao Xiang <hsiangkao@linux.alibaba.com>
2021-08-19erofs: add support for the full decompressed lengthGao Xiang2-8/+92
Previously, there is no need to get the full decompressed length since EROFS supports partial decompression. However for some other cases such as fiemap, the full decompressed length is necessary for iomap to make it work properly. This patch adds a way to get the full decompressed length. Note that it takes more metadata overhead and it'd be avoided if possible in the performance sensitive scenario. Link: https://lore.kernel.org/r/20210818152231.243691-1-hsiangkao@linux.alibaba.com Reviewed-by: Chao Yu <chao@kernel.org> Signed-off-by: Gao Xiang <hsiangkao@linux.alibaba.com>
2021-08-11erofs: remove the mapping parameter from erofs_try_to_free_cached_page()Yue Hu3-5/+3
The mapping is not used at all, remove it and update related code. Link: https://lore.kernel.org/r/20210810072416.1392-1-zbestahu@gmail.com Reviewed-by: Gao Xiang <hsiangkao@linux.alibaba.com> Reviewed-by: Chao Yu <chao@kernel.org> Signed-off-by: Yue Hu <huyue2@yulong.com> Signed-off-by: Gao Xiang <hsiangkao@linux.alibaba.com>
2021-08-11erofs: directly use wrapper erofs_page_is_managed() when shrinkingYue Hu1-2/+1
We already have the wrapper function to identify managed page. Link: https://lore.kernel.org/r/20210810065450.1320-1-zbestahu@gmail.com Reviewed-by: Gao Xiang <hsiangkao@linux.alibaba.com> Reviewed-by: Chao Yu <chao@kernel.org> Signed-off-by: Yue Hu <huyue2@yulong.com> Signed-off-by: Gao Xiang <hsiangkao@linux.alibaba.com>
2021-08-10erofs: convert all uncompressed cases to iomapGao Xiang1-239/+51
Since tail-packing inline has been supported by iomap now, let's convert all EROFS uncompressed data I/O to iomap, which is pretty straight-forward. Link: https://lore.kernel.org/r/20210805003601.183063-4-hsiangkao@linux.alibaba.com Cc: linux-fsdevel@vger.kernel.org Reviewed-by: Chao Yu <chao@kernel.org> Signed-off-by: Gao Xiang <hsiangkao@linux.alibaba.com>
2021-08-10erofs: dax support for non-tailpacking regular fileGao Xiang4-4/+104
DAX is quite useful for some VM use cases in order to save guest memory extremely with minimal lightweight EROFS. In order to prepare for such use cases, add preliminary dax support for non-tailpacking regular files for now. Tested with the DRAM-emulated PMEM and the EROFS image generated by "mkfs.erofs -Enoinline_data enwik9.fsdax.img enwik9" Link: https://lore.kernel.org/r/20210805003601.183063-3-hsiangkao@linux.alibaba.com Cc: nvdimm@lists.linux.dev Cc: linux-fsdevel@vger.kernel.org Reviewed-by: Chao Yu <chao@kernel.org> Signed-off-by: Gao Xiang <hsiangkao@linux.alibaba.com>
2021-08-10erofs: iomap support for non-tailpacking DIOHuang Jianan4-1/+100
Add iomap support for non-tailpacking uncompressed data in order to support DIO and DAX. Direct I/O is useful in certain scenarios for uncompressed files. For example, double pagecache can be avoid by direct I/O when loop device is used for uncompressed files containing upper layer compressed filesystem. This adds iomap DIO support for non-tailpacking cases first and tail-packing inline files are handled in the follow-up patch. Link: https://lore.kernel.org/r/20210805003601.183063-2-hsiangkao@linux.alibaba.com Cc: linux-fsdevel@vger.kernel.org Reviewed-by: Chao Yu <chao@kernel.org> Signed-off-by: Huang Jianan <huangjianan@oppo.com> Signed-off-by: Gao Xiang <hsiangkao@linux.alibaba.com>
2021-06-08erofs: clean up file headers & footersGao Xiang18-34/+0
- Remove my outdated misleading email address; - Get rid of all unnecessary trailing newline by accident. Link: https://lore.kernel.org/r/20210602160634.10757-1-xiang@kernel.org Reviewed-by: Chao Yu <yuchao0@huawei.com> Signed-off-by: Gao Xiang <hsiangkao@linux.alibaba.com>
2021-06-08erofs: remove the occupied parameter from z_erofs_pagevec_enqueue()Yue Hu2-7/+2
No any behavior to variable occupied in z_erofs_attach_page() which is only caller to z_erofs_pagevec_enqueue(). Link: https://lore.kernel.org/r/20210419102623.2015-1-zbestahu@gmail.com Signed-off-by: Yue Hu <huyue2@yulong.com> Reviewed-by: Gao Xiang <xiang@kernel.org> Signed-off-by: Gao Xiang <xiang@kernel.org>
2021-06-08erofs: fix error return code in erofs_read_superblock()Wei Yongjun1-0/+1
'ret' will be overwritten to 0 if erofs_sb_has_sb_chksum() return true, thus 0 will return in some error handling cases. Fix to return negative error code -EINVAL instead of 0. Link: https://lore.kernel.org/r/20210519141657.3062715-1-weiyongjun1@huawei.com Fixes: b858a4844cfb ("erofs: support superblock checksum") Cc: stable <stable@vger.kernel.org> # 5.5+ Reported-by: Hulk Robot <hulkci@huawei.com> Signed-off-by: Wei Yongjun <weiyongjun1@huawei.com> Reviewed-by: Gao Xiang <xiang@kernel.org> Reviewed-by: Chao Yu <yuchao0@huawei.com> Signed-off-by: Gao Xiang <xiang@kernel.org>
2021-05-13erofs: fix 1 lcluster-sized pcluster for big pclusterGao Xiang1-2/+19
If the 1st NONHEAD lcluster of a pcluster isn't CBLKCNT lcluster type rather than a HEAD or PLAIN type instead, which means its pclustersize _must_ be 1 lcluster (since its uncompressed size < 2 lclusters), as illustrated below: HEAD HEAD / PLAIN lcluster type ____________ ____________ |_:__________|_________:__| file data (uncompressed) . . .____________. |____________| pcluster data (compressed) Such on-disk case was explained before [1] but missed to be handled properly in the runtime implementation. It can be observed if manually generating 1 lcluster-sized pcluster with 2 lclusters (thus CBLKCNT doesn't exist.) Let's fix it now. [1] https://lore.kernel.org/r/20210407043927.10623-1-xiang@kernel.org Link: https://lore.kernel.org/r/20210510064715.29123-1-xiang@kernel.org Fixes: cec6e93beadf ("erofs: support parsing big pcluster compress indexes") Reviewed-by: Chao Yu <yuchao0@huawei.com> Signed-off-by: Gao Xiang <xiang@kernel.org>