summaryrefslogtreecommitdiffstats
path: root/fs
AgeCommit message (Collapse)AuthorFilesLines
2020-01-31Merge tag 'gfs2-for-5.6' of ↵Linus Torvalds12-70/+73
git://git.kernel.org/pub/scm/linux/kernel/git/gfs2/linux-gfs2 Pull gfs2 updates from Andreas Gruenbacher: - Fix some corner cases on filesystems with a block size < page size. - Fix a corner case that could expose incorrect access times over nfs. - Revert an otherwise sensible revoke accounting cleanup that causes assertion failures. The revoke accounting is whacky and needs to be fixed properly before we can add back this cleanup. - Various other minor cleanups. In addition, please expect to see another pull request from Bob Peterson about his gfs2 recovery patch queue shortly. * tag 'gfs2-for-5.6' of git://git.kernel.org/pub/scm/linux/kernel/git/gfs2/linux-gfs2: Revert "gfs2: eliminate tr_num_revoke_rm" gfs2: remove unused LBIT macros fs/gfs2: remove unused IS_DINODE and IS_LEAF macros gfs2: Remove GFS2_MIN_LVB_SIZE define gfs2: Fix incorrect variable name gfs2: Avoid access time thrashing in gfs2_inode_lookup gfs2: minor cleanup: remove unneeded variable ret in gfs2_jdata_writepage gfs2: eliminate ssize parameter from gfs2_struct2blk gfs2: Another gfs2_find_jhead fix
2020-01-31Merge tag 'iomap-5.6-merge-3' of git://git.kernel.org/pub/scm/fs/xfs/xfs-linuxLinus Torvalds1-13/+5
Pull iomap fix from Darrick Wong: "A single patch fixing an off-by-one error when we're checking to see how far we're gotten into an EOF page" * tag 'iomap-5.6-merge-3' of git://git.kernel.org/pub/scm/fs/xfs/xfs-linux: fs: Fix page_mkwrite off-by-one errors
2020-01-31Merge branch 'akpm' (patches from Andrew)Linus Torvalds25-170/+236
Pull updates from Andrew Morton: "Most of -mm and quite a number of other subsystems: hotfixes, scripts, ocfs2, misc, lib, binfmt, init, reiserfs, exec, dma-mapping, kcov. MM is fairly quiet this time. Holidays, I assume" * emailed patches from Andrew Morton <akpm@linux-foundation.org>: (118 commits) kcov: ignore fault-inject and stacktrace include/linux/io-mapping.h-mapping: use PHYS_PFN() macro in io_mapping_map_atomic_wc() execve: warn if process starts with executable stack reiserfs: prevent NULL pointer dereference in reiserfs_insert_item() init/main.c: fix misleading "This architecture does not have kernel memory protection" message init/main.c: fix quoted value handling in unknown_bootoption init/main.c: remove unnecessary repair_env_string in do_initcall_level init/main.c: log arguments and environment passed to init fs/binfmt_elf.c: coredump: allow process with empty address space to coredump fs/binfmt_elf.c: coredump: delete duplicated overflow check fs/binfmt_elf.c: coredump: allocate core ELF header on stack fs/binfmt_elf.c: make BAD_ADDR() unlikely fs/binfmt_elf.c: better codegen around current->mm fs/binfmt_elf.c: don't copy ELF header around fs/binfmt_elf.c: fix ->start_code calculation fs/binfmt_elf.c: smaller code generation around auxv vector fill lib/find_bit.c: uninline helper _find_next_bit() lib/find_bit.c: join _find_next_bit{_le} uapi: rename ext2_swab() to swab() and share globally in swab.h lib/scatterlist.c: adjust indentation in __sg_alloc_table ...
2020-01-31execve: warn if process starts with executable stackAlexey Dobriyan1-0/+5
There were few episodes of silent downgrade to an executable stack over years: 1) linking innocent looking assembly file will silently add executable stack if proper linker options is not given as well: $ cat f.S .intel_syntax noprefix .text .globl f f: ret $ cat main.c void f(void); int main(void) { f(); return 0; } $ gcc main.c f.S $ readelf -l ./a.out GNU_STACK 0x0000000000000000 0x0000000000000000 0x0000000000000000 0x0000000000000000 0x0000000000000000 RWE 0x10 ^^^ 2) converting C99 nested function into a closure https://nullprogram.com/blog/2019/11/15/ void intsort2(int *base, size_t nmemb, _Bool invert) { int cmp(const void *a, const void *b) { int r = *(int *)a - *(int *)b; return invert ? -r : r; } qsort(base, nmemb, sizeof(*base), cmp); } will silently require stack trampolines while non-closure version will not. Without doubt this behaviour is documented somewhere, add a warning so that developers and users can at least notice. After so many years of x86_64 having proper executable stack support it should not cause too many problems. Link: http://lkml.kernel.org/r/20191208171918.GC19716@avx2 Signed-off-by: Alexey Dobriyan <adobriyan@gmail.com> Cc: Dan Carpenter <dan.carpenter@oracle.com> Cc: Will Deacon <will@kernel.org> Cc: "Eric W. Biederman" <ebiederm@xmission.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2020-01-31reiserfs: prevent NULL pointer dereference in reiserfs_insert_item()Yunfeng Ye1-1/+2
The variable inode may be NULL in reiserfs_insert_item(), but there is no check before accessing the member of inode. Fix this by adding NULL pointer check before calling reiserfs_debug(). Link: http://lkml.kernel.org/r/79c5135d-ff25-1cc9-4e99-9f572b88cc00@huawei.com Signed-off-by: Yunfeng Ye <yeyunfeng@huawei.com> Cc: zhengbin <zhengbin13@huawei.com> Cc: Hu Shiyuan <hushiyuan@huawei.com> Cc: Feilong Lin <linfeilong@huawei.com> Cc: Jan Kara <jack@suse.cz> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2020-01-31fs/binfmt_elf.c: coredump: allow process with empty address space to coredumpAlexey Dobriyan1-1/+9
Unmapping whole address space at once with munmap(0, (1ULL<<47) - 4096) or equivalent will create empty coredump. It is silly way to exit, however registers content may still be useful. The right to coredump is fundamental right of a process! Link: http://lkml.kernel.org/r/20191222150137.GA1277@avx2 Signed-off-by: Alexey Dobriyan <adobriyan@gmail.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2020-01-31fs/binfmt_elf.c: coredump: delete duplicated overflow checkAlexey Dobriyan1-2/+0
array_size() macro will do overflow check anyway. Link: http://lkml.kernel.org/r/20191222144009.GB24341@avx2 Signed-off-by: Alexey Dobriyan <adobriyan@gmail.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2020-01-31fs/binfmt_elf.c: coredump: allocate core ELF header on stackAlexey Dobriyan1-11/+5
Comment says ELF header is "too large to be on stack". 64 bytes on 64-bit is not large by any means. Link: http://lkml.kernel.org/r/20191222143850.GA24341@avx2 Signed-off-by: Alexey Dobriyan <adobriyan@gmail.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2020-01-31fs/binfmt_elf.c: make BAD_ADDR() unlikelyAlexey Dobriyan1-1/+1
If some mapping goes past TASK_SIZE it will be rejected by kernel which means no such userspace binaries exist. Mark every such check as unlikely. Link: http://lkml.kernel.org/r/20191215124355.GA21124@avx2 Signed-off-by: Alexey Dobriyan <adobriyan@gmail.com> Reviewed-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2020-01-31fs/binfmt_elf.c: better codegen around current->mmAlexey Dobriyan1-24/+28
"current->mm" pointer is stable in general except few cases one of which execve(2). Compiler can't treat is as stable but it _is_ stable most of the time. During ELF loading process ->mm becomes stable right after flush_old_exec(). Help compiler by caching current->mm, otherwise it continues to refetch it. add/remove: 0/0 grow/shrink: 0/2 up/down: 0/-141 (-141) Function old new delta elf_core_dump 5062 5039 -23 load_elf_binary 5426 5308 -118 Note: other cases are left as is because it is either pessimisation or no change in binary size. Link: http://lkml.kernel.org/r/20191215124755.GB21124@avx2 Signed-off-by: Alexey Dobriyan <adobriyan@gmail.com> Reviewed-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2020-01-31fs/binfmt_elf.c: don't copy ELF header aroundAlexey Dobriyan1-28/+27
ELF header is read into bprm->buf[] by generic execve code. Save a memcpy and allocate just one header for the interpreter instead of two headers (64 bytes instead of 128 on 64-bit). Link: http://lkml.kernel.org/r/20191208171242.GA19716@avx2 Signed-off-by: Alexey Dobriyan <adobriyan@gmail.com> Reviewed-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2020-01-31fs/binfmt_elf.c: fix ->start_code calculationAlexey Dobriyan1-1/+1
Only executable segments should be accounted to ->start_code just like they do to ->end_code (correctly). Link: http://lkml.kernel.org/r/20191208171410.GB19716@avx2 Signed-off-by: Alexey Dobriyan <adobriyan@gmail.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2020-01-31fs/binfmt_elf.c: smaller code generation around auxv vector fillAlexey Dobriyan1-7/+8
Filling auxv vector as array with index (auxv[i++] = ...) generates terrible code. "saved_auxv" should be reworked because it is the worst member of mm_struct by size/usefullness ratio but do it later. Meanwhile help gcc a little with *auxv++ idiom. Space savings on x86_64: add/remove: 0/0 grow/shrink: 0/1 up/down: 0/-127 (-127) Function old new delta load_elf_binary 5470 5343 -127 Link: http://lkml.kernel.org/r/20191208172301.GD19716@avx2 Signed-off-by: Alexey Dobriyan <adobriyan@gmail.com> Reviewed-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2020-01-31btrfs: use larger zlib buffer for s390 hardware compressionMikhail Zaslonko2-36/+101
In order to benefit from s390 zlib hardware compression support, increase the btrfs zlib workspace buffer size from 1 to 4 pages (if s390 zlib hardware support is enabled on the machine). This brings up to 60% better performance in hardware on s390 compared to the PAGE_SIZE buffer and much more compared to the software zlib processing in btrfs. In case of memory pressure, fall back to a single page buffer during workspace allocation. The data compressed with larger input buffers will still conform to zlib standard and thus can be decompressed also on a systems that uses only PAGE_SIZE buffer for btrfs zlib. Link: http://lkml.kernel.org/r/20200108105103.29028-1-zaslonko@linux.ibm.com Signed-off-by: Mikhail Zaslonko <zaslonko@linux.ibm.com> Reviewed-by: David Sterba <dsterba@suse.com> Cc: Chris Mason <clm@fb.com> Cc: Josef Bacik <josef@toxicpanda.com> Cc: David Sterba <dsterba@suse.com> Cc: Richard Purdie <rpurdie@rpsys.net> Cc: Heiko Carstens <heiko.carstens@de.ibm.com> Cc: Vasily Gorbik <gor@linux.ibm.com> Cc: Christian Borntraeger <borntraeger@de.ibm.com> Cc: Eduard Shishkin <edward6@linux.ibm.com> Cc: Ilya Leoshkevich <iii@linux.ibm.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2020-01-31mm, tree-wide: rename put_user_page*() to unpin_user_page*()John Hubbard1-2/+2
In order to provide a clearer, more symmetric API for pinning and unpinning DMA pages. This way, pin_user_pages*() calls match up with unpin_user_pages*() calls, and the API is a lot closer to being self-explanatory. Link: http://lkml.kernel.org/r/20200107224558.2362728-23-jhubbard@nvidia.com Signed-off-by: John Hubbard <jhubbard@nvidia.com> Reviewed-by: Jan Kara <jack@suse.cz> Cc: Alex Williamson <alex.williamson@redhat.com> Cc: Aneesh Kumar K.V <aneesh.kumar@linux.ibm.com> Cc: Björn Töpel <bjorn.topel@intel.com> Cc: Christoph Hellwig <hch@lst.de> Cc: Daniel Vetter <daniel.vetter@ffwll.ch> Cc: Dan Williams <dan.j.williams@intel.com> Cc: Hans Verkuil <hverkuil-cisco@xs4all.nl> Cc: Ira Weiny <ira.weiny@intel.com> Cc: Jason Gunthorpe <jgg@mellanox.com> Cc: Jason Gunthorpe <jgg@ziepe.ca> Cc: Jens Axboe <axboe@kernel.dk> Cc: Jerome Glisse <jglisse@redhat.com> Cc: Jonathan Corbet <corbet@lwn.net> Cc: Kirill A. Shutemov <kirill@shutemov.name> Cc: Leon Romanovsky <leonro@mellanox.com> Cc: Mauro Carvalho Chehab <mchehab@kernel.org> Cc: Mike Rapoport <rppt@linux.ibm.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2020-01-31fs/io_uring: set FOLL_PIN via pin_user_pages()John Hubbard1-1/+1
Convert fs/io_uring to use the new pin_user_pages() call, which sets FOLL_PIN. Setting FOLL_PIN is now required for code that requires tracking of pinned pages, and therefore for any code that calls put_user_page(). In partial anticipation of this work, the io_uring code was already calling put_user_page() instead of put_page(). Therefore, in order to convert from the get_user_pages()/put_page() model, to the pin_user_pages()/put_user_page() model, the only change required here is to change get_user_pages() to pin_user_pages(). Link: http://lkml.kernel.org/r/20200107224558.2362728-17-jhubbard@nvidia.com Signed-off-by: John Hubbard <jhubbard@nvidia.com> Reviewed-by: Jens Axboe <axboe@kernel.dk> Reviewed-by: Jan Kara <jack@suse.cz> Cc: Alex Williamson <alex.williamson@redhat.com> Cc: Aneesh Kumar K.V <aneesh.kumar@linux.ibm.com> Cc: Björn Töpel <bjorn.topel@intel.com> Cc: Christoph Hellwig <hch@lst.de> Cc: Daniel Vetter <daniel.vetter@ffwll.ch> Cc: Dan Williams <dan.j.williams@intel.com> Cc: Hans Verkuil <hverkuil-cisco@xs4all.nl> Cc: Ira Weiny <ira.weiny@intel.com> Cc: Jason Gunthorpe <jgg@mellanox.com> Cc: Jason Gunthorpe <jgg@ziepe.ca> Cc: Jerome Glisse <jglisse@redhat.com> Cc: Jonathan Corbet <corbet@lwn.net> Cc: Kirill A. Shutemov <kirill@shutemov.name> Cc: Leon Romanovsky <leonro@mellanox.com> Cc: Mauro Carvalho Chehab <mchehab@kernel.org> Cc: Mike Rapoport <rppt@linux.ibm.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2020-01-31ocfs2: use ocfs2_update_inode_fsync_trans() to access t_tid in ↵wangyan1-2/+1
handle->h_transaction For the uniform format, we use ocfs2_update_inode_fsync_trans() to access t_tid in handle->h_transaction Link: http://lkml.kernel.org/r/6ff9a312-5f7d-0e27-fb51-bc4e062fcd97@huawei.com Signed-off-by: Yan Wang <wangyan122@huawei.com> Reviewed-by: Jun Piao <piaojun@huawei.com> Cc: Mark Fasheh <mark@fasheh.com> Cc: Joel Becker <jlbec@evilplan.org> Cc: Junxiao Bi <junxiao.bi@oracle.com> Cc: Joseph Qi <jiangqi903@gmail.com> Cc: Changwei Ge <gechangwei@live.cn> Cc: Gang He <ghe@suse.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2020-01-31ocfs2: fix a NULL pointer dereference when call ocfs2_update_inode_fsync_trans()wangyan1-3/+5
I found a NULL pointer dereference in ocfs2_update_inode_fsync_trans(), handle->h_transaction may be NULL in this situation: ocfs2_file_write_iter ->__generic_file_write_iter ->generic_perform_write ->ocfs2_write_begin ->ocfs2_write_begin_nolock ->ocfs2_write_cluster_by_desc ->ocfs2_write_cluster ->ocfs2_mark_extent_written ->ocfs2_change_extent_flag ->ocfs2_split_extent ->ocfs2_try_to_merge_extent ->ocfs2_extend_rotate_transaction ->ocfs2_extend_trans ->jbd2_journal_restart ->jbd2__journal_restart // handle->h_transaction is NULL here ->handle->h_transaction = NULL; ->start_this_handle /* journal aborted due to storage network disconnection, return error */ ->return -EROFS; /* line 3806 in ocfs2_try_to_merge_extent (), it will ignore ret error. */ ->ret = 0; ->... ->ocfs2_write_end ->ocfs2_write_end_nolock ->ocfs2_update_inode_fsync_trans // NULL pointer dereference ->oi->i_sync_tid = handle->h_transaction->t_tid; The information of NULL pointer dereference as follows: JBD2: Detected IO errors while flushing file data on dm-11-45 Aborting journal on device dm-11-45. JBD2: Error -5 detected when updating journal superblock for dm-11-45. (dd,22081,3):ocfs2_extend_trans:474 ERROR: status = -30 (dd,22081,3):ocfs2_try_to_merge_extent:3877 ERROR: status = -30 Unable to handle kernel NULL pointer dereference at virtual address 0000000000000008 Mem abort info: ESR = 0x96000004 Exception class = DABT (current EL), IL = 32 bits SET = 0, FnV = 0 EA = 0, S1PTW = 0 Data abort info: ISV = 0, ISS = 0x00000004 CM = 0, WnR = 0 user pgtable: 4k pages, 48-bit VAs, pgdp = 00000000e74e1338 [0000000000000008] pgd=0000000000000000 Internal error: Oops: 96000004 [#1] SMP Process dd (pid: 22081, stack limit = 0x00000000584f35a9) CPU: 3 PID: 22081 Comm: dd Kdump: loaded Hardware name: Huawei TaiShan 2280 V2/BC82AMDD, BIOS 0.98 08/25/2019 pstate: 60400009 (nZCv daif +PAN -UAO) pc : ocfs2_write_end_nolock+0x2b8/0x550 [ocfs2] lr : ocfs2_write_end_nolock+0x2a0/0x550 [ocfs2] sp : ffff0000459fba70 x29: ffff0000459fba70 x28: 0000000000000000 x27: ffff807ccf7f1000 x26: 0000000000000001 x25: ffff807bdff57970 x24: ffff807caf1d4000 x23: ffff807cc79e9000 x22: 0000000000001000 x21: 000000006c6cd000 x20: ffff0000091d9000 x19: ffff807ccb239db0 x18: ffffffffffffffff x17: 000000000000000e x16: 0000000000000007 x15: ffff807c5e15bd78 x14: 0000000000000000 x13: 0000000000000000 x12: 0000000000000000 x11: 0000000000000000 x10: 0000000000000001 x9 : 0000000000000228 x8 : 000000000000000c x7 : 0000000000000fff x6 : ffff807a308ed6b0 x5 : ffff7e01f10967c0 x4 : 0000000000000018 x3 : d0bc661572445600 x2 : 0000000000000000 x1 : 000000001b2e0200 x0 : 0000000000000000 Call trace: ocfs2_write_end_nolock+0x2b8/0x550 [ocfs2] ocfs2_write_end+0x4c/0x80 [ocfs2] generic_perform_write+0x108/0x1a8 __generic_file_write_iter+0x158/0x1c8 ocfs2_file_write_iter+0x668/0x950 [ocfs2] __vfs_write+0x11c/0x190 vfs_write+0xac/0x1c0 ksys_write+0x6c/0xd8 __arm64_sys_write+0x24/0x30 el0_svc_common+0x78/0x130 el0_svc_handler+0x38/0x78 el0_svc+0x8/0xc To prevent NULL pointer dereference in this situation, we use is_handle_aborted() before using handle->h_transaction->t_tid. Link: http://lkml.kernel.org/r/03e750ab-9ade-83aa-b000-b9e81e34e539@huawei.com Signed-off-by: Yan Wang <wangyan122@huawei.com> Reviewed-by: Jun Piao <piaojun@huawei.com> Cc: Mark Fasheh <mark@fasheh.com> Cc: Joel Becker <jlbec@evilplan.org> Cc: Junxiao Bi <junxiao.bi@oracle.com> Cc: Joseph Qi <jiangqi903@gmail.com> Cc: Changwei Ge <gechangwei@live.cn> Cc: Gang He <ghe@suse.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2020-01-31ocfs2/dlm: move BITS_TO_BYTES() to bitops.h for wider useAndy Shevchenko1-4/+0
There are users already and will be more of BITS_TO_BYTES() macro. Move it to bitops.h for wider use. In the case of ocfs2 the replacement is identical. As for bnx2x, there are two places where floor version is used. In the first case to calculate the amount of structures that can fit one memory page. In this case obviously the ceiling variant is correct and original code might have a potential bug, if amount of bits % 8 is not 0. In the second case the macro is used to calculate bytes transmitted in one microsecond. This will work for all speeds which is multiply of 1Gbps without any change, for the rest new code will give ceiling value, for instance 100Mbps will give 13 bytes, while old code gives 12 bytes and the arithmetically correct one is 12.5 bytes. Further the value is used to setup timer threshold which in any case has its own margins due to certain resolution. I don't see here an issue with slightly shifting thresholds for low speed connections, the card is supposed to utilize highest available rate, which is usually 10Gbps. Link: http://lkml.kernel.org/r/20200108121316.22411-1-andriy.shevchenko@linux.intel.com Signed-off-by: Andy Shevchenko <andriy.shevchenko@linux.intel.com> Reviewed-by: Joseph Qi <joseph.qi@linux.alibaba.com> Acked-by: Sudarsana Reddy Kalluru <skalluru@marvell.com> Cc: Mark Fasheh <mark@fasheh.com> Cc: Joel Becker <jlbec@evilplan.org> Cc: Junxiao Bi <junxiao.bi@oracle.com> Cc: Changwei Ge <gechangwei@live.cn> Cc: Gang He <ghe@suse.com> Cc: Jun Piao <piaojun@huawei.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2020-01-31ocfs2/dlm: remove redundant assignment to retColin Ian King1-1/+1
The variable ret is being initialized with a value that is never read and it is being updated later with a new value. The initialization is redundant and can be removed. Addresses Coverity ("Unused value") Link: http://lkml.kernel.org/r/20191202164833.62865-1-colin.king@canonical.com Signed-off-by: Colin Ian King <colin.king@canonical.com> Reviewed-by: Joseph Qi <joseph.qi@linux.alibaba.com> Cc: Mark Fasheh <mark@fasheh.com> Cc: Joel Becker <jlbec@evilplan.org> Cc: Junxiao Bi <junxiao.bi@oracle.com> Cc: Changwei Ge <gechangwei@live.cn> Cc: Gang He <ghe@suse.com> Cc: Jun Piao <piaojun@huawei.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2020-01-31ocfs2: make local header paths relative to C filesMasahiro Yamada13-45/+41
Gang He reports the failure of building fs/ocfs2/ as an external module of the kernel installed on the system: $ cd fs/ocfs2 $ make -C /lib/modules/`uname -r`/build M=`pwd` modules If you want to make it work reliably, I'd recommend to remove ccflags-y from the Makefiles, and to make header paths relative to the C files. I think this is the correct usage of the #include "..." directive. Link: http://lkml.kernel.org/r/20191227022950.14804-1-ghe@suse.com Signed-off-by: Masahiro Yamada <masahiroy@kernel.org> Signed-off-by: Gang He <ghe@suse.com> Reported-by: Gang He <ghe@suse.com> Reviewed-by: Gang He <ghe@suse.com> Cc: Mark Fasheh <mark@fasheh.com> Cc: Joel Becker <jlbec@evilplan.org> Cc: Junxiao Bi <junxiao.bi@oracle.com> Cc: Joseph Qi <jiangqi903@gmail.com> Cc: Changwei Ge <gechangwei@live.cn> Cc: Jun Piao <piaojun@huawei.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2020-01-31ocfs2: remove unneeded semicolonszhengbin2-2/+2
Fixes coccicheck warnings: fs/ocfs2/cluster/quorum.c:76:2-3: Unneeded semicolon fs/ocfs2/dlmglue.c:573:2-3: Unneeded semicolon Link: http://lkml.kernel.org/r/6ee3aa16-9078-30b1-df3f-22064950bd98@linux.alibaba.com Signed-off-by: zhengbin <zhengbin13@huawei.com> Reported-by: Hulk Robot <hulkci@huawei.com> Acked-by: Joseph Qi <joseph.qi@linux.alibaba.com> Cc: Mark Fasheh <mark@fasheh.com> Cc: Joel Becker <jlbec@evilplan.org> Cc: Junxiao Bi <junxiao.bi@oracle.com> Cc: Changwei Ge <gechangwei@live.cn> Cc: Gang He <ghe@suse.com> Cc: Jun Piao <piaojun@huawei.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2020-01-31fs: ocfs: remove unnecessary assertion in dlm_migrate_lockresAditya Pakki1-2/+0
In the only caller of dlm_migrate_lockres() - dlm_empty_lockres(), target is checked for O2NM_MAX_NODES. Thus, the assertion in dlm_migrate_lockres() is unnecessary and can be removed. The patch eliminates such a check. Link: http://lkml.kernel.org/r/20191218194111.26041-1-pakki001@umn.edu Signed-off-by: Aditya Pakki <pakki001@umn.edu> Cc: Mark Fasheh <mark@fasheh.com> Cc: Joel Becker <jlbec@evilplan.org> Cc: Junxiao Bi <junxiao.bi@oracle.com> Cc: Joseph Qi <jiangqi903@gmail.com> Cc: Changwei Ge <gechangwei@live.cn> Cc: Gang He <ghe@suse.com> Cc: Jun Piao <piaojun@huawei.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2020-01-31memcg: fix a crash in wb_workfn when a device disappearsTheodore Ts'o1-1/+1
Without memcg, there is a one-to-one mapping between the bdi and bdi_writeback structures. In this world, things are fairly straightforward; the first thing bdi_unregister() does is to shutdown the bdi_writeback structure (or wb), and part of that writeback ensures that no other work queued against the wb, and that the wb is fully drained. With memcg, however, there is a one-to-many relationship between the bdi and bdi_writeback structures; that is, there are multiple wb objects which can all point to a single bdi. There is a refcount which prevents the bdi object from being released (and hence, unregistered). So in theory, the bdi_unregister() *should* only get called once its refcount goes to zero (bdi_put will drop the refcount, and when it is zero, release_bdi gets called, which calls bdi_unregister). Unfortunately, del_gendisk() in block/gen_hd.c never got the memo about the Brave New memcg World, and calls bdi_unregister directly. It does this without informing the file system, or the memcg code, or anything else. This causes the root wb associated with the bdi to be unregistered, but none of the memcg-specific wb's are shutdown. So when one of these wb's are woken up to do delayed work, they try to dereference their wb->bdi->dev to fetch the device name, but unfortunately bdi->dev is now NULL, thanks to the bdi_unregister() called by del_gendisk(). As a result, *boom*. Fortunately, it looks like the rest of the writeback path is perfectly happy with bdi->dev and bdi->owner being NULL, so the simplest fix is to create a bdi_dev_name() function which can handle bdi->dev being NULL. This also allows us to bulletproof the writeback tracepoints to prevent them from dereferencing a NULL pointer and crashing the kernel if one is tracing with memcg's enabled, and an iSCSI device dies or a USB storage stick is pulled. The most common way of triggering this will be hotremoval of a device while writeback with memcg enabled is going on. It was triggering several times a day in a heavily loaded production environment. Google Bug Id: 145475544 Link: https://lore.kernel.org/r/20191227194829.150110-1-tytso@mit.edu Link: http://lkml.kernel.org/r/20191228005211.163952-1-tytso@mit.edu Signed-off-by: Theodore Ts'o <tytso@mit.edu> Cc: Chris Mason <clm@fb.com> Cc: Tejun Heo <tj@kernel.org> Cc: Jens Axboe <axboe@kernel.dk> Cc: <stable@vger.kernel.org> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2020-01-30Merge tag 'mpx-for-linus' of ↵Linus Torvalds1-1/+0
git://git.kernel.org/pub/scm/linux/kernel/git/daveh/x86-mpx Pull x86 MPX removal from Dave Hansen: "MPX requires recompiling applications, which requires compiler support. Unfortunately, GCC 9.1 is expected to be be released without support for MPX. This means that there was only a relatively small window where folks could have ever used MPX. It failed to gain wide adoption in the industry, and Linux was the only mainstream OS to ever support it widely. Support for the feature may also disappear on future processors. This set completes the process that we started during the 5.4 merge window when the MPX prctl()s were removed. XSAVE support is left in place, which allows MPX-using KVM guests to continue to function" * tag 'mpx-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/daveh/x86-mpx: x86/mpx: remove MPX from arch/x86 mm: remove arch_bprm_mm_init() hook x86/mpx: remove bounds exception code x86/mpx: remove build infrastructure x86/alternatives: add missing insn.h include
2020-01-30Merge tag 'upstream-5.6-rc1' of ↵Linus Torvalds5-7/+19
git://git.kernel.org/pub/scm/linux/kernel/git/rw/ubifs Pull UBI/UBIFS updates from Miquel Raynal: "This pull request contains mostly fixes for UBI and UBIFS: UBI: - Fixes for memory leaks in error paths - Fix for an logic error in a fastmap selfcheck UBIFS: - Fix for FS_IOC_SETFLAGS related to fscrypt flag - Support for FS_ENCRYPT_FL - Fix for a dead lock in bulk-read mode" Sent on behalf of Richard Weinberger who is traveling. * tag 'upstream-5.6-rc1' of git://git.kernel.org/pub/scm/linux/kernel/git/rw/ubifs: ubi: Fix an error pointer dereference in error handling code ubifs: Fix memory leak from c->sup_node ubifs: Fix ino_t format warnings in orphan_delete() ubifs: Fix deadlock in concurrent bulk-read and writepage ubifs: Fix wrong memory allocation ubi: Free the normal volumes in error paths of ubi_attach_mtd_dev() ubi: Check the presence of volume before call ubi_fastmap_destroy_checkmap() ubifs: Add support for FS_ENCRYPT_FL ubifs: Fix FS_IOC_SETFLAGS unexpectedly clearing encrypt flag ubi: wl: Remove set but not used variable 'prev_e' ubi: fastmap: Fix inverted logic in seen selfcheck
2020-01-30Merge tag 'f2fs-for-5.6' of ↵Linus Torvalds18-370/+3137
git://git.kernel.org/pub/scm/linux/kernel/git/jaegeuk/f2fs Pull f2fs updates from Jaegeuk Kim: "In this series, we've implemented transparent compression experimentally. It supports LZO and LZ4, but will add more later as we investigate in the field more. At this point, the feature doesn't expose compressed space to user directly in order to guarantee potential data updates later to the space. Instead, the main goal is to reduce data writes to flash disk as much as possible, resulting in extending disk life time as well as relaxing IO congestion. Alternatively, we're also considering to add ioctl() to reclaim compressed space and show it to user after putting the immutable bit. Enhancements: - add compression support - avoid unnecessary locks in quota ops - harden power-cut scenario for zoned block devices - use private bio_set to avoid IO congestion - replace GC mutex with rwsem to serialize callers Bug fixes: - fix dentry consistency and memory corruption in rename()'s error case - fix wrong swap extent reports - fix casefolding bugs - change lock coverage to avoid deadlock - avoid GFP_KERNEL under f2fs_lock_op And, we've cleaned up sysfs entries to prepare no debugfs" * tag 'f2fs-for-5.6' of git://git.kernel.org/pub/scm/linux/kernel/git/jaegeuk/f2fs: (31 commits) f2fs: fix race conditions in ->d_compare() and ->d_hash() f2fs: fix dcache lookup of !casefolded directories f2fs: Add f2fs stats to sysfs f2fs: delete duplicate information on sysfs nodes f2fs: change to use rwsem for gc_mutex f2fs: update f2fs document regarding to fsync_mode f2fs: add a way to turn off ipu bio cache f2fs: code cleanup for f2fs_statfs_project() f2fs: fix miscounted block limit in f2fs_statfs_project() f2fs: show the CP_PAUSE reason in checkpoint traces f2fs: fix deadlock allocating bio_post_read_ctx from mempool f2fs: remove unneeded check for error allocating bio_post_read_ctx f2fs: convert inline_dir early before starting rename f2fs: fix memleak of kobject f2fs: fix to add swap extent correctly f2fs: run fsck when getting bad inode during GC f2fs: support data compression f2fs: free sysfs kobject f2fs: declare nested quota_sem and remove unnecessary sems f2fs: don't put new_page twice in f2fs_rename ...
2020-01-30Merge tag 'for_v5.6-rc1' of ↵Linus Torvalds12-85/+137
git://git.kernel.org/pub/scm/linux/kernel/git/jack/linux-fs Pull UDF, quota, reiserfs, ext2 fixes and cleanups from Jan Kara: "A few assorted fixes and cleanups for udf, quota, reiserfs, and ext2" * tag 'for_v5.6-rc1' of git://git.kernel.org/pub/scm/linux/kernel/git/jack/linux-fs: fs/reiserfs: remove unused macros fs/quota: remove unused macro udf: Clarify meaning of f_files in udf_statfs udf: Allow writing to 'Rewritable' partitions udf: Disallow R/W mode for disk with Metadata partition udf: Fix meaning of ENTITYID_FLAGS_* macros to be really bitwise-or flags udf: Fix free space reporting for metadata and virtual partitions udf: Update header files to UDF 2.60 udf: Move OSTA Identifier Suffix macros from ecma_167.h to osta_udf.h udf: Fix spelling in EXT_NEXT_EXTENT_ALLOCDESCS ext2: Adjust indentation in ext2_fill_super quota: avoid time_t in v1_disk_dqblk definition reiserfs: Fix spurious unlock in reiserfs_fill_super() error handling reiserfs: Fix memory leak of journal device string ext2: set proper errno in error case of ext2_fill_super()
2020-01-30Merge tag 'xfs-5.6-merge-6' of git://git.kernel.org/pub/scm/fs/xfs/xfs-linuxLinus Torvalds33-253/+300
Pull xfs updates from Darrick Wong: "In this release we clean out the last of the old 32-bit timestamp code, fix a number of bugs and memory corruptions on 32-bit platforms, and a refactoring of some of the extended attribute code. I think I'll be back next week with some refactoring of how the XFS buffer code returns error codes, however I prefer to hold onto that for another week to let it soak a while longer Summary: - Get rid of compat_time_t - Convert time_t to time64_t in quota code - Remove shadow variables - Prevent ATTR_ flag misuse in the attrmulti ioctls - Clean out strlen in the attr code - Remove some bogus asserts - Fix various file size limit calculation errors with 32-bit kernels - Pack xfs_dir2_sf_entry_t to fix build errors on arm oabi - Fix nowait inode locking calls for directio aio reads - Fix memory corruption bugs when invalidating remote xattr value buffers - Streamline remote attr value removal - Make the buffer log format size consistent across platforms - Strengthen buffer log format size checking - Fix messed up return types of xfs_inode_need_cow - Fix some unused variable warnings" * tag 'xfs-5.6-merge-6' of git://git.kernel.org/pub/scm/fs/xfs/xfs-linux: (24 commits) xfs: remove unused variable 'done' xfs: fix uninitialized variable in xfs_attr3_leaf_inactive xfs: change return value of xfs_inode_need_cow to int xfs: check log iovec size to make sure it's plausibly a buffer log format xfs: make struct xfs_buf_log_format have a consistent size xfs: complain if anyone tries to create a too-large buffer log item xfs: clean up xfs_buf_item_get_format return value xfs: streamline xfs_attr3_leaf_inactive xfs: fix memory corruption during remote attr value buffer invalidation xfs: refactor remote attr value buffer invalidation xfs: fix IOCB_NOWAIT handling in xfs_file_dio_aio_read xfs: Add __packed to xfs_dir2_sf_entry_t definition xfs: fix s_maxbytes computation on 32-bit kernels xfs: truncate should remove all blocks, not just to the end of the page cache xfs: introduce XFS_MAX_FILEOFF xfs: remove bogus assertion when online repair isn't enabled xfs: Remove all strlen in all xfs_attr_* functions for attr names. xfs: fix misuse of the XFS_ATTR_INCOMPLETE flag xfs: also remove cached ACLs when removing the underlying attr xfs: reject invalid flags combinations in XFS_IOC_ATTRMULTI_BY_HANDLE ...
2020-01-30Merge tag 'ext4_for_linus' of ↵Linus Torvalds28-423/+682
git://git.kernel.org/pub/scm/linux/kernel/git/tytso/ext4 Pull ext4 updates from Ted Ts'o: "This merge window, we've added some performance improvements in how we handle inode locking in the read/write paths, and improving the performance of Direct I/O overwrites. We also now record the error code which caused the first and most recent ext4_error() report in the superblock, to make it easier to root cause problems in production systems. There are also many of the usual cleanups and miscellaneous bug fixes" * tag 'ext4_for_linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tytso/ext4: (49 commits) jbd2: clean __jbd2_journal_abort_hard() and __journal_abort_soft() jbd2: make sure ESHUTDOWN to be recorded in the journal superblock ext4, jbd2: ensure panic when aborting with zero errno jbd2: switch to use jbd2_journal_abort() when failed to submit the commit record jbd2_seq_info_next should increase position index jbd2: remove pointless assertion in __journal_remove_journal_head ext4,jbd2: fix comment and code style jbd2: delete the duplicated words in the comments ext4: fix extent_status trace points ext4: fix symbolic enum printing in trace output ext4: choose hardlimit when softlimit is larger than hardlimit in ext4_statfs_project() ext4: fix race conditions in ->d_compare() and ->d_hash() ext4: make dioread_nolock the default ext4: fix extent_status fragmentation for plain files jbd2: clear JBD2_ABORT flag before journal_reset to update log tail info when load journal ext4: drop ext4_kvmalloc() ext4: Add EXT4_IOC_FSGETXATTR/EXT4_IOC_FSSETXATTR to compat_ioctl ext4: remove unused macro MPAGE_DA_EXTENT_TAIL ext4: add missing braces in ext4_ext_drop_refs() ext4: fix some nonstandard indentation in extents.c ...
2020-01-29Merge tag 'threads-v5.6' of ↵Linus Torvalds1-2/+20
git://git.kernel.org/pub/scm/linux/kernel/git/brauner/linux Pull thread management updates from Christian Brauner: "Sargun Dhillon over the last cycle has worked on the pidfd_getfd() syscall. This syscall allows for the retrieval of file descriptors of a process based on its pidfd. A task needs to have ptrace_may_access() permissions with PTRACE_MODE_ATTACH_REALCREDS (suggested by Oleg and Andy) on the target. One of the main use-cases is in combination with seccomp's user notification feature. As a reminder, seccomp's user notification feature was made available in v5.0. It allows a task to retrieve a file descriptor for its seccomp filter. The file descriptor is usually handed of to a more privileged supervising process. The supervisor can then listen for syscall events caught by the seccomp filter of the supervisee and perform actions in lieu of the supervisee, usually emulating syscalls. pidfd_getfd() is needed to expand its uses. There are currently two major users that wait on pidfd_getfd() and one future user: - Netflix, Sargun said, is working on a service mesh where users should be able to connect to a dns-based VIP. When a user connects to e.g. 1.2.3.4:80 that runs e.g. service "foo" they will be redirected to an envoy process. This service mesh uses seccomp user notifications and pidfd to intercept all connect calls and instead of connecting them to 1.2.3.4:80 connects them to e.g. 127.0.0.1:8080. - LXD uses the seccomp notifier heavily to intercept and emulate mknod() and mount() syscalls for unprivileged containers/processes. With pidfd_getfd() more uses-cases e.g. bridging socket connections will be possible. - The patchset has also seen some interest from the browser corner. Right now, Firefox is using a SECCOMP_RET_TRAP sandbox managed by a broker process. In the future glibc will start blocking all signals during dlopen() rendering this type of sandbox impossible. Hence, in the future Firefox will switch to a seccomp-user-nofication based sandbox which also makes use of file descriptor retrieval. The thread for this can be found at https://sourceware.org/ml/libc-alpha/2019-12/msg00079.html With pidfd_getfd() it is e.g. possible to bridge socket connections for the supervisee (binding to a privileged port) and taking actions on file descriptors on behalf of the supervisee in general. Sargun's first version was using an ioctl on pidfds but various people pushed for it to be a proper syscall which he duely implemented as well over various review cycles. Selftests are of course included. I've also added instructions how to deal with merge conflicts below. There's also a small fix coming from the kernel mentee project to correctly annotate struct sighand_struct with __rcu to fix various sparse warnings. We've received a few more such fixes and even though they are mostly trivial I've decided to postpone them until after -rc1 since they came in rather late and I don't want to risk introducing build warnings. Finally, there's a new prctl() command PR_{G,S}ET_IO_FLUSHER which is needed to avoid allocation recursions triggerable by storage drivers that have userspace parts that run in the IO path (e.g. dm-multipath, iscsi, etc). These allocation recursions deadlock the device. The new prctl() allows such privileged userspace components to avoid allocation recursions by setting the PF_MEMALLOC_NOIO and PF_LESS_THROTTLE flags. The patch carries the necessary acks from the relevant maintainers and is routed here as part of prctl() thread-management." * tag 'threads-v5.6' of git://git.kernel.org/pub/scm/linux/kernel/git/brauner/linux: prctl: PR_{G,S}ET_IO_FLUSHER to support controlling memory reclaim sched.h: Annotate sighand_struct with __rcu test: Add test for pidfd getfd arch: wire up pidfd_getfd syscall pid: Implement pidfd_getfd syscall vfs, fdtable: Add fget_task helper
2020-01-29Merge tag 'for-5.6/io_uring-vfs-2020-01-29' of git://git.kernel.dk/linux-blockLinus Torvalds8-459/+1995
Pull io_uring updates from Jens Axboe: - Support for various new opcodes (fallocate, openat, close, statx, fadvise, madvise, openat2, non-vectored read/write, send/recv, and epoll_ctl) - Faster ring quiesce for fileset updates - Optimizations for overflow condition checking - Support for max-sized clamping - Support for probing what opcodes are supported - Support for io-wq backend sharing between "sibling" rings - Support for registering personalities - Lots of little fixes and improvements * tag 'for-5.6/io_uring-vfs-2020-01-29' of git://git.kernel.dk/linux-block: (64 commits) io_uring: add support for epoll_ctl(2) eventpoll: support non-blocking do_epoll_ctl() calls eventpoll: abstract out epoll_ctl() handler io_uring: fix linked command file table usage io_uring: support using a registered personality for commands io_uring: allow registering credentials io_uring: add io-wq workqueue sharing io-wq: allow grabbing existing io-wq io_uring/io-wq: don't use static creds/mm assignments io-wq: make the io_wq ref counted io_uring: fix refcounting with batched allocations at OOM io_uring: add comment for drain_next io_uring: don't attempt to copy iovec for READ/WRITE io_uring: honor IOSQE_ASYNC for linked reqs io_uring: prep req when do IOSQE_ASYNC io_uring: use labeled array init in io_op_defs io_uring: optimise sqe-to-req flags translation io_uring: remove REQ_F_IO_DRAINED io_uring: file switch work needs to get flushed on exit io_uring: hide uring_fd in ctx ...
2020-01-29Merge tag 'scsi-misc' of git://git.kernel.org/pub/scm/linux/kernel/git/jejb/scsiLinus Torvalds4-303/+97
Pull SCSI updates from James Bottomley: "This series is slightly unusual because it includes Arnd's compat ioctl tree here: 1c46a2cf2dbd Merge tag 'block-ioctl-cleanup-5.6' into 5.6/scsi-queue Excluding Arnd's changes, this is mostly an update of the usual drivers: megaraid_sas, mpt3sas, qla2xxx, ufs, lpfc, hisi_sas. There are a couple of core and base updates around error propagation and atomicity in the attribute container base we use for the SCSI transport classes. The rest is minor changes and updates" * tag 'scsi-misc' of git://git.kernel.org/pub/scm/linux/kernel/git/jejb/scsi: (149 commits) scsi: hisi_sas: Rename hisi_sas_cq.pci_irq_mask scsi: hisi_sas: Add prints for v3 hw interrupt converge and automatic affinity scsi: hisi_sas: Modify the file permissions of trigger_dump to write only scsi: hisi_sas: Replace magic number when handle channel interrupt scsi: hisi_sas: replace spin_lock_irqsave/spin_unlock_restore with spin_lock/spin_unlock scsi: hisi_sas: use threaded irq to process CQ interrupts scsi: ufs: Use UFS device indicated maximum LU number scsi: ufs: Add max_lu_supported in struct ufs_dev_info scsi: ufs: Delete is_init_prefetch from struct ufs_hba scsi: ufs: Inline two functions into their callers scsi: ufs: Move ufshcd_get_max_pwr_mode() to ufshcd_device_params_init() scsi: ufs: Split ufshcd_probe_hba() based on its called flow scsi: ufs: Delete struct ufs_dev_desc scsi: ufs: Fix ufshcd_probe_hba() reture value in case ufshcd_scsi_add_wlus() fails scsi: ufs-mediatek: enable low-power mode for hibern8 state scsi: ufs: export some functions for vendor usage scsi: ufs-mediatek: add dbg_register_dump implementation scsi: qla2xxx: Fix a NULL pointer dereference in an error path scsi: qla1280: Make checking for 64bit support consistent scsi: megaraid_sas: Update driver version to 07.713.01.00-rc1 ...
2020-01-29Merge tag 'linux-kselftest-5.6-rc1-kunit' of ↵Linus Torvalds3-3/+6
git://git.kernel.org/pub/scm/linux/kernel/git/shuah/linux-kselftest Pull Kselftest kunit updates from Shuah Khan: "This kunit update consists of: - Support for building kunit as a module from Alan Maguire - AppArmor KUnit tests for policy unpack from Mike Salvatore" * tag 'linux-kselftest-5.6-rc1-kunit' of git://git.kernel.org/pub/scm/linux/kernel/git/shuah/linux-kselftest: kunit: building kunit as a module breaks allmodconfig kunit: update documentation to describe module-based build kunit: allow kunit to be loaded as a module kunit: remove timeout dependence on sysctl_hung_task_timeout_seconds kunit: allow kunit tests to be loaded as a module kunit: hide unexported try-catch interface in try-catch-impl.h kunit: move string-stream.h to lib/kunit apparmor: add AppArmor KUnit tests for policy unpack
2020-01-29Merge tag 'y2038-drivers-for-v5.6-signed' of ↵Linus Torvalds12-54/+106
git://git.kernel.org:/pub/scm/linux/kernel/git/arnd/playground Pull y2038 updates from Arnd Bergmann: "Core, driver and file system changes These are updates to device drivers and file systems that for some reason or another were not included in the kernel in the previous y2038 series. I've gone through all users of time_t again to make sure the kernel is in a long-term maintainable state, replacing all remaining references to time_t with safe alternatives. Some related parts of the series were picked up into the nfsd, xfs, alsa and v4l2 trees. A final set of patches in linux-mm removes the now unused time_t/timeval/timespec types and helper functions after all five branches are merged for linux-5.6, ensuring that no new users get merged. As a result, linux-5.6, or my backport of the patches to 5.4 [1], should be the first release that can serve as a base for a 32-bit system designed to run beyond year 2038, with a few remaining caveats: - All user space must be compiled with a 64-bit time_t, which will be supported in the coming musl-1.2 and glibc-2.32 releases, along with installed kernel headers from linux-5.6 or higher. - Applications that use the system call interfaces directly need to be ported to use the time64 syscalls added in linux-5.1 in place of the existing system calls. This impacts most users of futex() and seccomp() as well as programming languages that have their own runtime environment not based on libc. - Applications that use a private copy of kernel uapi header files or their contents may need to update to the linux-5.6 version, in particular for sound/asound.h, xfs/xfs_fs.h, linux/input.h, linux/elfcore.h, linux/sockios.h, linux/timex.h and linux/can/bcm.h. - A few remaining interfaces cannot be changed to pass a 64-bit time_t in a compatible way, so they must be configured to use CLOCK_MONOTONIC times or (with a y2106 problem) unsigned 32-bit timestamps. Most importantly this impacts all users of 'struct input_event'. - All y2038 problems that are present on 64-bit machines also apply to 32-bit machines. In particular this affects file systems with on-disk timestamps using signed 32-bit seconds: ext4 with ext3-style small inodes, ext2, xfs (to be fixed soon) and ufs" [1] https://git.kernel.org/pub/scm/linux/kernel/git/arnd/playground.git/log/?h=y2038-endgame * tag 'y2038-drivers-for-v5.6-signed' of git://git.kernel.org:/pub/scm/linux/kernel/git/arnd/playground: (21 commits) Revert "drm/etnaviv: reject timeouts with tv_nsec >= NSEC_PER_SEC" y2038: sh: remove timeval/timespec usage from headers y2038: sparc: remove use of struct timex y2038: rename itimerval to __kernel_old_itimerval y2038: remove obsolete jiffies conversion functions nfs: fscache: use timespec64 in inode auxdata nfs: fix timstamp debug prints nfs: use time64_t internally sunrpc: convert to time64_t for expiry drm/etnaviv: avoid deprecated timespec drm/etnaviv: reject timeouts with tv_nsec >= NSEC_PER_SEC drm/msm: avoid using 'timespec' hfs/hfsplus: use 64-bit inode timestamps hostfs: pass 64-bit timestamps to/from user space packet: clarify timestamp overflow tsacct: add 64-bit btime field acct: stop using get_seconds() um: ubd: use 64-bit time_t where possible xtensa: ISS: avoid struct timeval dlm: use SO_SNDTIMEO_NEW instead of SO_SNDTIMEO_OLD ...
2020-01-29io_uring: add support for epoll_ctl(2)Jens Axboe1-0/+71
This adds IORING_OP_EPOLL_CTL, which can perform the same work as the epoll_ctl(2) system call. Signed-off-by: Jens Axboe <axboe@kernel.dk>
2020-01-29eventpoll: support non-blocking do_epoll_ctl() callsJens Axboe1-13/+33
Also make it available outside of epoll, along with the helper that decides if we need to copy the passed in epoll_event. Signed-off-by: Jens Axboe <axboe@kernel.dk>
2020-01-29eventpoll: abstract out epoll_ctl() handlerJens Axboe1-20/+25
No functional changes in this patch. Signed-off-by: Jens Axboe <axboe@kernel.dk>
2020-01-29io_uring: fix linked command file table usageJens Axboe3-14/+21
We're not consistent in how the file table is grabbed and assigned if we have a command linked that requires the use of it. Add ->file_table to the io_op_defs[] array, and use that to determine when to grab the table instead of having the handlers set it if they need to defer. This also means we can kill the IO_WQ_WORK_NEEDS_FILES flag. We always initialize work->files, so io-wq can just check for that. Signed-off-by: Jens Axboe <axboe@kernel.dk>
2020-01-29Merge tag 'erofs-for-5.6-rc1' of ↵Linus Torvalds5-107/+74
git://git.kernel.org/pub/scm/linux/kernel/git/xiang/erofs Pull erofs updates from Gao Xiang: "A regression fix, several cleanups and (maybe) plus an upcoming new mount api convert patch as a part of vfs update are considered available for this cycle. All commits have been in linux-next and tested with no smoke out. Summary: - fix an out-of-bound read access introduced in v5.3, which could rarely cause data corruption - various cleanup patches" * tag 'erofs-for-5.6-rc1' of git://git.kernel.org/pub/scm/linux/kernel/git/xiang/erofs: erofs: clean up z_erofs_submit_queue() erofs: fold in postsubmit_is_all_bypassed() erofs: fix out-of-bound read for shifted uncompressed block erofs: remove void tagging/untagging of workgroup pointers erofs: remove unused tag argument while registering a workgroup erofs: remove unused tag argument while finding a workgroup erofs: correct indentation of an assigned structure inside a function
2020-01-29Merge branch 'work.adfs' of ↵Linus Torvalds9-740/+890
git://git.kernel.org/pub/scm/linux/kernel/git/viro/vfs Pull adfs updates from Al Viro: "adfs stuff for this cycle" * 'work.adfs' of git://git.kernel.org/pub/scm/linux/kernel/git/viro/vfs: (42 commits) fs/adfs: bigdir: Fix an error code in adfs_fplus_read() Documentation: update adfs filesystem documentation fs/adfs: mostly divorse inode number from indirect disc address fs/adfs: super: add support for E and E+ floppy image formats fs/adfs: super: extract filesystem block probe fs/adfs: dir: remove debug in adfs_dir_update() fs/adfs: super: fix inode dropping fs/adfs: bigdir: implement directory update support fs/adfs: bigdir: calculate and validate directory checkbyte fs/adfs: bigdir: directory validation strengthening fs/adfs: bigdir: extract directory validation fs/adfs: bigdir: factor out directory entry offset calculation fs/adfs: newdir: split out directory commit from update fs/adfs: newdir: clean up adfs_f_update() fs/adfs: newdir: merge adfs_dir_read() into adfs_f_read() fs/adfs: newdir: improve directory validation fs/adfs: newdir: factor out directory format validation fs/adfs: dir: use pointers to access directory head/tails fs/adfs: dir: add more efficient iterate() per-format method fs/adfs: dir: switch to iterate_shared method ...
2020-01-29Merge branch 'work.openat2' of ↵Linus Torvalds5-93/+305
git://git.kernel.org/pub/scm/linux/kernel/git/viro/vfs Pull openat2 support from Al Viro: "This is the openat2() series from Aleksa Sarai. I'm afraid that the rest of namei stuff will have to wait - it got zero review the last time I'd posted #work.namei, and there had been a leak in the posted series I'd caught only last weekend. I was going to repost it on Monday, but the window opened and the odds of getting any review during that... Oh, well. Anyway, openat2 part should be ready; that _did_ get sane amount of review and public testing, so here it comes" From Aleksa's description of the series: "For a very long time, extending openat(2) with new features has been incredibly frustrating. This stems from the fact that openat(2) is possibly the most famous counter-example to the mantra "don't silently accept garbage from userspace" -- it doesn't check whether unknown flags are present[1]. This means that (generally) the addition of new flags to openat(2) has been fraught with backwards-compatibility issues (O_TMPFILE has to be defined as __O_TMPFILE|O_DIRECTORY|[O_RDWR or O_WRONLY] to ensure old kernels gave errors, since it's insecure to silently ignore the flag[2]). All new security-related flags therefore have a tough road to being added to openat(2). Furthermore, the need for some sort of control over VFS's path resolution (to avoid malicious paths resulting in inadvertent breakouts) has been a very long-standing desire of many userspace applications. This patchset is a revival of Al Viro's old AT_NO_JUMPS[3] patchset (which was a variant of David Drysdale's O_BENEATH patchset[4] which was a spin-off of the Capsicum project[5]) with a few additions and changes made based on the previous discussion within [6] as well as others I felt were useful. In line with the conclusions of the original discussion of AT_NO_JUMPS, the flag has been split up into separate flags. However, instead of being an openat(2) flag it is provided through a new syscall openat2(2) which provides several other improvements to the openat(2) interface (see the patch description for more details). The following new LOOKUP_* flags are added: LOOKUP_NO_XDEV: Blocks all mountpoint crossings (upwards, downwards, or through absolute links). Absolute pathnames alone in openat(2) do not trigger this. Magic-link traversal which implies a vfsmount jump is also blocked (though magic-link jumps on the same vfsmount are permitted). LOOKUP_NO_MAGICLINKS: Blocks resolution through /proc/$pid/fd-style links. This is done by blocking the usage of nd_jump_link() during resolution in a filesystem. The term "magic-links" is used to match with the only reference to these links in Documentation/, but I'm happy to change the name. It should be noted that this is different to the scope of ~LOOKUP_FOLLOW in that it applies to all path components. However, you can do openat2(NO_FOLLOW|NO_MAGICLINKS) on a magic-link and it will *not* fail (assuming that no parent component was a magic-link), and you will have an fd for the magic-link. In order to correctly detect magic-links, the introduction of a new LOOKUP_MAGICLINK_JUMPED state flag was required. LOOKUP_BENEATH: Disallows escapes to outside the starting dirfd's tree, using techniques such as ".." or absolute links. Absolute paths in openat(2) are also disallowed. Conceptually this flag is to ensure you "stay below" a certain point in the filesystem tree -- but this requires some additional to protect against various races that would allow escape using "..". Currently LOOKUP_BENEATH implies LOOKUP_NO_MAGICLINKS, because it can trivially beam you around the filesystem (breaking the protection). In future, there might be similar safety checks done as in LOOKUP_IN_ROOT, but that requires more discussion. In addition, two new flags are added that expand on the above ideas: LOOKUP_NO_SYMLINKS: Does what it says on the tin. No symlink resolution is allowed at all, including magic-links. Just as with LOOKUP_NO_MAGICLINKS this can still be used with NOFOLLOW to open an fd for the symlink as long as no parent path had a symlink component. LOOKUP_IN_ROOT: This is an extension of LOOKUP_BENEATH that, rather than blocking attempts to move past the root, forces all such movements to be scoped to the starting point. This provides chroot(2)-like protection but without the cost of a chroot(2) for each filesystem operation, as well as being safe against race attacks that chroot(2) is not. If a race is detected (as with LOOKUP_BENEATH) then an error is generated, and similar to LOOKUP_BENEATH it is not permitted to cross magic-links with LOOKUP_IN_ROOT. The primary need for this is from container runtimes, which currently need to do symlink scoping in userspace[7] when opening paths in a potentially malicious container. There is a long list of CVEs that could have bene mitigated by having RESOLVE_THIS_ROOT (such as CVE-2017-1002101, CVE-2017-1002102, CVE-2018-15664, and CVE-2019-5736, just to name a few). In order to make all of the above more usable, I'm working on libpathrs[8] which is a C-friendly library for safe path resolution. It features a userspace-emulated backend if the kernel doesn't support openat2(2). Hopefully we can get userspace to switch to using it, and thus get openat2(2) support for free once it's ready. Future work would include implementing things like RESOLVE_NO_AUTOMOUNT and possibly a RESOLVE_NO_REMOTE (to allow programs to be sure they don't hit DoSes though stale NFS handles)" * 'work.openat2' of git://git.kernel.org/pub/scm/linux/kernel/git/viro/vfs: Documentation: path-lookup: include new LOOKUP flags selftests: add openat2(2) selftests open: introduce openat2(2) syscall namei: LOOKUP_{IN_ROOT,BENEATH}: permit limited ".." resolution namei: LOOKUP_IN_ROOT: chroot-like scoped resolution namei: LOOKUP_BENEATH: O_BENEATH-like scoped resolution namei: LOOKUP_NO_XDEV: block mountpoint crossing namei: LOOKUP_NO_MAGICLINKS: block magic-link resolution namei: LOOKUP_NO_SYMLINKS: block symlink resolution namei: allow set_root() to produce errors namei: allow nd_jump_link() to produce errors nsfs: clean-up ns_get_path() signature to return int namei: only return -ECHILD from follow_dotdot_rcu()
2020-01-29Merge tag 'driver-core-5.6-rc1' of ↵Linus Torvalds3-24/+25
git://git.kernel.org/pub/scm/linux/kernel/git/gregkh/driver-core Pull driver core updates from Greg KH: "Here is a small set of changes for 5.6-rc1 for the driver core and some firmware subsystem changes. Included in here are: - device.h splitup like you asked for months ago - devtmpfs minor cleanups - firmware core minor changes - debugfs fix for lockdown mode - kernfs cleanup fix - cpu topology minor fix All of these have been in linux-next for a while with no reported issues" * tag 'driver-core-5.6-rc1' of git://git.kernel.org/pub/scm/linux/kernel/git/gregkh/driver-core: (22 commits) firmware: Rename FW_OPT_NOFALLBACK to FW_OPT_NOFALLBACK_SYSFS devtmpfs: factor out common tail of devtmpfs_{create,delete}_node devtmpfs: initify a bit devtmpfs: simplify initialization of mount_dev devtmpfs: factor out setup part of devtmpfsd() devtmpfs: fix theoretical stale pointer deref in devtmpfsd() driver core: platform: fix u32 greater or equal to zero comparison cpu-topology: Don't error on more than CONFIG_NR_CPUS CPUs in device tree debugfs: Return -EPERM when locked down driver core: Print device when resources present in really_probe() driver core: Fix test_async_driver_probe if NUMA is disabled driver core: platform: Prevent resouce overflow from causing infinite loops fs/kernfs/dir.c: Clean code by removing always true condition component: do not dereference opaque pointer in debugfs drivers/component: remove modular code debugfs: Fix warnings when building documentation device.h: move 'struct driver' stuff out to device/driver.h device.h: move 'struct class' stuff out to device/class.h device.h: move 'struct bus' stuff out to device/bus.h device.h: move dev_printk()-like functions to dev_printk.h ...
2020-01-28io_uring: support using a registered personality for commandsJens Axboe1-1/+19
For personalities previously registered via IORING_REGISTER_PERSONALITY, allow any command to select them. This is done through setting sqe->personality to the id returned from registration, and then flagging sqe->flags with IOSQE_PERSONALITY. Reviewed-by: Pavel Begunkov <asml.silence@gmail.com> Signed-off-by: Jens Axboe <axboe@kernel.dk>
2020-01-28io_uring: allow registering credentialsJens Axboe1-7/+68
If an application wants to use a ring with different kinds of credentials, it can register them upfront. We don't lookup credentials, the credentials of the task calling IORING_REGISTER_PERSONALITY is used. An 'id' is returned for the application to use in subsequent personality support. Reviewed-by: Pavel Begunkov <asml.silence@gmail.com> Signed-off-by: Jens Axboe <axboe@kernel.dk>
2020-01-28io_uring: add io-wq workqueue sharingPavel Begunkov1-14/+50
If IORING_SETUP_ATTACH_WQ is set, it expects wq_fd in io_uring_params to be a valid io_uring fd io-wq of which will be shared with the newly created io_uring instance. If the flag is set but it can't share io-wq, it fails. This allows creation of "sibling" io_urings, where we prefer to keep the SQ/CQ private, but want to share the async backend to minimize the amount of overhead associated with having multiple rings that belong to the same backend. Reported-by: Jens Axboe <axboe@kernel.dk> Reported-by: Daurnimator <quae@daurnimator.com> Signed-off-by: Pavel Begunkov <asml.silence@gmail.com> Signed-off-by: Jens Axboe <axboe@kernel.dk>
2020-01-28io-wq: allow grabbing existing io-wqPavel Begunkov2-0/+9
Export a helper to attach to an existing io-wq, rather than setting up a new one. This is doable now that we have reference counted io_wq's. Reported-by: Jens Axboe <axboe@kernel.dk> Signed-off-by: Pavel Begunkov <asml.silence@gmail.com> Signed-off-by: Jens Axboe <axboe@kernel.dk>
2020-01-28io_uring/io-wq: don't use static creds/mm assignmentsJens Axboe3-30/+81
We currently setup the io_wq with a static set of mm and creds. Even for a single-use io-wq per io_uring, this is suboptimal as we have may have multiple enters of the ring. For sharing the io-wq backend, it doesn't work at all. Switch to passing in the creds and mm when the work item is setup. This means that async work is no longer deferred to the io_uring mm and creds, it is done with the current mm and creds. Flag this behavior with IORING_FEAT_CUR_PERSONALITY, so applications know they can rely on the current personality (mm and creds) being the same for direct issue and async issue. Reviewed-by: Stefan Metzmacher <metze@samba.org> Signed-off-by: Jens Axboe <axboe@kernel.dk>
2020-01-28Merge branch 'linus' of ↵Linus Torvalds2-3/+3
git://git.kernel.org/pub/scm/linux/kernel/git/herbert/crypto-2.6 Pull crypto updates from Herbert Xu: "API: - Removed CRYPTO_TFM_RES flags - Extended spawn grabbing to all algorithm types - Moved hash descsize verification into API code Algorithms: - Fixed recursive pcrypt dead-lock - Added new 32 and 64-bit generic versions of poly1305 - Added cryptogams implementation of x86/poly1305 Drivers: - Added support for i.MX8M Mini in caam - Added support for i.MX8M Nano in caam - Added support for i.MX8M Plus in caam - Added support for A33 variant of SS in sun4i-ss - Added TEE support for Raven Ridge in ccp - Added in-kernel API to submit TEE commands in ccp - Added AMD-TEE driver - Added support for BCM2711 in iproc-rng200 - Added support for AES256-GCM based ciphers for chtls - Added aead support on SEC2 in hisilicon" * 'linus' of git://git.kernel.org/pub/scm/linux/kernel/git/herbert/crypto-2.6: (244 commits) crypto: arm/chacha - fix build failured when kernel mode NEON is disabled crypto: caam - add support for i.MX8M Plus crypto: x86/poly1305 - emit does base conversion itself crypto: hisilicon - fix spelling mistake "disgest" -> "digest" crypto: chacha20poly1305 - add back missing test vectors and test chunking crypto: x86/poly1305 - fix .gitignore typo tee: fix memory allocation failure checks on drv_data and amdtee crypto: ccree - erase unneeded inline funcs crypto: ccree - make cc_pm_put_suspend() void crypto: ccree - split overloaded usage of irq field crypto: ccree - fix PM race condition crypto: ccree - fix FDE descriptor sequence crypto: ccree - cc_do_send_request() is void func crypto: ccree - fix pm wrongful error reporting crypto: ccree - turn errors to debug msgs crypto: ccree - fix AEAD decrypt auth fail crypto: ccree - fix typo in comment crypto: ccree - fix typos in error msgs crypto: atmel-{aes,sha,tdes} - Retire crypto_platform_data crypto: x86/sha - Eliminate casts on asm implementations ...
2020-01-28Merge tag '5.6-smb3-fixes-and-dfs-and-readdir-improvements' of ↵Linus Torvalds18-713/+1041
git://git.samba.org/sfrench/cifs-2.6 Pull cifs updates from Steve French: "Various SMB3/CIFS fixes including four for stable. - Improvement to fallocate (enables 3 additional xfstests) - Fix for file creation when mounting with modefromsid - Add ability to backup/restore dos attributes and creation time - DFS failover and reconnect fixes - performance optimization for readir Note that due to the upcoming SMB3 Test Event (at SNIA SDC next week) there will likely be more changesets near the end of the merge window (since we will be testing heavily next week, I held off on some patches and I expect some additional multichannel patches as well as patches to enable some additional xfstests)" * tag '5.6-smb3-fixes-and-dfs-and-readdir-improvements' of git://git.samba.org/sfrench/cifs-2.6: (24 commits) CIFS: Fix task struct use-after-free on reconnect cifs: use PTR_ERR_OR_ZERO() to simplify code cifs: add support for fallocate mode 0 for non-sparse files cifs: fix NULL dereference in match_prepath smb3: fix default permissions on new files when mounting with modefromsid CIFS: Add support for setting owner info, dos attributes, and create time cifs: remove set but not used variable 'server' cifs: Fix memory allocation in __smb2_handle_cancelled_cmd() cifs: Fix mount options set in automount cifs: fix unitialized variable poential problem with network I/O cache lock patch cifs: Fix return value in __update_cache_entry cifs: Avoid doing network I/O while holding cache lock cifs: Fix potential deadlock when updating vol in cifs_reconnect() cifs: Merge is_path_valid() into get_normalized_path() cifs: Introduce helpers for finding TCP connection cifs: Get rid of kstrdup_const()'d paths cifs: Clean up DFS referral cache cifs: Don't use iov_iter::type directly cifs: set correct max-buffer-size for smb2_ioctl_init() cifs: use compounding for open and first query-dir for readdir() ...