summaryrefslogtreecommitdiffstats
path: root/fs
AgeCommit message (Collapse)AuthorFilesLines
2020-01-31memcg: fix a crash in wb_workfn when a device disappearsTheodore Ts'o1-1/+1
Without memcg, there is a one-to-one mapping between the bdi and bdi_writeback structures. In this world, things are fairly straightforward; the first thing bdi_unregister() does is to shutdown the bdi_writeback structure (or wb), and part of that writeback ensures that no other work queued against the wb, and that the wb is fully drained. With memcg, however, there is a one-to-many relationship between the bdi and bdi_writeback structures; that is, there are multiple wb objects which can all point to a single bdi. There is a refcount which prevents the bdi object from being released (and hence, unregistered). So in theory, the bdi_unregister() *should* only get called once its refcount goes to zero (bdi_put will drop the refcount, and when it is zero, release_bdi gets called, which calls bdi_unregister). Unfortunately, del_gendisk() in block/gen_hd.c never got the memo about the Brave New memcg World, and calls bdi_unregister directly. It does this without informing the file system, or the memcg code, or anything else. This causes the root wb associated with the bdi to be unregistered, but none of the memcg-specific wb's are shutdown. So when one of these wb's are woken up to do delayed work, they try to dereference their wb->bdi->dev to fetch the device name, but unfortunately bdi->dev is now NULL, thanks to the bdi_unregister() called by del_gendisk(). As a result, *boom*. Fortunately, it looks like the rest of the writeback path is perfectly happy with bdi->dev and bdi->owner being NULL, so the simplest fix is to create a bdi_dev_name() function which can handle bdi->dev being NULL. This also allows us to bulletproof the writeback tracepoints to prevent them from dereferencing a NULL pointer and crashing the kernel if one is tracing with memcg's enabled, and an iSCSI device dies or a USB storage stick is pulled. The most common way of triggering this will be hotremoval of a device while writeback with memcg enabled is going on. It was triggering several times a day in a heavily loaded production environment. Google Bug Id: 145475544 Link: https://lore.kernel.org/r/20191227194829.150110-1-tytso@mit.edu Link: http://lkml.kernel.org/r/20191228005211.163952-1-tytso@mit.edu Signed-off-by: Theodore Ts'o <tytso@mit.edu> Cc: Chris Mason <clm@fb.com> Cc: Tejun Heo <tj@kernel.org> Cc: Jens Axboe <axboe@kernel.dk> Cc: <stable@vger.kernel.org> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2020-01-29Merge tag 'threads-v5.6' of ↵Linus Torvalds1-2/+20
git://git.kernel.org/pub/scm/linux/kernel/git/brauner/linux Pull thread management updates from Christian Brauner: "Sargun Dhillon over the last cycle has worked on the pidfd_getfd() syscall. This syscall allows for the retrieval of file descriptors of a process based on its pidfd. A task needs to have ptrace_may_access() permissions with PTRACE_MODE_ATTACH_REALCREDS (suggested by Oleg and Andy) on the target. One of the main use-cases is in combination with seccomp's user notification feature. As a reminder, seccomp's user notification feature was made available in v5.0. It allows a task to retrieve a file descriptor for its seccomp filter. The file descriptor is usually handed of to a more privileged supervising process. The supervisor can then listen for syscall events caught by the seccomp filter of the supervisee and perform actions in lieu of the supervisee, usually emulating syscalls. pidfd_getfd() is needed to expand its uses. There are currently two major users that wait on pidfd_getfd() and one future user: - Netflix, Sargun said, is working on a service mesh where users should be able to connect to a dns-based VIP. When a user connects to e.g. 1.2.3.4:80 that runs e.g. service "foo" they will be redirected to an envoy process. This service mesh uses seccomp user notifications and pidfd to intercept all connect calls and instead of connecting them to 1.2.3.4:80 connects them to e.g. 127.0.0.1:8080. - LXD uses the seccomp notifier heavily to intercept and emulate mknod() and mount() syscalls for unprivileged containers/processes. With pidfd_getfd() more uses-cases e.g. bridging socket connections will be possible. - The patchset has also seen some interest from the browser corner. Right now, Firefox is using a SECCOMP_RET_TRAP sandbox managed by a broker process. In the future glibc will start blocking all signals during dlopen() rendering this type of sandbox impossible. Hence, in the future Firefox will switch to a seccomp-user-nofication based sandbox which also makes use of file descriptor retrieval. The thread for this can be found at https://sourceware.org/ml/libc-alpha/2019-12/msg00079.html With pidfd_getfd() it is e.g. possible to bridge socket connections for the supervisee (binding to a privileged port) and taking actions on file descriptors on behalf of the supervisee in general. Sargun's first version was using an ioctl on pidfds but various people pushed for it to be a proper syscall which he duely implemented as well over various review cycles. Selftests are of course included. I've also added instructions how to deal with merge conflicts below. There's also a small fix coming from the kernel mentee project to correctly annotate struct sighand_struct with __rcu to fix various sparse warnings. We've received a few more such fixes and even though they are mostly trivial I've decided to postpone them until after -rc1 since they came in rather late and I don't want to risk introducing build warnings. Finally, there's a new prctl() command PR_{G,S}ET_IO_FLUSHER which is needed to avoid allocation recursions triggerable by storage drivers that have userspace parts that run in the IO path (e.g. dm-multipath, iscsi, etc). These allocation recursions deadlock the device. The new prctl() allows such privileged userspace components to avoid allocation recursions by setting the PF_MEMALLOC_NOIO and PF_LESS_THROTTLE flags. The patch carries the necessary acks from the relevant maintainers and is routed here as part of prctl() thread-management." * tag 'threads-v5.6' of git://git.kernel.org/pub/scm/linux/kernel/git/brauner/linux: prctl: PR_{G,S}ET_IO_FLUSHER to support controlling memory reclaim sched.h: Annotate sighand_struct with __rcu test: Add test for pidfd getfd arch: wire up pidfd_getfd syscall pid: Implement pidfd_getfd syscall vfs, fdtable: Add fget_task helper
2020-01-29Merge tag 'for-5.6/io_uring-vfs-2020-01-29' of git://git.kernel.dk/linux-blockLinus Torvalds8-459/+1995
Pull io_uring updates from Jens Axboe: - Support for various new opcodes (fallocate, openat, close, statx, fadvise, madvise, openat2, non-vectored read/write, send/recv, and epoll_ctl) - Faster ring quiesce for fileset updates - Optimizations for overflow condition checking - Support for max-sized clamping - Support for probing what opcodes are supported - Support for io-wq backend sharing between "sibling" rings - Support for registering personalities - Lots of little fixes and improvements * tag 'for-5.6/io_uring-vfs-2020-01-29' of git://git.kernel.dk/linux-block: (64 commits) io_uring: add support for epoll_ctl(2) eventpoll: support non-blocking do_epoll_ctl() calls eventpoll: abstract out epoll_ctl() handler io_uring: fix linked command file table usage io_uring: support using a registered personality for commands io_uring: allow registering credentials io_uring: add io-wq workqueue sharing io-wq: allow grabbing existing io-wq io_uring/io-wq: don't use static creds/mm assignments io-wq: make the io_wq ref counted io_uring: fix refcounting with batched allocations at OOM io_uring: add comment for drain_next io_uring: don't attempt to copy iovec for READ/WRITE io_uring: honor IOSQE_ASYNC for linked reqs io_uring: prep req when do IOSQE_ASYNC io_uring: use labeled array init in io_op_defs io_uring: optimise sqe-to-req flags translation io_uring: remove REQ_F_IO_DRAINED io_uring: file switch work needs to get flushed on exit io_uring: hide uring_fd in ctx ...
2020-01-29Merge tag 'scsi-misc' of git://git.kernel.org/pub/scm/linux/kernel/git/jejb/scsiLinus Torvalds4-303/+97
Pull SCSI updates from James Bottomley: "This series is slightly unusual because it includes Arnd's compat ioctl tree here: 1c46a2cf2dbd Merge tag 'block-ioctl-cleanup-5.6' into 5.6/scsi-queue Excluding Arnd's changes, this is mostly an update of the usual drivers: megaraid_sas, mpt3sas, qla2xxx, ufs, lpfc, hisi_sas. There are a couple of core and base updates around error propagation and atomicity in the attribute container base we use for the SCSI transport classes. The rest is minor changes and updates" * tag 'scsi-misc' of git://git.kernel.org/pub/scm/linux/kernel/git/jejb/scsi: (149 commits) scsi: hisi_sas: Rename hisi_sas_cq.pci_irq_mask scsi: hisi_sas: Add prints for v3 hw interrupt converge and automatic affinity scsi: hisi_sas: Modify the file permissions of trigger_dump to write only scsi: hisi_sas: Replace magic number when handle channel interrupt scsi: hisi_sas: replace spin_lock_irqsave/spin_unlock_restore with spin_lock/spin_unlock scsi: hisi_sas: use threaded irq to process CQ interrupts scsi: ufs: Use UFS device indicated maximum LU number scsi: ufs: Add max_lu_supported in struct ufs_dev_info scsi: ufs: Delete is_init_prefetch from struct ufs_hba scsi: ufs: Inline two functions into their callers scsi: ufs: Move ufshcd_get_max_pwr_mode() to ufshcd_device_params_init() scsi: ufs: Split ufshcd_probe_hba() based on its called flow scsi: ufs: Delete struct ufs_dev_desc scsi: ufs: Fix ufshcd_probe_hba() reture value in case ufshcd_scsi_add_wlus() fails scsi: ufs-mediatek: enable low-power mode for hibern8 state scsi: ufs: export some functions for vendor usage scsi: ufs-mediatek: add dbg_register_dump implementation scsi: qla2xxx: Fix a NULL pointer dereference in an error path scsi: qla1280: Make checking for 64bit support consistent scsi: megaraid_sas: Update driver version to 07.713.01.00-rc1 ...
2020-01-29Merge tag 'linux-kselftest-5.6-rc1-kunit' of ↵Linus Torvalds3-3/+6
git://git.kernel.org/pub/scm/linux/kernel/git/shuah/linux-kselftest Pull Kselftest kunit updates from Shuah Khan: "This kunit update consists of: - Support for building kunit as a module from Alan Maguire - AppArmor KUnit tests for policy unpack from Mike Salvatore" * tag 'linux-kselftest-5.6-rc1-kunit' of git://git.kernel.org/pub/scm/linux/kernel/git/shuah/linux-kselftest: kunit: building kunit as a module breaks allmodconfig kunit: update documentation to describe module-based build kunit: allow kunit to be loaded as a module kunit: remove timeout dependence on sysctl_hung_task_timeout_seconds kunit: allow kunit tests to be loaded as a module kunit: hide unexported try-catch interface in try-catch-impl.h kunit: move string-stream.h to lib/kunit apparmor: add AppArmor KUnit tests for policy unpack
2020-01-29Merge tag 'y2038-drivers-for-v5.6-signed' of ↵Linus Torvalds12-54/+106
git://git.kernel.org:/pub/scm/linux/kernel/git/arnd/playground Pull y2038 updates from Arnd Bergmann: "Core, driver and file system changes These are updates to device drivers and file systems that for some reason or another were not included in the kernel in the previous y2038 series. I've gone through all users of time_t again to make sure the kernel is in a long-term maintainable state, replacing all remaining references to time_t with safe alternatives. Some related parts of the series were picked up into the nfsd, xfs, alsa and v4l2 trees. A final set of patches in linux-mm removes the now unused time_t/timeval/timespec types and helper functions after all five branches are merged for linux-5.6, ensuring that no new users get merged. As a result, linux-5.6, or my backport of the patches to 5.4 [1], should be the first release that can serve as a base for a 32-bit system designed to run beyond year 2038, with a few remaining caveats: - All user space must be compiled with a 64-bit time_t, which will be supported in the coming musl-1.2 and glibc-2.32 releases, along with installed kernel headers from linux-5.6 or higher. - Applications that use the system call interfaces directly need to be ported to use the time64 syscalls added in linux-5.1 in place of the existing system calls. This impacts most users of futex() and seccomp() as well as programming languages that have their own runtime environment not based on libc. - Applications that use a private copy of kernel uapi header files or their contents may need to update to the linux-5.6 version, in particular for sound/asound.h, xfs/xfs_fs.h, linux/input.h, linux/elfcore.h, linux/sockios.h, linux/timex.h and linux/can/bcm.h. - A few remaining interfaces cannot be changed to pass a 64-bit time_t in a compatible way, so they must be configured to use CLOCK_MONOTONIC times or (with a y2106 problem) unsigned 32-bit timestamps. Most importantly this impacts all users of 'struct input_event'. - All y2038 problems that are present on 64-bit machines also apply to 32-bit machines. In particular this affects file systems with on-disk timestamps using signed 32-bit seconds: ext4 with ext3-style small inodes, ext2, xfs (to be fixed soon) and ufs" [1] https://git.kernel.org/pub/scm/linux/kernel/git/arnd/playground.git/log/?h=y2038-endgame * tag 'y2038-drivers-for-v5.6-signed' of git://git.kernel.org:/pub/scm/linux/kernel/git/arnd/playground: (21 commits) Revert "drm/etnaviv: reject timeouts with tv_nsec >= NSEC_PER_SEC" y2038: sh: remove timeval/timespec usage from headers y2038: sparc: remove use of struct timex y2038: rename itimerval to __kernel_old_itimerval y2038: remove obsolete jiffies conversion functions nfs: fscache: use timespec64 in inode auxdata nfs: fix timstamp debug prints nfs: use time64_t internally sunrpc: convert to time64_t for expiry drm/etnaviv: avoid deprecated timespec drm/etnaviv: reject timeouts with tv_nsec >= NSEC_PER_SEC drm/msm: avoid using 'timespec' hfs/hfsplus: use 64-bit inode timestamps hostfs: pass 64-bit timestamps to/from user space packet: clarify timestamp overflow tsacct: add 64-bit btime field acct: stop using get_seconds() um: ubd: use 64-bit time_t where possible xtensa: ISS: avoid struct timeval dlm: use SO_SNDTIMEO_NEW instead of SO_SNDTIMEO_OLD ...
2020-01-29io_uring: add support for epoll_ctl(2)Jens Axboe1-0/+71
This adds IORING_OP_EPOLL_CTL, which can perform the same work as the epoll_ctl(2) system call. Signed-off-by: Jens Axboe <axboe@kernel.dk>
2020-01-29eventpoll: support non-blocking do_epoll_ctl() callsJens Axboe1-13/+33
Also make it available outside of epoll, along with the helper that decides if we need to copy the passed in epoll_event. Signed-off-by: Jens Axboe <axboe@kernel.dk>
2020-01-29eventpoll: abstract out epoll_ctl() handlerJens Axboe1-20/+25
No functional changes in this patch. Signed-off-by: Jens Axboe <axboe@kernel.dk>
2020-01-29io_uring: fix linked command file table usageJens Axboe3-14/+21
We're not consistent in how the file table is grabbed and assigned if we have a command linked that requires the use of it. Add ->file_table to the io_op_defs[] array, and use that to determine when to grab the table instead of having the handlers set it if they need to defer. This also means we can kill the IO_WQ_WORK_NEEDS_FILES flag. We always initialize work->files, so io-wq can just check for that. Signed-off-by: Jens Axboe <axboe@kernel.dk>
2020-01-29Merge tag 'erofs-for-5.6-rc1' of ↵Linus Torvalds5-107/+74
git://git.kernel.org/pub/scm/linux/kernel/git/xiang/erofs Pull erofs updates from Gao Xiang: "A regression fix, several cleanups and (maybe) plus an upcoming new mount api convert patch as a part of vfs update are considered available for this cycle. All commits have been in linux-next and tested with no smoke out. Summary: - fix an out-of-bound read access introduced in v5.3, which could rarely cause data corruption - various cleanup patches" * tag 'erofs-for-5.6-rc1' of git://git.kernel.org/pub/scm/linux/kernel/git/xiang/erofs: erofs: clean up z_erofs_submit_queue() erofs: fold in postsubmit_is_all_bypassed() erofs: fix out-of-bound read for shifted uncompressed block erofs: remove void tagging/untagging of workgroup pointers erofs: remove unused tag argument while registering a workgroup erofs: remove unused tag argument while finding a workgroup erofs: correct indentation of an assigned structure inside a function
2020-01-29Merge branch 'work.adfs' of ↵Linus Torvalds9-740/+890
git://git.kernel.org/pub/scm/linux/kernel/git/viro/vfs Pull adfs updates from Al Viro: "adfs stuff for this cycle" * 'work.adfs' of git://git.kernel.org/pub/scm/linux/kernel/git/viro/vfs: (42 commits) fs/adfs: bigdir: Fix an error code in adfs_fplus_read() Documentation: update adfs filesystem documentation fs/adfs: mostly divorse inode number from indirect disc address fs/adfs: super: add support for E and E+ floppy image formats fs/adfs: super: extract filesystem block probe fs/adfs: dir: remove debug in adfs_dir_update() fs/adfs: super: fix inode dropping fs/adfs: bigdir: implement directory update support fs/adfs: bigdir: calculate and validate directory checkbyte fs/adfs: bigdir: directory validation strengthening fs/adfs: bigdir: extract directory validation fs/adfs: bigdir: factor out directory entry offset calculation fs/adfs: newdir: split out directory commit from update fs/adfs: newdir: clean up adfs_f_update() fs/adfs: newdir: merge adfs_dir_read() into adfs_f_read() fs/adfs: newdir: improve directory validation fs/adfs: newdir: factor out directory format validation fs/adfs: dir: use pointers to access directory head/tails fs/adfs: dir: add more efficient iterate() per-format method fs/adfs: dir: switch to iterate_shared method ...
2020-01-29Merge branch 'work.openat2' of ↵Linus Torvalds5-93/+305
git://git.kernel.org/pub/scm/linux/kernel/git/viro/vfs Pull openat2 support from Al Viro: "This is the openat2() series from Aleksa Sarai. I'm afraid that the rest of namei stuff will have to wait - it got zero review the last time I'd posted #work.namei, and there had been a leak in the posted series I'd caught only last weekend. I was going to repost it on Monday, but the window opened and the odds of getting any review during that... Oh, well. Anyway, openat2 part should be ready; that _did_ get sane amount of review and public testing, so here it comes" From Aleksa's description of the series: "For a very long time, extending openat(2) with new features has been incredibly frustrating. This stems from the fact that openat(2) is possibly the most famous counter-example to the mantra "don't silently accept garbage from userspace" -- it doesn't check whether unknown flags are present[1]. This means that (generally) the addition of new flags to openat(2) has been fraught with backwards-compatibility issues (O_TMPFILE has to be defined as __O_TMPFILE|O_DIRECTORY|[O_RDWR or O_WRONLY] to ensure old kernels gave errors, since it's insecure to silently ignore the flag[2]). All new security-related flags therefore have a tough road to being added to openat(2). Furthermore, the need for some sort of control over VFS's path resolution (to avoid malicious paths resulting in inadvertent breakouts) has been a very long-standing desire of many userspace applications. This patchset is a revival of Al Viro's old AT_NO_JUMPS[3] patchset (which was a variant of David Drysdale's O_BENEATH patchset[4] which was a spin-off of the Capsicum project[5]) with a few additions and changes made based on the previous discussion within [6] as well as others I felt were useful. In line with the conclusions of the original discussion of AT_NO_JUMPS, the flag has been split up into separate flags. However, instead of being an openat(2) flag it is provided through a new syscall openat2(2) which provides several other improvements to the openat(2) interface (see the patch description for more details). The following new LOOKUP_* flags are added: LOOKUP_NO_XDEV: Blocks all mountpoint crossings (upwards, downwards, or through absolute links). Absolute pathnames alone in openat(2) do not trigger this. Magic-link traversal which implies a vfsmount jump is also blocked (though magic-link jumps on the same vfsmount are permitted). LOOKUP_NO_MAGICLINKS: Blocks resolution through /proc/$pid/fd-style links. This is done by blocking the usage of nd_jump_link() during resolution in a filesystem. The term "magic-links" is used to match with the only reference to these links in Documentation/, but I'm happy to change the name. It should be noted that this is different to the scope of ~LOOKUP_FOLLOW in that it applies to all path components. However, you can do openat2(NO_FOLLOW|NO_MAGICLINKS) on a magic-link and it will *not* fail (assuming that no parent component was a magic-link), and you will have an fd for the magic-link. In order to correctly detect magic-links, the introduction of a new LOOKUP_MAGICLINK_JUMPED state flag was required. LOOKUP_BENEATH: Disallows escapes to outside the starting dirfd's tree, using techniques such as ".." or absolute links. Absolute paths in openat(2) are also disallowed. Conceptually this flag is to ensure you "stay below" a certain point in the filesystem tree -- but this requires some additional to protect against various races that would allow escape using "..". Currently LOOKUP_BENEATH implies LOOKUP_NO_MAGICLINKS, because it can trivially beam you around the filesystem (breaking the protection). In future, there might be similar safety checks done as in LOOKUP_IN_ROOT, but that requires more discussion. In addition, two new flags are added that expand on the above ideas: LOOKUP_NO_SYMLINKS: Does what it says on the tin. No symlink resolution is allowed at all, including magic-links. Just as with LOOKUP_NO_MAGICLINKS this can still be used with NOFOLLOW to open an fd for the symlink as long as no parent path had a symlink component. LOOKUP_IN_ROOT: This is an extension of LOOKUP_BENEATH that, rather than blocking attempts to move past the root, forces all such movements to be scoped to the starting point. This provides chroot(2)-like protection but without the cost of a chroot(2) for each filesystem operation, as well as being safe against race attacks that chroot(2) is not. If a race is detected (as with LOOKUP_BENEATH) then an error is generated, and similar to LOOKUP_BENEATH it is not permitted to cross magic-links with LOOKUP_IN_ROOT. The primary need for this is from container runtimes, which currently need to do symlink scoping in userspace[7] when opening paths in a potentially malicious container. There is a long list of CVEs that could have bene mitigated by having RESOLVE_THIS_ROOT (such as CVE-2017-1002101, CVE-2017-1002102, CVE-2018-15664, and CVE-2019-5736, just to name a few). In order to make all of the above more usable, I'm working on libpathrs[8] which is a C-friendly library for safe path resolution. It features a userspace-emulated backend if the kernel doesn't support openat2(2). Hopefully we can get userspace to switch to using it, and thus get openat2(2) support for free once it's ready. Future work would include implementing things like RESOLVE_NO_AUTOMOUNT and possibly a RESOLVE_NO_REMOTE (to allow programs to be sure they don't hit DoSes though stale NFS handles)" * 'work.openat2' of git://git.kernel.org/pub/scm/linux/kernel/git/viro/vfs: Documentation: path-lookup: include new LOOKUP flags selftests: add openat2(2) selftests open: introduce openat2(2) syscall namei: LOOKUP_{IN_ROOT,BENEATH}: permit limited ".." resolution namei: LOOKUP_IN_ROOT: chroot-like scoped resolution namei: LOOKUP_BENEATH: O_BENEATH-like scoped resolution namei: LOOKUP_NO_XDEV: block mountpoint crossing namei: LOOKUP_NO_MAGICLINKS: block magic-link resolution namei: LOOKUP_NO_SYMLINKS: block symlink resolution namei: allow set_root() to produce errors namei: allow nd_jump_link() to produce errors nsfs: clean-up ns_get_path() signature to return int namei: only return -ECHILD from follow_dotdot_rcu()
2020-01-29Merge tag 'driver-core-5.6-rc1' of ↵Linus Torvalds3-24/+25
git://git.kernel.org/pub/scm/linux/kernel/git/gregkh/driver-core Pull driver core updates from Greg KH: "Here is a small set of changes for 5.6-rc1 for the driver core and some firmware subsystem changes. Included in here are: - device.h splitup like you asked for months ago - devtmpfs minor cleanups - firmware core minor changes - debugfs fix for lockdown mode - kernfs cleanup fix - cpu topology minor fix All of these have been in linux-next for a while with no reported issues" * tag 'driver-core-5.6-rc1' of git://git.kernel.org/pub/scm/linux/kernel/git/gregkh/driver-core: (22 commits) firmware: Rename FW_OPT_NOFALLBACK to FW_OPT_NOFALLBACK_SYSFS devtmpfs: factor out common tail of devtmpfs_{create,delete}_node devtmpfs: initify a bit devtmpfs: simplify initialization of mount_dev devtmpfs: factor out setup part of devtmpfsd() devtmpfs: fix theoretical stale pointer deref in devtmpfsd() driver core: platform: fix u32 greater or equal to zero comparison cpu-topology: Don't error on more than CONFIG_NR_CPUS CPUs in device tree debugfs: Return -EPERM when locked down driver core: Print device when resources present in really_probe() driver core: Fix test_async_driver_probe if NUMA is disabled driver core: platform: Prevent resouce overflow from causing infinite loops fs/kernfs/dir.c: Clean code by removing always true condition component: do not dereference opaque pointer in debugfs drivers/component: remove modular code debugfs: Fix warnings when building documentation device.h: move 'struct driver' stuff out to device/driver.h device.h: move 'struct class' stuff out to device/class.h device.h: move 'struct bus' stuff out to device/bus.h device.h: move dev_printk()-like functions to dev_printk.h ...
2020-01-28io_uring: support using a registered personality for commandsJens Axboe1-1/+19
For personalities previously registered via IORING_REGISTER_PERSONALITY, allow any command to select them. This is done through setting sqe->personality to the id returned from registration, and then flagging sqe->flags with IOSQE_PERSONALITY. Reviewed-by: Pavel Begunkov <asml.silence@gmail.com> Signed-off-by: Jens Axboe <axboe@kernel.dk>
2020-01-28io_uring: allow registering credentialsJens Axboe1-7/+68
If an application wants to use a ring with different kinds of credentials, it can register them upfront. We don't lookup credentials, the credentials of the task calling IORING_REGISTER_PERSONALITY is used. An 'id' is returned for the application to use in subsequent personality support. Reviewed-by: Pavel Begunkov <asml.silence@gmail.com> Signed-off-by: Jens Axboe <axboe@kernel.dk>
2020-01-28io_uring: add io-wq workqueue sharingPavel Begunkov1-14/+50
If IORING_SETUP_ATTACH_WQ is set, it expects wq_fd in io_uring_params to be a valid io_uring fd io-wq of which will be shared with the newly created io_uring instance. If the flag is set but it can't share io-wq, it fails. This allows creation of "sibling" io_urings, where we prefer to keep the SQ/CQ private, but want to share the async backend to minimize the amount of overhead associated with having multiple rings that belong to the same backend. Reported-by: Jens Axboe <axboe@kernel.dk> Reported-by: Daurnimator <quae@daurnimator.com> Signed-off-by: Pavel Begunkov <asml.silence@gmail.com> Signed-off-by: Jens Axboe <axboe@kernel.dk>
2020-01-28io-wq: allow grabbing existing io-wqPavel Begunkov2-0/+9
Export a helper to attach to an existing io-wq, rather than setting up a new one. This is doable now that we have reference counted io_wq's. Reported-by: Jens Axboe <axboe@kernel.dk> Signed-off-by: Pavel Begunkov <asml.silence@gmail.com> Signed-off-by: Jens Axboe <axboe@kernel.dk>
2020-01-28io_uring/io-wq: don't use static creds/mm assignmentsJens Axboe3-30/+81
We currently setup the io_wq with a static set of mm and creds. Even for a single-use io-wq per io_uring, this is suboptimal as we have may have multiple enters of the ring. For sharing the io-wq backend, it doesn't work at all. Switch to passing in the creds and mm when the work item is setup. This means that async work is no longer deferred to the io_uring mm and creds, it is done with the current mm and creds. Flag this behavior with IORING_FEAT_CUR_PERSONALITY, so applications know they can rely on the current personality (mm and creds) being the same for direct issue and async issue. Reviewed-by: Stefan Metzmacher <metze@samba.org> Signed-off-by: Jens Axboe <axboe@kernel.dk>
2020-01-28Merge branch 'linus' of ↵Linus Torvalds2-3/+3
git://git.kernel.org/pub/scm/linux/kernel/git/herbert/crypto-2.6 Pull crypto updates from Herbert Xu: "API: - Removed CRYPTO_TFM_RES flags - Extended spawn grabbing to all algorithm types - Moved hash descsize verification into API code Algorithms: - Fixed recursive pcrypt dead-lock - Added new 32 and 64-bit generic versions of poly1305 - Added cryptogams implementation of x86/poly1305 Drivers: - Added support for i.MX8M Mini in caam - Added support for i.MX8M Nano in caam - Added support for i.MX8M Plus in caam - Added support for A33 variant of SS in sun4i-ss - Added TEE support for Raven Ridge in ccp - Added in-kernel API to submit TEE commands in ccp - Added AMD-TEE driver - Added support for BCM2711 in iproc-rng200 - Added support for AES256-GCM based ciphers for chtls - Added aead support on SEC2 in hisilicon" * 'linus' of git://git.kernel.org/pub/scm/linux/kernel/git/herbert/crypto-2.6: (244 commits) crypto: arm/chacha - fix build failured when kernel mode NEON is disabled crypto: caam - add support for i.MX8M Plus crypto: x86/poly1305 - emit does base conversion itself crypto: hisilicon - fix spelling mistake "disgest" -> "digest" crypto: chacha20poly1305 - add back missing test vectors and test chunking crypto: x86/poly1305 - fix .gitignore typo tee: fix memory allocation failure checks on drv_data and amdtee crypto: ccree - erase unneeded inline funcs crypto: ccree - make cc_pm_put_suspend() void crypto: ccree - split overloaded usage of irq field crypto: ccree - fix PM race condition crypto: ccree - fix FDE descriptor sequence crypto: ccree - cc_do_send_request() is void func crypto: ccree - fix pm wrongful error reporting crypto: ccree - turn errors to debug msgs crypto: ccree - fix AEAD decrypt auth fail crypto: ccree - fix typo in comment crypto: ccree - fix typos in error msgs crypto: atmel-{aes,sha,tdes} - Retire crypto_platform_data crypto: x86/sha - Eliminate casts on asm implementations ...
2020-01-28Merge tag '5.6-smb3-fixes-and-dfs-and-readdir-improvements' of ↵Linus Torvalds18-713/+1041
git://git.samba.org/sfrench/cifs-2.6 Pull cifs updates from Steve French: "Various SMB3/CIFS fixes including four for stable. - Improvement to fallocate (enables 3 additional xfstests) - Fix for file creation when mounting with modefromsid - Add ability to backup/restore dos attributes and creation time - DFS failover and reconnect fixes - performance optimization for readir Note that due to the upcoming SMB3 Test Event (at SNIA SDC next week) there will likely be more changesets near the end of the merge window (since we will be testing heavily next week, I held off on some patches and I expect some additional multichannel patches as well as patches to enable some additional xfstests)" * tag '5.6-smb3-fixes-and-dfs-and-readdir-improvements' of git://git.samba.org/sfrench/cifs-2.6: (24 commits) CIFS: Fix task struct use-after-free on reconnect cifs: use PTR_ERR_OR_ZERO() to simplify code cifs: add support for fallocate mode 0 for non-sparse files cifs: fix NULL dereference in match_prepath smb3: fix default permissions on new files when mounting with modefromsid CIFS: Add support for setting owner info, dos attributes, and create time cifs: remove set but not used variable 'server' cifs: Fix memory allocation in __smb2_handle_cancelled_cmd() cifs: Fix mount options set in automount cifs: fix unitialized variable poential problem with network I/O cache lock patch cifs: Fix return value in __update_cache_entry cifs: Avoid doing network I/O while holding cache lock cifs: Fix potential deadlock when updating vol in cifs_reconnect() cifs: Merge is_path_valid() into get_normalized_path() cifs: Introduce helpers for finding TCP connection cifs: Get rid of kstrdup_const()'d paths cifs: Clean up DFS referral cache cifs: Don't use iov_iter::type directly cifs: set correct max-buffer-size for smb2_ioctl_init() cifs: use compounding for open and first query-dir for readdir() ...
2020-01-28Merge tag 'fsverity-for-linus' of ↵Linus Torvalds9-66/+267
git://git.kernel.org/pub/scm/fs/fscrypt/fscrypt Pull fsverity updates from Eric Biggers: - Optimize fs-verity sequential read performance by implementing readahead of Merkle tree pages. This allows the Merkle tree to be read in larger chunks. - Optimize FS_IOC_ENABLE_VERITY performance in the uncached case by implementing readahead of data pages. - Allocate the hash requests from a mempool in order to eliminate the possibility of allocation failures during I/O. * tag 'fsverity-for-linus' of git://git.kernel.org/pub/scm/fs/fscrypt/fscrypt: fs-verity: use u64_to_user_ptr() fs-verity: use mempool for hash requests fs-verity: implement readahead of Merkle tree pages fs-verity: implement readahead for FS_IOC_ENABLE_VERITY
2020-01-28Merge tag 'fscrypt-for-linus' of git://git.kernel.org/pub/scm/fs/fscrypt/fscryptLinus Torvalds22-354/+748
Pull fscrypt updates from Eric Biggers: - Extend the FS_IOC_ADD_ENCRYPTION_KEY ioctl to allow the raw key to be provided via a keyring key. - Prepare for the new dirhash method (SipHash of plaintext name) that will be used by directories that are both encrypted and casefolded. - Switch to a new format for "no-key names" that prepares for the new dirhash method, and also fixes a longstanding bug where multiple filenames could map to the same no-key name. - Allow the crypto algorithms used by fscrypt to be built as loadable modules when the fscrypt-capable filesystems are. - Optimize fscrypt_zeroout_range(). - Various cleanups. * tag 'fscrypt-for-linus' of git://git.kernel.org/pub/scm/fs/fscrypt/fscrypt: (26 commits) fscrypt: improve format of no-key names ubifs: allow both hash and disk name to be provided in no-key names ubifs: don't trigger assertion on invalid no-key filename fscrypt: clarify what is meant by a per-file key fscrypt: derive dirhash key for casefolded directories fscrypt: don't allow v1 policies with casefolding fscrypt: add "fscrypt_" prefix to fname_encrypt() fscrypt: don't print name of busy file when removing key ubifs: use IS_ENCRYPTED() instead of ubifs_crypt_is_encrypted() fscrypt: document gfp_flags for bounce page allocation fscrypt: optimize fscrypt_zeroout_range() fscrypt: remove redundant bi_status check fscrypt: Allow modular crypto algorithms fscrypt: include <linux/ioctl.h> in UAPI header fscrypt: don't check for ENOKEY from fscrypt_get_encryption_info() fscrypt: remove fscrypt_is_direct_key_policy() fscrypt: move fscrypt_valid_enc_modes() to policy.c fscrypt: check for appropriate use of DIRECT_KEY flag earlier fscrypt: split up fscrypt_supported_policy() by policy version fscrypt: introduce fscrypt_needs_contents_encryption() ...
2020-01-28Merge tag 'fs-dedupe-last-block-tag' of ↵Linus Torvalds2-7/+6
git://git.kernel.org/pub/scm/linux/kernel/git/kdave/linux Pull fs deduplication fix from David Sterba: "This is a fix for deduplication bug: the last block of two files is allowed to deduplicated. This got broken in 5.1 by lifting some generic checks to VFS layer. The affected filesystems are btrfs and xfs. The patches are marked for stable as the bug decreases deduplication effectivity" * tag 'fs-dedupe-last-block-tag' of git://git.kernel.org/pub/scm/linux/kernel/git/kdave/linux: Btrfs: make deduplication with range including the last block work fs: allow deduplication of eof block into the end of the destination file
2020-01-28Merge tag 'for-5.6-tag' of ↵Linus Torvalds40-1715/+2988
git://git.kernel.org/pub/scm/linux/kernel/git/kdave/linux Pull btrfs updates from David Sterba: "Features, highlights: - async discard - "mount -o discard=async" to enable it - freed extents are not discarded immediatelly, but grouped together and trimmed later, with IO rate limiting - the "sync" mode submits short extents that could have been ignored completely by the device, for SATA prior to 3.1 the requests are unqueued and have a big impact on performance - the actual discard IO requests have been moved out of transaction commit to a worker thread, improving commit latency - IO rate and request size can be tuned by sysfs files, for now enabled only with CONFIG_BTRFS_DEBUG as we might need to add/delete the files and don't have a stable-ish ABI for general use, defaults are conservative - export device state info in sysfs, eg. missing, writeable - no discard of extents known to be untouched on disk (eg. after reservation) - device stats reset is logged with process name and PID that called the ioctl Fixes: - fix missing hole after hole punching and fsync when using NO_HOLES - writeback: range cyclic mode could miss some dirty pages and lead to OOM - two more corner cases for metadata_uuid change after power loss during the change - fix infinite loop during fsync after mix of rename operations Core changes: - qgroup assign returns ENOTCONN when quotas not enabled, used to return EINVAL that was confusing - device closing does not need to allocate memory anymore - snapshot aware code got removed, disabled for years due to performance problems, reimplmentation will allow to select wheter defrag breaks or does not break COW on shared extents - tree-checker: - check leaf chunk item size, cross check against number of stripes - verify location keys for DIR_ITEM, DIR_INDEX and XATTR items - new self test for physical -> logical mapping code, used for super block range exclusion - assertion helpers/macros updated to avoid objtool "unreachable code" reports on older compilers or config option combinations" * tag 'for-5.6-tag' of git://git.kernel.org/pub/scm/linux/kernel/git/kdave/linux: (84 commits) btrfs: free block groups after free'ing fs trees btrfs: Fix split-brain handling when changing FSID to metadata uuid btrfs: Handle another split brain scenario with metadata uuid feature btrfs: Factor out metadata_uuid code from find_fsid. btrfs: Call find_fsid from find_fsid_inprogress Btrfs: fix infinite loop during fsync after rename operations btrfs: set trans->drity in btrfs_commit_transaction btrfs: drop log root for dropped roots btrfs: sysfs, add devid/dev_state kobject and device attributes btrfs: Refactor btrfs_rmap_block to improve readability btrfs: Add self-tests for btrfs_rmap_block btrfs: selftests: Add support for dummy devices btrfs: Move and unexport btrfs_rmap_block btrfs: separate definition of assertion failure handlers btrfs: device stats, log when stats are zeroed btrfs: fix improper setting of scanned for range cyclic write cache pages btrfs: safely advance counter when looking up bio csums btrfs: remove unused member btrfs_device::work btrfs: remove unnecessary wrapper get_alloc_profile btrfs: add correction to handle -1 edge case in async discard ...
2020-01-28Merge branch 'x86-cache-for-linus' of ↵Linus Torvalds2-0/+11
git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip Pull x86 resource control updates from Ingo Molnar: "The main change in this tree is the extension of the resctrl procfs ABI with a new file that helps tooling to navigate from tasks back to resctrl groups: /proc/{pid}/cpu_resctrl_groups. Also fix static key usage for certain feature combinations and simplify the task exit resctrl case" * 'x86-cache-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip: x86/resctrl: Add task resctrl information display x86/resctrl: Check monitoring static key in the MBM overflow handler x86/resctrl: Do not reconfigure exiting tasks
2020-01-28Merge branch 'sched-core-for-linus' of ↵Linus Torvalds2-4/+4
git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip Pull scheduler updates from Ingo Molnar: "These were the main changes in this cycle: - More -rt motivated separation of CONFIG_PREEMPT and CONFIG_PREEMPTION. - Add more low level scheduling topology sanity checks and warnings to filter out nonsensical topologies that break scheduling. - Extend uclamp constraints to influence wakeup CPU placement - Make the RT scheduler more aware of asymmetric topologies and CPU capacities, via uclamp metrics, if CONFIG_UCLAMP_TASK=y - Make idle CPU selection more consistent - Various fixes, smaller cleanups, updates and enhancements - please see the git log for details" * 'sched-core-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip: (58 commits) sched/fair: Define sched_idle_cpu() only for SMP configurations sched/topology: Assert non-NUMA topology masks don't (partially) overlap idle: fix spelling mistake "iterrupts" -> "interrupts" sched/fair: Remove redundant call to cpufreq_update_util() sched/psi: create /proc/pressure and /proc/pressure/{io|memory|cpu} only when psi enabled sched/fair: Fix sgc->{min,max}_capacity calculation for SD_OVERLAP sched/fair: calculate delta runnable load only when it's needed sched/cputime: move rq parameter in irqtime_account_process_tick stop_machine: Make stop_cpus() static sched/debug: Reset watchdog on all CPUs while processing sysrq-t sched/core: Fix size of rq::uclamp initialization sched/uclamp: Fix a bug in propagating uclamp value in new cgroups sched/fair: Load balance aggressively for SCHED_IDLE CPUs sched/fair : Improve update_sd_pick_busiest for spare capacity case watchdog: Remove soft_lockup_hrtimer_cnt and related code sched/rt: Make RT capacity-aware sched/fair: Make EAS wakeup placement consider uclamp restrictions sched/fair: Make task_fits_capacity() consider uclamp restrictions sched/uclamp: Rename uclamp_util_with() into uclamp_rq_util_with() sched/uclamp: Make uclamp util helpers use and return UL values ...
2020-01-28Merge branch 'perf-core-for-linus' of ↵Linus Torvalds2-5/+5
git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip Pull perf updates from Ingo Molnar: "Kernel side changes: - Ftrace is one of the last W^X violators (after this only KLP is left). These patches move it over to the generic text_poke() interface and thereby get rid of this oddity. This requires a surprising amount of surgery, by Peter Zijlstra. - x86/AMD PMUs: add support for 'Large Increment per Cycle Events' to count certain types of events that have a special, quirky hw ABI (by Kim Phillips) - kprobes fixes by Masami Hiramatsu Lots of tooling updates as well, the following subcommands were updated: annotate/report/top, c2c, clang, record, report/top TUI, sched timehist, tests; plus updates were done to the gtk ui, libperf, headers and the parser" * 'perf-core-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip: (57 commits) perf/x86/amd: Add support for Large Increment per Cycle Events perf/x86/amd: Constrain Large Increment per Cycle events perf/x86/intel/rapl: Add Comet Lake support tracing: Initialize ret in syscall_enter_define_fields() perf header: Use last modification time for timestamp perf c2c: Fix return type for histogram sorting comparision functions perf beauty sockaddr: Fix augmented syscall format warning perf/ui/gtk: Fix gtk2 build perf ui gtk: Add missing zalloc object perf tools: Use %define api.pure full instead of %pure-parser libperf: Setup initial evlist::all_cpus value perf report: Fix no libunwind compiled warning break s390 issue perf tools: Support --prefix/--prefix-strip perf report: Clarify in help that --children is default tools build: Fix test-clang.cpp with Clang 8+ perf clang: Fix build with Clang 9 kprobes: Fix optimize_kprobe()/unoptimize_kprobe() cancellation logic tools lib: Fix builds when glibc contains strlcpy() perf report/top: Make 'e' visible in the help and make it toggle showing callchains perf report/top: Do not offer annotation for symbols without samples ...
2020-01-27Merge tag 'smp-core-2020-01-28' of ↵Linus Torvalds1-1/+1
git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip Pull core SMP updates from Thomas Gleixner: "A small set of SMP core code changes: - Rework the smp function call core code to avoid the allocation of an additional cpumask - Remove the not longer required GFP argument from on_each_cpu_cond() and on_each_cpu_cond_mask() and fixup the callers" * tag 'smp-core-2020-01-28' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip: smp: Remove allocation mask from on_each_cpu_cond.*() smp: Add a smp_cond_func_t argument to smp_call_function_many() smp: Use smp_cond_func_t as type for the conditional function
2020-01-27Merge tag 'timers-core-2020-01-27' of ↵Linus Torvalds4-0/+104
git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip Pull timer updates from Thomas Gleixner: "The timekeeping and timers departement provides: - Time namespace support: If a container migrates from one host to another then it expects that clocks based on MONOTONIC and BOOTTIME are not subject to disruption. Due to different boot time and non-suspended runtime these clocks can differ significantly on two hosts, in the worst case time goes backwards which is a violation of the POSIX requirements. The time namespace addresses this problem. It allows to set offsets for clock MONOTONIC and BOOTTIME once after creation and before tasks are associated with the namespace. These offsets are taken into account by timers and timekeeping including the VDSO. Offsets for wall clock based clocks (REALTIME/TAI) are not provided by this mechanism. While in theory possible, the overhead and code complexity would be immense and not justified by the esoteric potential use cases which were discussed at Plumbers '18. The overhead for tasks in the root namespace (ie where host time offsets = 0) is in the noise and great effort was made to ensure that especially in the VDSO. If time namespace is disabled in the kernel configuration the code is compiled out. Kudos to Andrei Vagin and Dmitry Sofanov who implemented this feature and kept on for more than a year addressing review comments, finding better solutions. A pleasant experience. - Overhaul of the alarmtimer device dependency handling to ensure that the init/suspend/resume ordering is correct. - A new clocksource/event driver for Microchip PIT64 - Suspend/resume support for the Hyper-V clocksource - The usual pile of fixes, updates and improvements mostly in the driver code" * tag 'timers-core-2020-01-27' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip: (71 commits) alarmtimer: Make alarmtimer_get_rtcdev() a stub when CONFIG_RTC_CLASS=n alarmtimer: Use wakeup source from alarmtimer platform device alarmtimer: Make alarmtimer platform device child of RTC device alarmtimer: Update alarmtimer_get_rtcdev() docs to reflect reality hrtimer: Add missing sparse annotation for __run_timer() lib/vdso: Only read hrtimer_res when needed in __cvdso_clock_getres() MIPS: vdso: Define BUILD_VDSO32 when building a 32bit kernel clocksource/drivers/hyper-v: Set TSC clocksource as default w/ InvariantTSC clocksource/drivers/hyper-v: Untangle stimers and timesync from clocksources clocksource/drivers/timer-microchip-pit64b: Fix sparse warning clocksource/drivers/exynos_mct: Rename Exynos to lowercase clocksource/drivers/timer-ti-dm: Fix uninitialized pointer access clocksource/drivers/timer-ti-dm: Switch to platform_get_irq clocksource/drivers/timer-ti-dm: Convert to devm_platform_ioremap_resource clocksource/drivers/em_sti: Fix variable declaration in em_sti_probe clocksource/drivers/em_sti: Convert to devm_platform_ioremap_resource clocksource/drivers/bcm2835_timer: Fix memory leak of timer clocksource/drivers/cadence-ttc: Use ttc driver as platform driver clocksource/drivers/timer-microchip-pit64b: Add Microchip PIT64B support clocksource/drivers/hyper-v: Reserve PAGE_SIZE space for tsc page ...
2020-01-27io-wq: make the io_wq ref countedJens Axboe1-1/+10
In preparation for sharing an io-wq across different users, add a reference count that manages destruction of it. Reviewed-by: Pavel Begunkov <asml.silence@gmail.com> Signed-off-by: Jens Axboe <axboe@kernel.dk>
2020-01-27io_uring: fix refcounting with batched allocations at OOMPavel Begunkov1-2/+5
In case of out of memory the second argument of percpu_ref_put_many() in io_submit_sqes() may evaluate into "nr - (-EAGAIN)", that is clearly wrong. Fixes: 2b85edfc0c90 ("io_uring: batch getting pcpu references") Signed-off-by: Pavel Begunkov <asml.silence@gmail.com> Signed-off-by: Jens Axboe <axboe@kernel.dk>
2020-01-27io_uring: add comment for drain_nextPavel Begunkov1-0/+7
Draining the middle of a link is tricky, so leave a comment there Signed-off-by: Pavel Begunkov <asml.silence@gmail.com> Signed-off-by: Jens Axboe <axboe@kernel.dk>
2020-01-27io_uring: don't attempt to copy iovec for READ/WRITEJens Axboe1-2/+1
For the non-vectored variant of READV/WRITEV, we don't need to setup an async io context, and we flag that appropriately in the io_op_defs array. However, in fixing this for the 5.5 kernel in commit 74566df3a71c we didn't have these opcodes, so the check there was added just for the READ_FIXED and WRITE_FIXED opcodes. Replace that check with just a single check for needing async context, that covers all four of these read/write variants that don't use an iovec. Signed-off-by: Jens Axboe <axboe@kernel.dk>
2020-01-26CIFS: Fix task struct use-after-free on reconnectVincent Whitchurch3-0/+6
The task which created the MID may be gone by the time cifsd attempts to call the callbacks on MIDs from cifs_reconnect(). This leads to a use-after-free of the task struct in cifs_wake_up_task: ================================================================== BUG: KASAN: use-after-free in __lock_acquire+0x31a0/0x3270 Read of size 8 at addr ffff8880103e3a68 by task cifsd/630 CPU: 0 PID: 630 Comm: cifsd Not tainted 5.5.0-rc6+ #119 Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS 1.10.2-1 04/01/2014 Call Trace: dump_stack+0x8e/0xcb print_address_description.constprop.5+0x1d3/0x3c0 ? __lock_acquire+0x31a0/0x3270 __kasan_report+0x152/0x1aa ? __lock_acquire+0x31a0/0x3270 ? __lock_acquire+0x31a0/0x3270 kasan_report+0xe/0x20 __lock_acquire+0x31a0/0x3270 ? __wake_up_common+0x1dc/0x630 ? _raw_spin_unlock_irqrestore+0x4c/0x60 ? mark_held_locks+0xf0/0xf0 ? _raw_spin_unlock_irqrestore+0x39/0x60 ? __wake_up_common_lock+0xd5/0x130 ? __wake_up_common+0x630/0x630 lock_acquire+0x13f/0x330 ? try_to_wake_up+0xa3/0x19e0 _raw_spin_lock_irqsave+0x38/0x50 ? try_to_wake_up+0xa3/0x19e0 try_to_wake_up+0xa3/0x19e0 ? cifs_compound_callback+0x178/0x210 ? set_cpus_allowed_ptr+0x10/0x10 cifs_reconnect+0xa1c/0x15d0 ? generic_ip_connect+0x1860/0x1860 ? rwlock_bug.part.0+0x90/0x90 cifs_readv_from_socket+0x479/0x690 cifs_read_from_socket+0x9d/0xe0 ? cifs_readv_from_socket+0x690/0x690 ? mempool_resize+0x690/0x690 ? rwlock_bug.part.0+0x90/0x90 ? memset+0x1f/0x40 ? allocate_buffers+0xff/0x340 cifs_demultiplex_thread+0x388/0x2a50 ? cifs_handle_standard+0x610/0x610 ? rcu_read_lock_held_common+0x120/0x120 ? mark_lock+0x11b/0xc00 ? __lock_acquire+0x14ed/0x3270 ? __kthread_parkme+0x78/0x100 ? lockdep_hardirqs_on+0x3e8/0x560 ? lock_downgrade+0x6a0/0x6a0 ? lockdep_hardirqs_on+0x3e8/0x560 ? _raw_spin_unlock_irqrestore+0x39/0x60 ? cifs_handle_standard+0x610/0x610 kthread+0x2bb/0x3a0 ? kthread_create_worker_on_cpu+0xc0/0xc0 ret_from_fork+0x3a/0x50 Allocated by task 649: save_stack+0x19/0x70 __kasan_kmalloc.constprop.5+0xa6/0xf0 kmem_cache_alloc+0x107/0x320 copy_process+0x17bc/0x5370 _do_fork+0x103/0xbf0 __x64_sys_clone+0x168/0x1e0 do_syscall_64+0x9b/0xec0 entry_SYSCALL_64_after_hwframe+0x49/0xbe Freed by task 0: save_stack+0x19/0x70 __kasan_slab_free+0x11d/0x160 kmem_cache_free+0xb5/0x3d0 rcu_core+0x52f/0x1230 __do_softirq+0x24d/0x962 The buggy address belongs to the object at ffff8880103e32c0 which belongs to the cache task_struct of size 6016 The buggy address is located 1960 bytes inside of 6016-byte region [ffff8880103e32c0, ffff8880103e4a40) The buggy address belongs to the page: page:ffffea000040f800 refcount:1 mapcount:0 mapping:ffff8880108da5c0 index:0xffff8880103e4c00 compound_mapcount: 0 raw: 4000000000010200 ffffea00001f2208 ffffea00001e3408 ffff8880108da5c0 raw: ffff8880103e4c00 0000000000050003 00000001ffffffff 0000000000000000 page dumped because: kasan: bad access detected Memory state around the buggy address: ffff8880103e3900: fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb ffff8880103e3980: fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb >ffff8880103e3a00: fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb ^ ffff8880103e3a80: fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb ffff8880103e3b00: fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb ================================================================== This can be reliably reproduced by adding the below delay to cifs_reconnect(), running find(1) on the mount, restarting the samba server while find is running, and killing find during the delay: spin_unlock(&GlobalMid_Lock); mutex_unlock(&server->srv_mutex); + msleep(10000); + cifs_dbg(FYI, "%s: issuing mid callbacks\n", __func__); list_for_each_safe(tmp, tmp2, &retry_list) { mid_entry = list_entry(tmp, struct mid_q_entry, qhead); Fix this by holding a reference to the task struct until the MID is freed. Signed-off-by: Vincent Whitchurch <vincent.whitchurch@axis.com> Signed-off-by: Steve French <stfrench@microsoft.com> CC: Stable <stable@vger.kernel.org> Reviewed-by: Paulo Alcantara (SUSE) <pc@cjr.nz> Reviewed-by: Pavel Shilovsky <pshilov@microsoft.com>
2020-01-26cifs: use PTR_ERR_OR_ZERO() to simplify codeChen Zhou1-1/+1
PTR_ERR_OR_ZERO contains if(IS_ERR(...)) + PTR_ERR, just use PTR_ERR_OR_ZERO directly. Signed-off-by: Chen Zhou <chenzhou10@huawei.com> Signed-off-by: Steve French <stfrench@microsoft.com> Reviewed-by: Paulo Alcantara (SUSE) <pc@cjr.nz>
2020-01-26cifs: add support for fallocate mode 0 for non-sparse filesRonnie Sahlberg3-37/+34
RHBZ 1336264 When we extend a file we must also force the size to be updated. This fixes an issue with holetest in xfs-tests which performs the following sequence : 1, create a new file 2, use fallocate mode==0 to populate the file 3, mmap the file 4, touch each page by reading the mmapped region. Signed-off-by: Ronnie Sahlberg <lsahlber@redhat.com> Signed-off-by: Steve French <stfrench@microsoft.com>
2020-01-26cifs: fix NULL dereference in match_prepathRonnie Sahlberg1-2/+4
RHBZ: 1760879 Fix an oops in match_prepath() by making sure that the prepath string is not NULL before we pass it into strcmp(). This is similar to other checks we make for example in cifs_root_iget() Signed-off-by: Ronnie Sahlberg <lsahlber@redhat.com> Signed-off-by: Steve French <stfrench@microsoft.com>
2020-01-26smb3: fix default permissions on new files when mounting with modefromsidSteve French3-3/+29
When mounting with "modefromsid" mount parm most servers will require that some default permissions are given to users in the ACL on newly created files, files created with the new 'sd context' - when passing in an sd context on create, permissions are not inherited from the parent directory, so in addition to the ACE with the special SID which contains the mode, we also must pass in an ACE allowing users to access the file (GENERIC_ALL for authenticated users seemed like a reasonable default, although later we could allow a mount option or config switch to make it GENERIC_ALL for EVERYONE special sid). CC: Stable <stable@vger.kernel.org> Signed-off-by: Steve French <stfrench@microsoft.com> Reviewed-By: Ronnie Sahlberg <lsahlber@redhat.com> Reviewed-by: Pavel Shilovsky <pshilov@microsoft.com>
2020-01-26CIFS: Add support for setting owner info, dos attributes, and create timeBoris Protopopov1-11/+117
This is needed for backup/restore scenarios among others. Add extended attribute "system.cifs_ntsd" (and alias "system.smb3_ntsd") to allow for setting owner and DACL in the security descriptor. This is in addition to the existing "system.cifs_acl" and "system.smb3_acl" attributes that allow for setting DACL only. Add support for setting creation time and dos attributes using set_file_info() calls to complement the existing support for getting these attributes via query_path_info() calls. Signed-off-by: Boris Protopopov <bprotopopov@hotmail.com> Signed-off-by: Steve French <stfrench@microsoft.com>
2020-01-26cifs: remove set but not used variable 'server'YueHaibing1-4/+1
fs/cifs/smb2pdu.c: In function 'SMB2_query_directory': fs/cifs/smb2pdu.c:4444:26: warning: variable 'server' set but not used [-Wunused-but-set-variable] struct TCP_Server_Info *server; It is not used, so remove it. Reported-by: Hulk Robot <hulkci@huawei.com> Signed-off-by: YueHaibing <yuehaibing@huawei.com> Signed-off-by: Steve French <stfrench@microsoft.com>
2020-01-26cifs: Fix memory allocation in __smb2_handle_cancelled_cmd()Paulo Alcantara (SUSE)1-1/+1
__smb2_handle_cancelled_cmd() is called under a spin lock held in cifs_mid_q_entry_release(), so make its memory allocation GFP_ATOMIC. This issue was observed when running xfstests generic/028: [ 1722.589204] CIFS VFS: \\192.168.30.26 Cancelling wait for mid 72064 cmd: 5 [ 1722.590687] CIFS VFS: \\192.168.30.26 Cancelling wait for mid 72065 cmd: 17 [ 1722.593529] CIFS VFS: \\192.168.30.26 Cancelling wait for mid 72066 cmd: 6 [ 1723.039014] BUG: sleeping function called from invalid context at mm/slab.h:565 [ 1723.040710] in_atomic(): 1, irqs_disabled(): 0, non_block: 0, pid: 30877, name: cifsd [ 1723.045098] CPU: 3 PID: 30877 Comm: cifsd Not tainted 5.5.0-rc4+ #313 [ 1723.046256] Hardware name: QEMU Standard PC (Q35 + ICH9, 2009), BIOS rel-1.12.0-59-gc9ba527-rebuilt.opensuse.org 04/01/2014 [ 1723.048221] Call Trace: [ 1723.048689] dump_stack+0x97/0xe0 [ 1723.049268] ___might_sleep.cold+0xd1/0xe1 [ 1723.050069] kmem_cache_alloc_trace+0x204/0x2b0 [ 1723.051051] __smb2_handle_cancelled_cmd+0x40/0x140 [cifs] [ 1723.052137] smb2_handle_cancelled_mid+0xf6/0x120 [cifs] [ 1723.053247] cifs_mid_q_entry_release+0x44d/0x630 [cifs] [ 1723.054351] ? cifs_reconnect+0x26a/0x1620 [cifs] [ 1723.055325] cifs_demultiplex_thread+0xad4/0x14a0 [cifs] [ 1723.056458] ? cifs_handle_standard+0x2c0/0x2c0 [cifs] [ 1723.057365] ? kvm_sched_clock_read+0x14/0x30 [ 1723.058197] ? sched_clock+0x5/0x10 [ 1723.058838] ? sched_clock_cpu+0x18/0x110 [ 1723.059629] ? lockdep_hardirqs_on+0x17d/0x250 [ 1723.060456] kthread+0x1ab/0x200 [ 1723.061149] ? cifs_handle_standard+0x2c0/0x2c0 [cifs] [ 1723.062078] ? kthread_create_on_node+0xd0/0xd0 [ 1723.062897] ret_from_fork+0x3a/0x50 Signed-off-by: Paulo Alcantara (SUSE) <pc@cjr.nz> Fixes: 9150c3adbf24 ("CIFS: Close open handle after interrupted close") Cc: Stable <stable@vger.kernel.org> Signed-off-by: Steve French <stfrench@microsoft.com> Reviewed-by: Pavel Shilovsky <pshilov@microsoft.com>
2020-01-26cifs: Fix mount options set in automountPaulo Alcantara (SUSE)1-54/+43
Starting from 4a367dc04435, we must set the mount options based on the DFS full path rather than the resolved target, that is, cifs_mount() will be responsible for resolving the DFS link (cached) as well as performing failover to any other targets in the referral. Signed-off-by: Paulo Alcantara (SUSE) <pc@cjr.nz> Reported-by: Martijn de Gouw <martijn.de.gouw@prodrive-technologies.com> Fixes: 4a367dc04435 ("cifs: Add support for failover in cifs_mount()") Link: https://lore.kernel.org/linux-cifs/39643d7d-2abb-14d3-ced6-c394fab9a777@prodrive-technologies.com Tested-by: Martijn de Gouw <martijn.de.gouw@prodrive-technologies.com> Signed-off-by: Steve French <stfrench@microsoft.com>
2020-01-26cifs: fix unitialized variable poential problem with network I/O cache lock ↵Steve French1-1/+1
patch static analysis with Coverity detected an issue with the following commit: Author: Paulo Alcantara (SUSE) <pc@cjr.nz> Date: Wed Dec 4 17:38:03 2019 -0300 cifs: Avoid doing network I/O while holding cache lock Addresses-Coverity: ("Uninitialized pointer read") Reported-by: Colin Ian King <colin.king@canonical.com> Signed-off-by: Paulo Alcantara (SUSE) <pc@cjr.nz> Signed-off-by: Steve French <stfrench@microsoft.com>
2020-01-26cifs: Fix return value in __update_cache_entryYueHaibing1-1/+1
copy_ref_data() may return error, it should be returned to upstream caller. Fixes: 03535b72873b ("cifs: Avoid doing network I/O while holding cache lock") Signed-off-by: YueHaibing <yuehaibing@huawei.com> Signed-off-by: Steve French <stfrench@microsoft.com>
2020-01-26cifs: Avoid doing network I/O while holding cache lockPaulo Alcantara (SUSE)1-264/+283
When creating or updating a cache entry, we need to get an DFS referral (get_dfs_referral), so avoid holding any locks during such network operation. To prevent that, do the following: * change cache hashtable sync method from RCU sync to a read/write lock. * use GFP_ATOMIC in memory allocations. Signed-off-by: Paulo Alcantara (SUSE) <pc@cjr.nz> Signed-off-by: Steve French <stfrench@microsoft.com>
2020-01-26cifs: Fix potential deadlock when updating vol in cifs_reconnect()Paulo Alcantara (SUSE)1-32/+77
We can't acquire volume lock while refreshing the DFS cache because cifs_reconnect() may call dfs_cache_update_vol() while we are walking through the volume list. To prevent that, make vol_info refcounted, create a temp list with all volumes eligible for refreshing, and then use it without any locks held. Besides, replace vol_lock with a spinlock and protect cache_ttl from concurrent accesses or changes. Signed-off-by: Paulo Alcantara (SUSE) <pc@cjr.nz> Signed-off-by: Steve French <stfrench@microsoft.com>
2020-01-26cifs: Merge is_path_valid() into get_normalized_path()Paulo Alcantara (SUSE)1-17/+4
Just do the trivial path validation in get_normalized_path(). Signed-off-by: Paulo Alcantara (SUSE) <pc@cjr.nz> Reviewed-by: Aurelien Aptel <aaptel@suse.com> Signed-off-by: Steve French <stfrench@microsoft.com>
2020-01-26cifs: Introduce helpers for finding TCP connectionPaulo Alcantara (SUSE)1-13/+31
Add helpers for finding TCP connections that are good candidates for being used by DFS refresh worker. Signed-off-by: Paulo Alcantara (SUSE) <pc@cjr.nz> Reviewed-by: Aurelien Aptel <aaptel@suse.com> Signed-off-by: Steve French <stfrench@microsoft.com>
2020-01-26cifs: Get rid of kstrdup_const()'d pathsPaulo Alcantara (SUSE)1-3/+3
The DFS cache API is mostly used with heap allocated strings. Signed-off-by: Paulo Alcantara (SUSE) <pc@cjr.nz> Reviewed-by: Aurelien Aptel <aaptel@suse.com> Signed-off-by: Steve French <stfrench@microsoft.com>