summaryrefslogtreecommitdiffstats
AgeCommit message (Collapse)AuthorFilesLines
2017-02-27nfsd: fix configuration of supported minor versionsTrond Myklebust2-8/+22
When the user turns off all minor versions of NFSv4, that should be equivalent to turning off NFSv4 support, so a mount attempt using NFSv4 should get RPC_PROG_MISMATCH, not NFSERR_MINOR_VERS_MISMATCH. Allow the user to use either '4.0' or '4' to enable or disable minor version 0. Other minor versions are still enabled or disabled using the '4.x' format. Signed-off-by: Trond Myklebust <trond.myklebust@primarydata.com> Signed-off-by: J. Bruce Fields <bfields@redhat.com>
2017-02-24sunrpc: don't register UDP port with rpcbind when version needs congestion ↵Jeff Layton1-0/+7
control Signed-off-by: Jeff Layton <jlayton@redhat.com> Signed-off-by: J. Bruce Fields <bfields@redhat.com>
2017-02-24nfs/nfsd/sunrpc: enforce transport requirements for NFSv4Jeff Layton4-6/+27
NFSv4 requires a transport "that is specified to avoid network congestion" (RFC 7530, section 3.1, paragraph 2). In practical terms, that means that you should not run NFSv4 over UDP. The server has never enforced that requirement, however. This patchset fixes this by adding a new flag to the svc_version that states that it has these transport requirements. With that, we can check that the transport has XPT_CONG_CTRL set before processing an RPC. If it doesn't we reject it with RPC_PROG_MISMATCH. Signed-off-by: Jeff Layton <jlayton@redhat.com> Signed-off-by: J. Bruce Fields <bfields@redhat.com>
2017-02-24sunrpc: flag transports as having congestion controlJeff Layton3-0/+10
NFSv4 requires a transport protocol with congestion control in most cases. On an IP network, that means that NFSv4 over UDP should be forbidden. The situation with RDMA is a bit more nuanced, but most RDMA transports are suitable for this. For now, we assume that all RDMA transports are suitable, but we may need to revise that at some point. Signed-off-by: Jeff Layton <jlayton@redhat.com> Signed-off-by: J. Bruce Fields <bfields@redhat.com>
2017-02-24sunrpc: turn bitfield flags in svc_version into boolsJeff Layton6-10/+9
It's just simpler to read this way, IMO. Also, no need to explicitly set vs_hidden to false in the nfsacl ones. Signed-off-by: Jeff Layton <jlayton@redhat.com> Signed-off-by: J. Bruce Fields <bfields@redhat.com>
2017-02-24nfsd: remove superfluous KERN_INFORasmus Villemoes1-1/+1
dprintk already provides a KERN_* prefix; this KERN_INFO just shows up as some odd characters in the output. Simplify the message a bit while we're there. Signed-off-by: Rasmus Villemoes <linux@rasmusvillemoes.dk> Signed-off-by: J. Bruce Fields <bfields@redhat.com>
2017-02-21nfsd: special case truncates some moreChristoph Hellwig1-6/+26
Both the NFS protocols and the Linux VFS use a setattr operation with a bitmap of attributes to set to set various file attributes including the file size and the uid/gid. The Linux syscalls never mix size updates with unrelated updates like the uid/gid, and some file systems like XFS and GFS2 rely on the fact that truncates don't update random other attributes, and many other file systems handle the case but do not update the other attributes in the same transaction. NFSD on the other hand passes the attributes it gets on the wire more or less directly through to the VFS, leading to updates the file systems don't expect. XFS at least has an assert on the allowed attributes, which caught an unusual NFS client setting the size and group at the same time. To handle this issue properly this splits the notify_change call in nfsd_setattr into two separate ones. Signed-off-by: Christoph Hellwig <hch@lst.de> Cc: stable@vger.kernel.org Tested-by: Chuck Lever <chuck.lever@oracle.com> Signed-off-by: J. Bruce Fields <bfields@redhat.com>
2017-02-20nfsd: minor nfsd_setattr cleanupChristoph Hellwig1-17/+12
Simplify exit paths, size_change use. Signed-off-by: Christoph Hellwig <hch@lst.de> Cc: stable@kernel.org Signed-off-by: J. Bruce Fields <bfields@redhat.com>
2017-02-20nfsd: merge stable fix into main nfsd branchJ. Bruce Fields183-1239/+1692
2017-02-17NFSD: Reserve adequate space for LOCKT operationKinglong Mee1-5/+5
After tightening the OP_LOCKT reply size estimate, we can get warnings like: [11512.783519] RPC request reserved 124 but used 152 [11512.813624] RPC request reserved 108 but used 136 Signed-off-by: Kinglong Mee <kinglongmee@gmail.com> Signed-off-by: J. Bruce Fields <bfields@redhat.com>
2017-02-17NFSD: Get response size before operation for all RPCsKinglong Mee1-6/+63
NFSD usess PAGE_SIZE as the reply size estimate for RPCs which don't support op_rsize_bop(), A PAGE_SIZE (4096) is larger than many real response sizes, eg, access (op_encode_hdr_size + 2), seek (op_encode_hdr_size + 3). This patch just adds op_rsize_bop() for all RPCs getting response size. An overestimate is generally safe but the tighter estimates are probably better. Signed-off-by: Kinglong Mee <kinglongmee@gmail.com> Signed-off-by: J. Bruce Fields <bfields@redhat.com>
2017-02-17nfsd/callback: Drop a useless data copy when comparing sessionidKinglong Mee1-7/+3
Signed-off-by: Kinglong Mee <kinglongmee@gmail.com> Signed-off-by: J. Bruce Fields <bfields@redhat.com>
2017-02-17nfsd/callback: skip the callback tagKinglong Mee1-0/+1
The callback tag is NULL, and hdr->nops is unused too right now, but. But if we were to ever test with a nonzero callback tag, nops will get a bad value. Signed-off-by: Kinglong Mee <kinglongmee@gmail.com> Signed-off-by: J. Bruce Fields <bfields@redhat.com>
2017-02-17nfsd/callback: Cleanup callback cred on shutdownKinglong Mee3-4/+15
The rpccred gotten from rpc_lookup_machine_cred() should be put when state is shutdown. Signed-off-by: Kinglong Mee <kinglongmee@gmail.com> Signed-off-by: J. Bruce Fields <bfields@redhat.com>
2017-02-17nfsd/idmap: return nfserr_inval for 0-length namesKinglong Mee1-0/+8
Tigran Mkrtchyan's new pynfs testcases for zero length principals fail: SATT16 st_setattr.testEmptyPrincipal : FAILURE Setting empty owner should return NFS4ERR_INVAL, instead got NFS4ERR_BADOWNER SATT17 st_setattr.testEmptyGroupPrincipal : FAILURE Setting empty owner_group should return NFS4ERR_INVAL, instead got NFS4ERR_BADOWNER This patch checks the principal and returns nfserr_inval directly. It could check after decoding in nfs4xdr.c, but it's simpler to do it in nfsd_map_xxxx. Signed-off-by: Kinglong Mee <kinglongmee@gmail.com> Signed-off-by: J. Bruce Fields <bfields@redhat.com>
2017-02-09nfsd: Revert "nfsd: special case truncates some more"J. Bruce Fields1-37/+60
This patch incorrectly attempted nested mnt_want_write, and incorrectly disabled nfsd's owner override for truncate. We'll fix those problems and make another attempt soon, for the moment I think the safest is to revert. Signed-off-by: J. Bruce Fields <bfields@redhat.com>
2017-02-08SUNRPC/Cache: Always treat the invalid cache as unexpiredKinglong Mee1-1/+4
When the first time pynfs runs after rpc/nfsd startup, always get the warning, "Got error: Connection closed" I found the problem is caused by, 1. A new startup of nfsd, rpc.mountd, etc, 2. A rpc request from client (pynfs test, or normal mounting), 3. An ip_map cache is created but invalid, so upcall to rpc.mountd, 4. rpc.mountd process the ip_map upcall, before write the valid data to nfsd, do auth_reload(), and check_useipaddr(), 5. For the first time, old_use_ipaddr = -1, it causes rpc.mountd do write_flush that doing cache_clean, 6. The ip_map cache will be treat as expired and clean, 7. When rpc.mountd write the valid data to nfsd, a new ip_map is created and updated, the cache_check of old ip_map(doing the upcall) will return -ETIMEDOUT. 8. RPC layer return SVC_CLOSE and close the xprt after commit 4d712ef1db05 "svcauth_gss: Close connection when dropping an incoming message" NeilBrown suggest in another email, "If CACHE_VALID is not set, then there is no data in the cache item, so there is nothing to expire. So it would be nice if cache items that don't have CACHE_VALID are never treated as expired." v3, change the order of the two patches v2, change the checking of CACHE_PENDING to CACHE_VALID Reviewed-by: NeilBrown <neilb@suse.com> Signed-off-by: Kinglong Mee <kinglongmee@gmail.com> Signed-off-by: J. Bruce Fields <bfields@redhat.com>
2017-02-08SUNRPC: Drop all entries from cache_detail when cache_purge()Kinglong Mee1-15/+26
User always free the cache_detail after sunrpc_destroy_cache_detail(), so, it must cleanup up entries that left in the cache_detail, otherwise, NULL reference may be caused when using the left entries. Also, NeriBrown suggests "write a stand-alone cache_purge()." v3, move the cache_fresh_unlocked() out of write lock, v2, a stand-alone cache_purge(), not only for sunrpc_destroy_cache_detail Signed-off-by: Kinglong Mee <kinglongmee@gmail.com> Reviewed-by: NeilBrown <neilb@suse.com> Signed-off-by: J. Bruce Fields <bfields@redhat.com>
2017-02-08svcrdma: Poll CQs in "workqueue" modeChuck Lever2-16/+16
svcrdma calls svc_xprt_put() in its completion handlers, which currently run in IRQ context. However, svc_xprt_put() is meant to be invoked in process context, not in IRQ context. After the last transport reference is gone, it directly calls a transport release function that expects to run in process context. Change the CQ polling modes to IB_POLL_WORKQUEUE so that svcrdma invokes svc_xprt_put() only in process context. As an added benefit, bottom half-disabled spin locking can be eliminated from I/O paths. Signed-off-by: Chuck Lever <chuck.lever@oracle.com> Reviewed-by: Christoph Hellwig <hch@lst.de> Signed-off-by: J. Bruce Fields <bfields@redhat.com>
2017-02-08svcrdma: Combine list fields in struct svc_rdma_op_ctxtChuck Lever3-28/+22
Clean up: The free list and the dto_q list fields are never used at the same time. Reduce the size of struct svc_rdma_op_ctxt by combining these fields. Signed-off-by: Chuck Lever <chuck.lever@oracle.com> Reviewed-by: Christoph Hellwig <hch@lst.de> Signed-off-by: J. Bruce Fields <bfields@redhat.com>
2017-02-08svcrdma: Remove unused sc_dto_q fieldChuck Lever2-2/+0
Clean up. Commit be99bb11400c ("svcrdma: Use new CQ API for RPC-over-RDMA server send CQs") removed code that used the sc_dto_q field, but neglected to remove sc_dto_q at the same time. Fixes: be99bb11400c ("svcrdma: Use new CQ API for RPC-over- ...") Signed-off-by: Chuck Lever <chuck.lever@oracle.com> Reviewed-by: Christoph Hellwig <hch@lst.de> Signed-off-by: J. Bruce Fields <bfields@redhat.com>
2017-02-08svcrdma: Clean up backchannel send header encodingChuck Lever1-8/+9
Replace C structure-based XDR decoding with pointer arithmetic. Pointer arithmetic is considered more portable. Signed-off-by: Chuck Lever <chuck.lever@oracle.com> Reviewed-by: Christoph Hellwig <hch@lst.de> Signed-off-by: J. Bruce Fields <bfields@redhat.com>
2017-02-08svcrdma: Clean up RPC-over-RDMA Call header decoderChuck Lever1-153/+79
Replace C structure-based XDR decoding with pointer arithmetic. Pointer arithmetic is considered more portable. Rename the "decode" functions. Nothing is decoded here, they perform only transport header sanity checking. Use existing XDR naming conventions to help readability. Straight-line the hot path: - relocate the dprintk call sites out of line - remove unnecessary byte-swapping - reduce count of conditional branches Deprecate RDMA_MSGP. It's not properly spec'd by RFC5666, and therefore never used by any V1 client. Signed-off-by: Chuck Lever <chuck.lever@oracle.com> Reviewed-by: Christoph Hellwig <hch@lst.de> Signed-off-by: J. Bruce Fields <bfields@redhat.com>
2017-02-08svcrdma: Clean up RPC-over-RDMA Reply header encoderChuck Lever4-29/+16
Replace C structure-based XDR decoding with pointer arithmetic. Pointer arithmetic is considered more portable, and is used throughout the kernel's existing XDR encoders. The gcc optimizer generates similar assembler code either way. Byte-swapping before a memory store on x86 typically results in an instruction pipeline stall. Avoid byte-swapping when encoding a new header. svcrdma currently doesn't alter a connection's credit grant value after the connection has been accepted, so it is effectively a constant. Cache the byte-swapped value in a separate field. Christoph suggested pulling the header encoding logic into the only function that uses it. Signed-off-by: Chuck Lever <chuck.lever@oracle.com> Reviewed-by: Christoph Hellwig <hch@lst.de> Signed-off-by: J. Bruce Fields <bfields@redhat.com>
2017-02-08svcrdma: Another sendto chunk list parsing updateChuck Lever4-25/+38
Commit 5fdca6531434 ("svcrdma: Renovate sendto chunk list parsing") missed a spot. svc_rdma_xdr_get_reply_hdr_len() also assumes the Write list has only one Write chunk. There's no harm in making this code more general. Signed-off-by: Chuck Lever <chuck.lever@oracle.com> Reviewed-by: Christoph Hellwig <hch@lst.de> Signed-off-by: J. Bruce Fields <bfields@redhat.com>
2017-02-06NFSDv4: use export cache flushtime for changeid on V4ROOT objects.NeilBrown1-3/+7
If you change the set of filesystems that are exported, then the contents of various directories in the NFSv4 pseudo-root is likely to change. However the change-id of those directories is currently tied to the underlying directory, so the client may not see the changes in a timely fashion. This patch changes the change-id number to be derived from the "flush_time" of the export cache. Whenever any changes are made to the set of exported filesystems, this flush_time is updated. The result is that clients see changes to the set of exported filesystems much more quickly, often immediately. Signed-off-by: NeilBrown <neilb@suse.com> Signed-off-by: J. Bruce Fields <bfields@redhat.com>
2017-02-05Linux 4.10-rc7v4.10-rc7Linus Torvalds1-1/+1
2017-02-04Merge branch 'irq-urgent-for-linus' of ↵Linus Torvalds4-14/+50
git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip Pull irq fixes from Thomas Gleixner: - Prevent double activation of interrupt lines, which causes problems on certain interrupt controllers - Handle the fallout of the above because x86 (ab)uses the activation function to reconfigure interrupts under the hood. * 'irq-urgent-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip: x86/irq: Make irq activate operations symmetric irqdomain: Avoid activating interrupts more than once
2017-02-04Merge tag 'for-linus' of git://git.kernel.org/pub/scm/virt/kvm/kvmLinus Torvalds1-0/+1
Pull KVM fix from Radim Krčmář: "Fix a regression that prevented migration between hosts with different XSAVE features even if the missing features were not used by the guest (for stable)" * tag 'for-linus' of git://git.kernel.org/pub/scm/virt/kvm/kvm: KVM: x86: do not save guest-unsupported XSAVE state
2017-02-04Merge tag 'char-misc-4.10-rc7' of ↵Linus Torvalds4-6/+38
git://git.kernel.org/pub/scm/linux/kernel/git/gregkh/char-misc Pull char/misc driver fixes from Greg KH: "Here are two bugfixes that resolve some reported issues. One in the firmware loader, that should fix the much-reported problem of crashes with it. The other is a hyperv fix for a reported regression. Both have been in linux-next for a week or so with no reported issues" * tag 'char-misc-4.10-rc7' of git://git.kernel.org/pub/scm/linux/kernel/git/gregkh/char-misc: Drivers: hv: vmbus: finally fix hv_need_to_signal_on_read() firmware: fix NULL pointer dereference in __fw_load_abort()
2017-02-04Merge tag 'staging-4.10-rc7' of ↵Linus Torvalds6-9/+17
git://git.kernel.org/pub/scm/linux/kernel/git/gregkh/staging Pull staging/IIO fixes from Greg KH: "Here are a few small IIO and one staging driver fix for 4.10-rc7. They fix some reported issues with the drivers. All of them have been in linux-next for a week or so with no reported issues" * tag 'staging-4.10-rc7' of git://git.kernel.org/pub/scm/linux/kernel/git/gregkh/staging: staging: greybus: timesync: validate platform state callback iio: dht11: Use usleep_range instead of msleep for start signal iio: adc: palmas_gpadc: retrieve a valid iio_dev in suspend/resume iio: health: max30100: fixed parenthesis around FIFO count check iio: health: afe4404: retrieve a valid iio_dev in suspend/resume iio: health: afe4403: retrieve a valid iio_dev in suspend/resume
2017-02-04Merge tag 'usb-4.10-rc7' of ↵Linus Torvalds8-15/+33
git://git.kernel.org/pub/scm/linux/kernel/git/gregkh/usb Pull USB fixes from Greg KH: "Here are some small USB fixes for some reported issues, and the usual number of new device ids for 4.10-rc7. All of these, except the last new device id, have been in linux-next for a while with no reported issues" * tag 'usb-4.10-rc7' of git://git.kernel.org/pub/scm/linux/kernel/git/gregkh/usb: USB: serial: pl2303: add ATEN device ID usb: gadget: f_fs: Assorted buffer overflow checks. USB: Add quirk for WORLDE easykey.25 MIDI keyboard usb: musb: Fix external abort on non-linefetch for musb_irq_work() usb: musb: Fix host mode error -71 regression USB: serial: option: add device ID for HP lt2523 (Novatel E371) USB: serial: qcserial: add Dell DW5570 QDL
2017-02-03Merge tag 'scsi-fixes' of ↵Linus Torvalds1-1/+10
git://git.kernel.org/pub/scm/linux/kernel/git/jejb/scsi Pull SCSI fix from James Bottomley: "A single fix this time: a fix for a virtqueue removal bug which only appears to affect S390, but which results in the queue hanging forever thus causing the machine to fail shutdown" * tag 'scsi-fixes' of git://git.kernel.org/pub/scm/linux/kernel/git/jejb/scsi: scsi: virtio_scsi: Reject commands when virtqueue is broken
2017-02-03Merge tag 'for_linus' of git://git.kernel.org/pub/scm/linux/kernel/git/mst/vhostLinus Torvalds3-14/+5
Pull virtio/vhost fixes from Michael S. Tsirkin: "Last minute fixes: - ARM DMA fix revert - vhost endian-ness fix - MAINTAINERS: email address change for Amit" * tag 'for_linus' of git://git.kernel.org/pub/scm/linux/kernel/git/mst/vhost: MAINTAINERS: update email address for Amit Shah vhost: fix initialization for vq->is_le Revert "vring: Force use of DMA API for ARM-based systems with legacy devices"
2017-02-03Merge tag 'vfio-v4.10-rc7' of git://github.com/awilliam/linux-vfioLinus Torvalds1-6/+5
Pull VFIO fix from Alex Williamson: "Fix an error path in SPAPR IOMMU backend (Alexey Kardashevskiy)" * tag 'vfio-v4.10-rc7' of git://github.com/awilliam/linux-vfio: vfio/spapr: Fix missing mutex unlock when creating a window
2017-02-03Merge branch 'akpm' (patches from Andrew)Linus Torvalds10-18/+84
Merge fixes from Andrew Morton: "8 fixes" * emailed patches from Andrew Morton <akpm@linux-foundation.org>: mm, fs: check for fatal signals in do_generic_file_read() fs: break out of iomap_file_buffered_write on fatal signals base/memory, hotplug: fix a kernel oops in show_valid_zones() mm/memory_hotplug.c: check start_pfn in test_pages_in_a_zone() jump label: pass kbuild_cflags when checking for asm goto support shmem: fix sleeping from atomic context kasan: respect /proc/sys/kernel/traceoff_on_warning zswap: disable changing params if init fails
2017-02-03mm, fs: check for fatal signals in do_generic_file_read()Michal Hocko1-0/+5
do_generic_file_read() can be told to perform a large request from userspace. If the system is under OOM and the reading task is the OOM victim then it has an access to memory reserves and finishing the full request can lead to the full memory depletion which is dangerous. Make sure we rather go with a short read and allow the killed task to terminate. Link: http://lkml.kernel.org/r/20170201092706.9966-3-mhocko@kernel.org Signed-off-by: Michal Hocko <mhocko@suse.com> Reviewed-by: Christoph Hellwig <hch@lst.de> Cc: Tetsuo Handa <penguin-kernel@I-love.SAKURA.ne.jp> Cc: Al Viro <viro@zeniv.linux.org.uk> Cc: <stable@vger.kernel.org> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2017-02-03fs: break out of iomap_file_buffered_write on fatal signalsMichal Hocko2-0/+8
Tetsuo has noticed that an OOM stress test which performs large write requests can cause the full memory reserves depletion. He has tracked this down to the following path __alloc_pages_nodemask+0x436/0x4d0 alloc_pages_current+0x97/0x1b0 __page_cache_alloc+0x15d/0x1a0 mm/filemap.c:728 pagecache_get_page+0x5a/0x2b0 mm/filemap.c:1331 grab_cache_page_write_begin+0x23/0x40 mm/filemap.c:2773 iomap_write_begin+0x50/0xd0 fs/iomap.c:118 iomap_write_actor+0xb5/0x1a0 fs/iomap.c:190 ? iomap_write_end+0x80/0x80 fs/iomap.c:150 iomap_apply+0xb3/0x130 fs/iomap.c:79 iomap_file_buffered_write+0x68/0xa0 fs/iomap.c:243 ? iomap_write_end+0x80/0x80 xfs_file_buffered_aio_write+0x132/0x390 [xfs] ? remove_wait_queue+0x59/0x60 xfs_file_write_iter+0x90/0x130 [xfs] __vfs_write+0xe5/0x140 vfs_write+0xc7/0x1f0 ? syscall_trace_enter+0x1d0/0x380 SyS_write+0x58/0xc0 do_syscall_64+0x6c/0x200 entry_SYSCALL64_slow_path+0x25/0x25 the oom victim has access to all memory reserves to make a forward progress to exit easier. But iomap_file_buffered_write and other callers of iomap_apply loop to complete the full request. We need to check for fatal signals and back off with a short write instead. As the iomap_apply delegates all the work down to the actor we have to hook into those. All callers that work with the page cache are calling iomap_write_begin so we will check for signals there. dax_iomap_actor has to handle the situation explicitly because it copies data to the userspace directly. Other callers like iomap_page_mkwrite work on a single page or iomap_fiemap_actor do not allocate memory based on the given len. Fixes: 68a9f5e7007c ("xfs: implement iomap based buffered write path") Link: http://lkml.kernel.org/r/20170201092706.9966-2-mhocko@kernel.org Signed-off-by: Michal Hocko <mhocko@suse.com> Reported-by: Tetsuo Handa <penguin-kernel@I-love.SAKURA.ne.jp> Reviewed-by: Christoph Hellwig <hch@lst.de> Cc: Al Viro <viro@zeniv.linux.org.uk> Cc: <stable@vger.kernel.org> [4.8+] Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2017-02-03base/memory, hotplug: fix a kernel oops in show_valid_zones()Toshi Kani3-12/+23
Reading a sysfs "memoryN/valid_zones" file leads to the following oops when the first page of a range is not backed by struct page. show_valid_zones() assumes that 'start_pfn' is always valid for page_zone(). BUG: unable to handle kernel paging request at ffffea017a000000 IP: show_valid_zones+0x6f/0x160 This issue may happen on x86-64 systems with 64GiB or more memory since their memory block size is bumped up to 2GiB. [1] An example of such systems is desribed below. 0x3240000000 is only aligned by 1GiB and this memory block starts from 0x3200000000, which is not backed by struct page. BIOS-e820: [mem 0x0000003240000000-0x000000603fffffff] usable Since test_pages_in_a_zone() already checks holes, fix this issue by extending this function to return 'valid_start' and 'valid_end' for a given range. show_valid_zones() then proceeds with the valid range. [1] 'Commit bdee237c0343 ("x86: mm: Use 2GB memory block size on large-memory x86-64 systems")' Link: http://lkml.kernel.org/r/20170127222149.30893-3-toshi.kani@hpe.com Signed-off-by: Toshi Kani <toshi.kani@hpe.com> Cc: Greg Kroah-Hartman <gregkh@linuxfoundation.org> Cc: Zhang Zhen <zhenzhang.zhang@huawei.com> Cc: Reza Arbab <arbab@linux.vnet.ibm.com> Cc: David Rientjes <rientjes@google.com> Cc: Dan Williams <dan.j.williams@intel.com> Cc: <stable@vger.kernel.org> [4.4+] Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2017-02-03mm/memory_hotplug.c: check start_pfn in test_pages_in_a_zone()Toshi Kani1-4/+8
Patch series "fix a kernel oops when reading sysfs valid_zones", v2. A sysfs memory file is created for each 2GiB memory block on x86-64 when the system has 64GiB or more memory. [1] When the start address of a memory block is not backed by struct page, i.e. a memory range is not aligned by 2GiB, reading its 'valid_zones' attribute file leads to a kernel oops. This issue was observed on multiple x86-64 systems with more than 64GiB of memory. This patch-set fixes this issue. Patch 1 first fixes an issue in test_pages_in_a_zone(), which does not test the start section. Patch 2 then fixes the kernel oops by extending test_pages_in_a_zone() to return valid [start, end). Note for stable kernels: The memory block size change was made by commit bdee237c0343 ("x86: mm: Use 2GB memory block size on large-memory x86-64 systems"), which was accepted to 3.9. However, this patch-set depends on (and fixes) the change to test_pages_in_a_zone() made by commit 5f0f2887f4de ("mm/memory_hotplug.c: check for missing sections in test_pages_in_a_zone()"), which was accepted to 4.4. So, I recommend that we backport it up to 4.4. [1] 'Commit bdee237c0343 ("x86: mm: Use 2GB memory block size on large-memory x86-64 systems")' This patch (of 2): test_pages_in_a_zone() does not check 'start_pfn' when it is aligned by section since 'sec_end_pfn' is set equal to 'pfn'. Since this function is called for testing the range of a sysfs memory file, 'start_pfn' is always aligned by section. Fix it by properly setting 'sec_end_pfn' to the next section pfn. Also make sure that this function returns 1 only when the range belongs to a zone. Link: http://lkml.kernel.org/r/20170127222149.30893-2-toshi.kani@hpe.com Signed-off-by: Toshi Kani <toshi.kani@hpe.com> Cc: Andrew Banman <abanman@sgi.com> Cc: Reza Arbab <arbab@linux.vnet.ibm.com> Cc: Greg KH <greg@kroah.com> Cc: <stable@vger.kernel.org> [4.4+] Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2017-02-03jump label: pass kbuild_cflags when checking for asm goto supportDavid Lin1-1/+1
Some versions of ARM GCC compiler such as Android toolchain throws in a '-fpic' flag by default. This causes the gcc-goto check script to fail although some config would have '-fno-pic' flag in the KBUILD_CFLAGS. This patch passes the KBUILD_CFLAGS to the check script so that the script does not rely on the default config from different compilers. Link: http://lkml.kernel.org/r/20170120234329.78868-1-dtwlin@google.com Signed-off-by: David Lin <dtwlin@google.com> Acked-by: Steven Rostedt <rostedt@goodmis.org> Cc: Michal Marek <mmarek@suse.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2017-02-03shmem: fix sleeping from atomic contextKirill A. Shutemov1-2/+9
Syzkaller fuzzer managed to trigger this: BUG: sleeping function called from invalid context at mm/shmem.c:852 in_atomic(): 1, irqs_disabled(): 0, pid: 529, name: khugepaged 3 locks held by khugepaged/529: #0: (shrinker_rwsem){++++..}, at: [<ffffffff818d7ef1>] shrink_slab.part.59+0x121/0xd30 mm/vmscan.c:451 #1: (&type->s_umount_key#29){++++..}, at: [<ffffffff81a63630>] trylock_super+0x20/0x100 fs/super.c:392 #2: (&(&sbinfo->shrinklist_lock)->rlock){+.+.-.}, at: [<ffffffff818fd83e>] spin_lock include/linux/spinlock.h:302 [inline] #2: (&(&sbinfo->shrinklist_lock)->rlock){+.+.-.}, at: [<ffffffff818fd83e>] shmem_unused_huge_shrink+0x28e/0x1490 mm/shmem.c:427 CPU: 2 PID: 529 Comm: khugepaged Not tainted 4.10.0-rc5+ #201 Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS Bochs 01/01/2011 Call Trace: shmem_undo_range+0xb20/0x2710 mm/shmem.c:852 shmem_truncate_range+0x27/0xa0 mm/shmem.c:939 shmem_evict_inode+0x35f/0xca0 mm/shmem.c:1030 evict+0x46e/0x980 fs/inode.c:553 iput_final fs/inode.c:1515 [inline] iput+0x589/0xb20 fs/inode.c:1542 shmem_unused_huge_shrink+0xbad/0x1490 mm/shmem.c:446 shmem_unused_huge_scan+0x10c/0x170 mm/shmem.c:512 super_cache_scan+0x376/0x450 fs/super.c:106 do_shrink_slab mm/vmscan.c:378 [inline] shrink_slab.part.59+0x543/0xd30 mm/vmscan.c:481 shrink_slab mm/vmscan.c:2592 [inline] shrink_node+0x2c7/0x870 mm/vmscan.c:2592 shrink_zones mm/vmscan.c:2734 [inline] do_try_to_free_pages+0x369/0xc80 mm/vmscan.c:2776 try_to_free_pages+0x3c6/0x900 mm/vmscan.c:2982 __perform_reclaim mm/page_alloc.c:3301 [inline] __alloc_pages_direct_reclaim mm/page_alloc.c:3322 [inline] __alloc_pages_slowpath+0xa24/0x1c30 mm/page_alloc.c:3683 __alloc_pages_nodemask+0x544/0xae0 mm/page_alloc.c:3848 __alloc_pages include/linux/gfp.h:426 [inline] __alloc_pages_node include/linux/gfp.h:439 [inline] khugepaged_alloc_page+0xc2/0x1b0 mm/khugepaged.c:750 collapse_huge_page+0x182/0x1fe0 mm/khugepaged.c:955 khugepaged_scan_pmd+0xfdf/0x12a0 mm/khugepaged.c:1208 khugepaged_scan_mm_slot mm/khugepaged.c:1727 [inline] khugepaged_do_scan mm/khugepaged.c:1808 [inline] khugepaged+0xe9b/0x1590 mm/khugepaged.c:1853 kthread+0x326/0x3f0 kernel/kthread.c:227 ret_from_fork+0x31/0x40 arch/x86/entry/entry_64.S:430 The iput() from atomic context was a bad idea: if after igrab() somebody else calls iput() and we left with the last inode reference, our iput() would lead to inode eviction and therefore sleeping. This patch should fix the situation. Link: http://lkml.kernel.org/r/20170131093141.GA15899@node.shutemov.name Signed-off-by: Kirill A. Shutemov <kirill.shutemov@linux.intel.com> Reported-by: Dmitry Vyukov <dvyukov@google.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2017-02-03kasan: respect /proc/sys/kernel/traceoff_on_warningPeter Zijlstra1-0/+3
After much waiting I finally reproduced a KASAN issue, only to find my trace-buffer empty of useful information because it got spooled out :/ Make kasan_report honour the /proc/sys/kernel/traceoff_on_warning interface. Link: http://lkml.kernel.org/r/20170125164106.3514-1-aryabinin@virtuozzo.com Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org> Signed-off-by: Andrey Ryabinin <aryabinin@virtuozzo.com> Acked-by: Alexander Potapenko <glider@google.com> Cc: Dmitry Vyukov <dvyukov@google.com> Cc: Steven Rostedt <rostedt@goodmis.org> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2017-02-03zswap: disable changing params if init failsDan Streetman1-1/+29
Add zswap_init_failed bool that prevents changing any of the module params, if init_zswap() fails, and set zswap_enabled to false. Change 'enabled' param to a callback, and check zswap_init_failed before allowing any change to 'enabled', 'zpool', or 'compressor' params. Any driver that is built-in to the kernel will not be unloaded if its init function returns error, and its module params remain accessible for users to change via sysfs. Since zswap uses param callbacks, which assume that zswap has been initialized, changing the zswap params after a failed initialization will result in WARNING due to the param callbacks expecting a pool to already exist. This prevents that by immediately exiting any of the param callbacks if initialization failed. This was reported here: https://marc.info/?l=linux-mm&m=147004228125528&w=4 And fixes this WARNING: [ 429.723476] WARNING: CPU: 0 PID: 5140 at mm/zswap.c:503 __zswap_pool_current+0x56/0x60 The warning is just noise, and not serious. However, when init fails, zswap frees all its percpu dstmem pages and its kmem cache. The kmem cache might be serious, if kmem_cache_alloc(NULL, gfp) has problems; but the percpu dstmem pages are definitely a problem, as they're used as temporary buffer for compressed pages before copying into place in the zpool. If the user does get zswap enabled after an init failure, then zswap will likely Oops on the first page it tries to compress (or worse, start corrupting memory). Fixes: 90b0fc26d5db ("zswap: change zpool/compressor at runtime") Link: http://lkml.kernel.org/r/20170124200259.16191-2-ddstreet@ieee.org Signed-off-by: Dan Streetman <dan.streetman@canonical.com> Reported-by: Marcin Miroslaw <marcin@mejor.pl> Cc: Seth Jennings <sjenning@redhat.com> Cc: Michal Hocko <mhocko@kernel.org> Cc: Sergey Senozhatsky <sergey.senozhatsky.work@gmail.com> Cc: Minchan Kim <minchan@kernel.org> Cc: <stable@vger.kernel.org> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2017-02-03Merge tag 'regulator-fix-v4.10-rc6' of ↵Linus Torvalds3-48/+2
git://git.kernel.org/pub/scm/linux/kernel/git/broonie/regulator Pull regulator fixes from Mark Brown: "Three changes here: two run of the mill driver specific fixes and a change from Mark Rutland which reverts some new device specific ACPI binding code which was added during the merge window as there are concerns about this sending the wrong signal about usage of regulators in ACPI systems" * tag 'regulator-fix-v4.10-rc6' of git://git.kernel.org/pub/scm/linux/kernel/git/broonie/regulator: regulator: fixed: Revert support for ACPI interface regulator: axp20x: AXP806: Fix dcdcb being set instead of dcdce regulator: twl6030: fix range comparison, allowing vsel = 59
2017-02-03MAINTAINERS: update email address for Amit ShahAmit Shah1-1/+1
I'm leaving my job at Red Hat, this email address will stop working next week. Update it to one that I will have access to later. Signed-off-by: Amit Shah <amit.shah@redhat.com> Signed-off-by: Michael S. Tsirkin <mst@redhat.com>
2017-02-03vhost: fix initialization for vq->is_leHalil Pasic1-6/+4
Currently, under certain circumstances vhost_init_is_le does just a part of the initialization job, and depends on vhost_reset_is_le being called too. For this reason vhost_vq_init_access used to call vhost_reset_is_le when vq->private_data is NULL. This is not only counter intuitive, but also real a problem because it breaks vhost_net. The bug was introduced to vhost_net with commit 2751c9882b94 ("vhost: cross-endian support for legacy devices"). The symptom is corruption of the vq's used.idx field (virtio) after VHOST_NET_SET_BACKEND was issued as a part of the vhost shutdown on a vq with pending descriptors. Let us make sure the outcome of vhost_init_is_le never depend on the state it is actually supposed to initialize, and fix virtio_net by removing the reset from vhost_vq_init_access. With the above, there is no reason for vhost_reset_is_le to do just half of the job. Let us make vhost_reset_is_le reinitialize is_le. Signed-off-by: Halil Pasic <pasic@linux.vnet.ibm.com> Reported-by: Michael A. Tebolt <miket@us.ibm.com> Reported-by: Dr. David Alan Gilbert <dgilbert@redhat.com> Fixes: commit 2751c9882b94 ("vhost: cross-endian support for legacy devices") Cc: <stable@vger.kernel.org> Signed-off-by: Michael S. Tsirkin <mst@redhat.com> Reviewed-by: Greg Kurz <groug@kaod.org> Tested-by: Michael A. Tebolt <miket@us.ibm.com>
2017-02-03Revert "vring: Force use of DMA API for ARM-based systems with legacy devices"Michael S. Tsirkin1-7/+0
This reverts commit c7070619f3408d9a0dffbed9149e6f00479cf43b. This has been shown to regress on some ARM systems: by forcing on DMA API usage for ARM systems, we have inadvertently kicked open a hornets' nest in terms of cache-coherency. Namely that unless the virtio device is explicitly described as capable of coherent DMA by firmware, the DMA APIs on ARM and other DT-based platforms will assume it is non-coherent. This turns out to cause a big problem for the likes of QEMU and kvmtool, which generate virtio-mmio devices in their guest DTs but neglect to add the often-overlooked "dma-coherent" property; as a result, we end up with the guest making non-cacheable accesses to the vring, the host doing so cacheably, both talking past each other and things going horribly wrong. We are working on a safer work-around. Fixes: c7070619f340 ("vring: Force use of DMA API for ARM-based systems with legacy devices") Reported-by: Robin Murphy <robin.murphy@arm.com> Cc: <stable@vger.kernel.org> Signed-off-by: Will Deacon <will.deacon@arm.com> Signed-off-by: Michael S. Tsirkin <mst@redhat.com> Acked-by: Marc Zyngier <marc.zyngier@arm.com>
2017-02-03Merge tag 'usb-serial-4.10-rc7' of ↵Greg Kroah-Hartman2-0/+2
git://git.kernel.org/pub/scm/linux/kernel/git/johan/usb-serial into usb-linus Johan writes: USB-serial fixes for v4.10-rc7 One more device ID for pl2303. Signed-off-by: Johan Hovold <johan@kernel.org>
2017-02-03Merge tag 'mmc-v4.10-rc6' of ↵Linus Torvalds1-1/+2
git://git.kernel.org/pub/scm/linux/kernel/git/ulfh/mmc Pull MMC fix from Ulf Hansson: "MMC host: sdhci: Avoid hang when receiving spurious CARD_INT interrupts" * tag 'mmc-v4.10-rc6' of git://git.kernel.org/pub/scm/linux/kernel/git/ulfh/mmc: mmc: sdhci: Ignore unexpected CARD_INT interrupts