summaryrefslogtreecommitdiffstats
AgeCommit message (Collapse)AuthorFilesLines
2019-05-07libceph: make ceph_pr_addr take an struct ceph_entity_addr pointerJeff Layton8-34/+35
GCC9 is throwing a lot of warnings about unaligned accesses by callers of ceph_pr_addr. All of the current callers are passing a pointer to the sockaddr inside struct ceph_entity_addr. Fix it to take a pointer to a struct ceph_entity_addr instead, and then have the function make a copy of the sockaddr before printing it. Signed-off-by: Jeff Layton <jlayton@kernel.org> Reviewed-by: Ilya Dryomov <idryomov@gmail.com> Signed-off-by: Ilya Dryomov <idryomov@gmail.com>
2019-05-07libceph: fix unaligned accesses in ceph_entity_addr handlingJeff Layton1-40/+37
GCC9 is throwing a lot of warnings about unaligned access. This patch fixes some of them by changing most of the sockaddr handling functions to take a pointer to struct ceph_entity_addr instead of struct sockaddr_storage. The lower functions can then make copies or do unaligned accesses as needed. Signed-off-by: Jeff Layton <jlayton@kernel.org> Reviewed-by: Ilya Dryomov <idryomov@gmail.com> Signed-off-by: Ilya Dryomov <idryomov@gmail.com>
2019-05-07rbd: don't assert on writes to snapshotsIlya Dryomov1-2/+6
The check added in commit 721c7fc701c7 ("block: fail op_is_write() requests to read-only partitions") was lifted in commit a32e236eb93e ("Partially revert "block: fail op_is_write() requests to read-only partitions""). Basic things like user triggered writes and discards are still caught, but internal kernel users can submit anything. In particular, ext4 will attempt to write to the superblock if it detects errors in the filesystem, even if the filesystem is mounted read-only on a read-only partition. The assert is overkill regardless. Signed-off-by: Ilya Dryomov <idryomov@gmail.com>
2019-05-07rbd: client_mutex is never nestedIlya Dryomov1-1/+1
Signed-off-by: Ilya Dryomov <idryomov@gmail.com>
2019-05-07ceph: print inode number in __caps_issued_mask debugging messagesJeff Layton1-6/+6
To make it easier to correlate with MDS logs. Signed-off-by: Jeff Layton <jlayton@kernel.org> Reviewed-by: "Yan, Zheng" <zyan@redhat.com> Signed-off-by: Ilya Dryomov <idryomov@gmail.com>
2019-05-07ceph: just call get_session in __ceph_lookup_mds_sessionJeff Layton1-7/+1
I originally thought there was a potential race here, but the fact that this is called with the mdsc->mutex held, ensures that the last reference to the session can't be put here. Still, it's clearer to just return the value from get_session here, and may prevent a bug later if we ever rework this code to be less reliant on mutexes. Signed-off-by: Jeff Layton <jlayton@kernel.org> Reviewed-by: "Yan, Zheng" <zyan@redhat.com> Signed-off-by: Ilya Dryomov <idryomov@gmail.com>
2019-05-07ceph: simplify arguments and return semantics of try_get_cap_refsJeff Layton1-46/+30
The return of this function is rather complex. It can return 0 or 1, and in the case of a 1 return, the "err" pointer will be filled out. This necessitates a lot of copying of values. We can achieve the same effect by just returning 0, 1 or a negative error code, and drop the "err" argument from this function. Signed-off-by: Jeff Layton <jlayton@kernel.org> Reviewed-by: "Yan, Zheng" <zyan@redhat.com> Signed-off-by: Ilya Dryomov <idryomov@gmail.com>
2019-05-07ceph: fix comment over ceph_drop_caps_for_unlinkJeff Layton1-1/+1
It's not clear what AUTH_RDCACHE means in this context, and we're clearly just dropping LINK caps here. Signed-off-by: Jeff Layton <jlayton@kernel.org> Reviewed-by: "Yan, Zheng" <zyan@redhat.com> Signed-off-by: Ilya Dryomov <idryomov@gmail.com>
2019-05-07ceph: move wait for mds request into helper functionJeff Layton1-15/+21
Signed-off-by: Jeff Layton <jlayton@kernel.org> Reviewed-by: "Yan, Zheng" <zyan@redhat.com> Signed-off-by: Ilya Dryomov <idryomov@gmail.com>
2019-05-07ceph: have ceph_mdsc_do_request call ceph_mdsc_submit_requestJeff Layton2-23/+22
Nothing calls ceph_mdsc_submit_request today, but in later patches we'll need to be able to call this separately. Have the helper return an int so we can check the r_err under the mutex, and have the caller just check the error code from the submit. Also move the acquisition of CEPH_CAP_PIN references into the same function. Signed-off-by: Jeff Layton <jlayton@kernel.org> Reviewed-by: "Yan, Zheng" <zyan@redhat.com> Signed-off-by: Ilya Dryomov <idryomov@gmail.com>
2019-05-07ceph: after an MDS request, do callback and completionsJeff Layton1-2/+1
No MDS requests use r_callback today, but that will change in the future. The OSD client always does r_callback and then completes r_completion. Let's have the MDS client do the same. Signed-off-by: Jeff Layton <jlayton@kernel.org> Reviewed-by: "Yan, Zheng" <zyan@redhat.com> Signed-off-by: Ilya Dryomov <idryomov@gmail.com>
2019-05-07ceph: use pathlen values returned by set_request_path_attrJeff Layton1-2/+2
We make copies of the dentry name in set_request_path_attr, but then create_request_message re-fetches the lengths out of the dentry. While we don't currently set the *_drop fields unless the parents are locked, it's still better not to rely on that sort of implicit assumption. Use the pathlen values that set_request_path_attr returned instead, as they will always be correct for the returned paths themselves. Signed-off-by: Jeff Layton <jlayton@kernel.org> Reviewed-by: "Yan, Zheng" <zyan@redhat.com> Signed-off-by: Ilya Dryomov <idryomov@gmail.com>
2019-05-07ceph: use __getname/__putname in ceph_mdsc_build_pathJeff Layton3-39/+36
Al suggested we get rid of the kmalloc here and just use __getname and __putname to get a full PATH_MAX pathname buffer. Since we build the path in reverse, we continue to return a pointer to the beginning of the string and the length, and add a new helper to free the thing at the end. Suggested-by: Al Viro <viro@zeniv.linux.org.uk> Signed-off-by: Jeff Layton <jlayton@kernel.org> Reviewed-by: "Yan, Zheng" <zyan@redhat.com> Signed-off-by: Ilya Dryomov <idryomov@gmail.com>
2019-05-07ceph: use ceph_mdsc_build_path instead of clone_dentry_nameJeff Layton1-39/+3
While it may be slightly more efficient, it's probably not worthwhile to optimize for the case that clone_dentry_name handles. We can get the same result by just calling ceph_mdsc_build_path when the parent isn't locked, with less code duplication. Signed-off-by: Jeff Layton <jlayton@kernel.org> Reviewed-by: "Yan, Zheng" <zyan@redhat.com> Signed-off-by: Ilya Dryomov <idryomov@gmail.com>
2019-05-07ceph: fix potential use-after-free in ceph_mdsc_build_pathJeff Layton1-3/+5
temp is not defined outside of the RCU critical section here. Ensure we grab that value before we drop the rcu_read_lock. Reported-by: Al Viro <viro@zeniv.linux.org.uk> Signed-off-by: Jeff Layton <jlayton@kernel.org> Signed-off-by: Ilya Dryomov <idryomov@gmail.com>
2019-05-07ceph: dump granular cap info in "caps" debugfs fileJeff Layton1-2/+32
We have a "caps" file already that gives statistics on the caps cache as a whole. Add another section to that output and dump a line for each individual cap record. Signed-off-by: Jeff Layton <jlayton@kernel.org> Reviewed-by: "Yan, Zheng" <zyan@redhat.com> Signed-off-by: Ilya Dryomov <idryomov@gmail.com>
2019-05-07ceph: make iterate_session_caps a public symbolJeff Layton2-8/+12
Signed-off-by: Jeff Layton <jlayton@kernel.org> Reviewed-by: "Yan, Zheng" <zyan@redhat.com> Signed-off-by: Ilya Dryomov <idryomov@gmail.com>
2019-05-07ceph: fix NULL pointer deref when debugging is enabledJeff Layton1-1/+1
Signed-off-by: Jeff Layton <jlayton@kernel.org> Signed-off-by: Ilya Dryomov <idryomov@gmail.com>
2019-05-07ceph: properly handle granular statx requestsJeff Layton1-28/+57
cephfs can benefit from statx. We can have the client just request caps sufficient for the needed attributes and leave off the rest. Also, recognize when AT_STATX_DONT_SYNC is set, and just scrape the inode without doing any call in that case. Force a call to the MDS in the event that AT_STATX_FORCE_SYNC is set. Link: http://tracker.ceph.com/issues/39258 Signed-off-by: Jeff Layton <jlayton@kernel.org> Reviewed-by: "Yan, Zheng" <zyan@redhat.com> Reviewed-by: David Howells <dhowells@redhat.com> Reviewed-by: Sage Weil <sage@redhat.com> Signed-off-by: Ilya Dryomov <idryomov@gmail.com>
2019-05-07ceph: remove superfluous inode_lock in ceph_fsyncJeff Layton1-3/+0
Originally, filemap_write_and_wait took the i_mutex internally, but commit 02c24a82187d pushed the mutex acquisition into the individual fsync routines, leaving it up to the subsystem maintainers to remove it if it wasn't needed. For ceph, I see no reason to take the inode_lock here. All of the operations inside that lock are protected by their own locking. Signed-off-by: Jeff Layton <jlayton@kernel.org> Reviewed-by: "Yan, Zheng" <zyan@redhat.com> Signed-off-by: Ilya Dryomov <idryomov@gmail.com>
2019-05-07libceph: fix clang warning for CEPH_DEFINE_OID_ONSTACKArnd Bergmann1-7/+6
clang complains about assigning a variable to itself during the declaration: fs/ceph/ioctl.c:187:26: error: variable 'oid' is uninitialized when used within its own initialization [-Werror,-Wuninitialized] CEPH_DEFINE_OID_ONSTACK(oid); ^~~ include/linux/ceph/osdmap.h:122:52: note: expanded from macro 'CEPH_DEFINE_OID_ONSTACK' struct ceph_object_id oid = CEPH_OID_INIT_ONSTACK(oid) ~~~ ^~~ include/linux/ceph/osdmap.h:120:29: note: expanded from macro 'CEPH_OID_INIT_ONSTACK' ({ ceph_oid_init(&oid); oid; }) ^~~ We use this trick in other places, but it is completely unnecessary here, as we can just use a regular struct initializer. Signed-off-by: Arnd Bergmann <arnd@arndb.de> Reviewed-by: Ilya Dryomov <idryomov@gmail.com> Signed-off-by: Ilya Dryomov <idryomov@gmail.com>
2019-05-07rbd: convert all rbd_assert(0) to BUG()Arnd Bergmann1-6/+6
rbd_assert(0) has caused different issues depending on the compiler version in the past, so it seems better to avoid it completely. Replace the remaining instances. Signed-off-by: Arnd Bergmann <arnd@arndb.de> Reviewed-by: Ilya Dryomov <idryomov@gmail.com> Signed-off-by: Ilya Dryomov <idryomov@gmail.com>
2019-05-07rbd: avoid clang -Wuninitialized warningArnd Bergmann1-1/+1
clang fails to see that rbd_assert(0) ends in an unreachable code path and warns about a subsequent use of an uninitialized variable when CONFIG_PROFILE_ANNOTATED_BRANCHES is set: drivers/block/rbd.c:2402:4: error: variable 'ret' is used uninitialized whenever 'if' condition is false [-Werror,-Wsometimes-uninitialized] rbd_assert(0); ^~~~~~~~~~~~~ drivers/block/rbd.c:563:7: note: expanded from macro 'rbd_assert' if (unlikely(!(expr))) { \ ^~~~~~~~~~~~~~~~~ include/linux/compiler.h:48:23: note: expanded from macro 'unlikely' # define unlikely(x) (__branch_check__(x, 0, __builtin_constant_p(x))) ^~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~ drivers/block/rbd.c:2410:6: note: uninitialized use occurs here if (ret) { ^~~ drivers/block/rbd.c:2402:4: note: remove the 'if' if its condition is always true rbd_assert(0); ^ drivers/block/rbd.c:563:3: note: expanded from macro 'rbd_assert' if (unlikely(!(expr))) { \ ^ drivers/block/rbd.c:2376:9: note: initialize the variable 'ret' to silence this warning int ret; ^ = 0 1 error generated. This seems to be a bug in clang, but is easy to work around by using an unconditional BUG(). Signed-off-by: Arnd Bergmann <arnd@arndb.de> Reviewed-by: Ilya Dryomov <idryomov@gmail.com> Signed-off-by: Ilya Dryomov <idryomov@gmail.com>
2019-05-07ceph: snapshot nfs re-exportYan, Zheng2-27/+329
To support snapshot nfs re-export, we need a way to lookup snapped inode by file handle. For directory inode, snapped metadata are always stored together with head inode. Client just need to pass vinodeno_t to MDS. For non-directory inode, there can be multiple version of snapped inodes and they can be stored in different dirfrags. Besides vinodeno_t, client also need to tell mds from which dirfrag it got the snapped inode. Another problem of supporting snapshot nfs re-export is that there can be multiple paths to access a snapped inode. For example: mkdir -p d1/d2/d3 mkdir d1/.snap/s1 Paths 'd1/.snap/s1/d2/d3', 'd1/d2/.snap/_s1_<inode number of d1>/d3' and 'd1/d2/d3/.snap/_s1_<inode number of d1>' are all reference to the same snapped inode. For a given snapped inode, There is no convenient way to get the first form and the second form paths. For simplicity, ceph_get_parent() return snapdir for snapped directory inode. Furthermore, client may access snapshot of deleted directory. For example: mkdir -p d1/d2 mkdir d1/.snap/s1 open d1/.snap/s1/d2 rm -rf d1/d2 <nfs server restart> The path constucted by ceph_get_parent() and ceph_get_name() is '<inode of d2>/.snap/_s1_<inode number of d1>'. Futher lookup parent of <inode of d2> will fail. To workaround this case, this patch uses d_obtain_root() to get dentry for snapdir of deleted directory. snapdir dentry has no DCACHE_DISCONNECTED flag set, reconnect_path() stops when it reaches snapdir dentry. Link: http://tracker.ceph.com/issues/22105 Signed-off-by: "Yan, Zheng" <zyan@redhat.com> Signed-off-by: Ilya Dryomov <idryomov@gmail.com>
2019-05-07ceph: quota: fix quota subdir mountsLuis Henriques4-10/+190
The CephFS kernel client does not enforce quotas set in a directory that isn't visible from the mount point. For example, given the path '/dir1/dir2', if quotas are set in 'dir1' and the filesystem is mounted with mount -t ceph <server>:<port>:/dir1/ /mnt then the client won't be able to access 'dir1' inode, even if 'dir2' belongs to a quota realm that points to it. This patch fixes this issue by simply doing an MDS LOOKUPINO operation for unknown inodes. Any inode reference obtained this way will be added to a list in ceph_mds_client, and will only be released when the filesystem is umounted. Link: https://tracker.ceph.com/issues/38482 Reported-by: Hendrik Peyerl <hpeyerl@plusline.net> Signed-off-by: Luis Henriques <lhenriques@suse.com> Reviewed-by: "Yan, Zheng" <zyan@redhat.com> Signed-off-by: Ilya Dryomov <idryomov@gmail.com>
2019-05-07ceph: factor out ceph_lookup_inode()Luis Henriques2-2/+13
This function will be used by __fh_to_dentry and by the quotas code, to find quota realm inodes that are not visible in the mountpoint. Signed-off-by: Luis Henriques <lhenriques@suse.com> Reviewed-by: "Yan, Zheng" <zyan@redhat.com> Signed-off-by: Ilya Dryomov <idryomov@gmail.com>
2019-05-07ceph: remove duplicated filelock ref increaseZhi Zhang1-13/+0
Inode i_filelock_ref is increased in ceph_lock or ceph_flock, but it is increased again in ceph_lock_message. This results in this ref won't become zero. If CEPH_I_ERROR_FILELOCK flag is set in remove_session_caps once, this flag can't be cleared even if client is back to normal. So further file lock will return EIO. Signed-off-by: Zhi Zhang <zhang.david2011@gmail.com> Reviewed-by: "Yan, Zheng" <zyan@redhat.com> Signed-off-by: Ilya Dryomov <idryomov@gmail.com>
2019-05-05Linux 5.1v5.1Linus Torvalds1-1/+1
2019-05-05Merge branch 'perf-urgent-for-linus' of ↵Linus Torvalds18-32/+272
git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip Pull perf fixes from Ingo Molnar: "I'd like to apologize for this very late pull request: I was dithering through the week whether to send the fixes, and then yesterday Jiri's crash fix for a regression introduced in this cycle clearly marked perf/urgent as 'must merge now'. Most of the commits are tooling fixes, plus there's three kernel fixes via four commits: - race fix in the Intel PEBS code - fix an AUX bug and roll back a previous attempt - fix AMD family 17h generic HW cache-event perf counters The largest diffstat contribution comes from the AMD fix - a new event table is introduced, which is a fairly low risk change but has a large linecount" * 'perf-urgent-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip: perf/x86/intel: Fix race in intel_pmu_disable_event() perf/x86/intel/pt: Remove software double buffering PMU capability perf/ring_buffer: Fix AUX software double buffering perf tools: Remove needless asm/unistd.h include fixing build in some places tools arch uapi: Copy missing unistd.h headers for arc, hexagon and riscv tools build: Add -ldl to the disassembler-four-args feature test perf cs-etm: Always allocate memory for cs_etm_queue::prev_packet perf cs-etm: Don't check cs_etm_queue::prev_packet validity perf report: Report OOM in status line in the GTK UI perf bench numa: Add define for RUSAGE_THREAD if not present tools lib traceevent: Change tag string for error perf annotate: Fix build on 32 bit for BPF annotation tools uapi x86: Sync vmx.h with the kernel perf bpf: Return value with unlocking in perf_env__find_btf() MAINTAINERS: Include vendor specific files under arch/*/events/* perf/x86/amd: Update generic hardware cache events for Family 17h
2019-05-05Merge branch 'sched-urgent-for-linus' of ↵Linus Torvalds1-0/+1
git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip Pull scheduler fix from Ingo Molnar: "Fix a kobject memory leak in the cpufreq code" * 'sched-urgent-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip: sched/cpufreq: Fix kobject memleak
2019-05-05Merge branch 'x86-urgent-for-linus' of ↵Linus Torvalds2-0/+23
git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip Pull x86 fix from Ingo Molnar: "Disable function tracing during early SME setup to fix a boot crash on SME-enabled kernels running distro kernels (some of which have function tracing enabled)" * 'x86-urgent-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip: x86/mm/mem_encrypt: Disable all instrumentation for early SME setup
2019-05-05Merge branch 'fixes' of git://git.kernel.org/pub/scm/linux/kernel/git/viro/vfsLinus Torvalds5-16/+26
Pull vfs fixes from Al Viro: - a couple of ->i_link use-after-free fixes - regression fix for wrong errno on absent device name in mount(2) (this cycle stuff) - ancient UFS braino in large GID handling on Solaris UFS images (bogus cut'n'paste from large UID handling; wrong field checked to decide whether we should look at old (16bit) or new (32bit) field) * 'fixes' of git://git.kernel.org/pub/scm/linux/kernel/git/viro/vfs: ufs: fix braino in ufs_get_inode_gid() for solaris UFS flavour Abort file_remove_privs() for non-reg. files [fix] get rid of checking for absent device name in vfs_get_tree() apparmorfs: fix use-after-free on symlink traversal securityfs: fix use-after-free on symlink traversal
2019-05-05perf/x86/intel: Fix race in intel_pmu_disable_event()Jiri Olsa1-3/+7
New race in x86_pmu_stop() was introduced by replacing the atomic __test_and_clear_bit() of cpuc->active_mask by separate test_bit() and __clear_bit() calls in the following commit: 3966c3feca3f ("x86/perf/amd: Remove need to check "running" bit in NMI handler") The race causes panic for PEBS events with enabled callchains: BUG: unable to handle kernel NULL pointer dereference at 0000000000000000 ... RIP: 0010:perf_prepare_sample+0x8c/0x530 Call Trace: <NMI> perf_event_output_forward+0x2a/0x80 __perf_event_overflow+0x51/0xe0 handle_pmi_common+0x19e/0x240 intel_pmu_handle_irq+0xad/0x170 perf_event_nmi_handler+0x2e/0x50 nmi_handle+0x69/0x110 default_do_nmi+0x3e/0x100 do_nmi+0x11a/0x180 end_repeat_nmi+0x16/0x1a RIP: 0010:native_write_msr+0x6/0x20 ... </NMI> intel_pmu_disable_event+0x98/0xf0 x86_pmu_stop+0x6e/0xb0 x86_pmu_del+0x46/0x140 event_sched_out.isra.97+0x7e/0x160 ... The event is configured to make samples from PEBS drain code, but when it's disabled, we'll go through NMI path instead, where data->callchain will not get allocated and we'll crash: x86_pmu_stop test_bit(hwc->idx, cpuc->active_mask) intel_pmu_disable_event(event) { ... intel_pmu_pebs_disable(event); ... EVENT OVERFLOW -> <NMI> intel_pmu_handle_irq handle_pmi_common TEST PASSES -> test_bit(bit, cpuc->active_mask)) perf_event_overflow perf_prepare_sample { ... if (!(sample_type & __PERF_SAMPLE_CALLCHAIN_EARLY)) data->callchain = perf_callchain(event, regs); CRASH -> size += data->callchain->nr; } </NMI> ... x86_pmu_disable_event(event) } __clear_bit(hwc->idx, cpuc->active_mask); Fixing this by disabling the event itself before setting off the PEBS bit. Signed-off-by: Jiri Olsa <jolsa@kernel.org> Cc: Alexander Shishkin <alexander.shishkin@linux.intel.com> Cc: Arnaldo Carvalho de Melo <acme@redhat.com> Cc: David Arcari <darcari@redhat.com> Cc: Jiri Olsa <jolsa@redhat.com> Cc: Lendacky Thomas <Thomas.Lendacky@amd.com> Cc: Linus Torvalds <torvalds@linux-foundation.org> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Stephane Eranian <eranian@google.com> Cc: Thomas Gleixner <tglx@linutronix.de> Cc: Vince Weaver <vincent.weaver@maine.edu> Fixes: 3966c3feca3f ("x86/perf/amd: Remove need to check "running" bit in NMI handler") Link: http://lkml.kernel.org/r/20190504151556.31031-1-jolsa@kernel.org Signed-off-by: Ingo Molnar <mingo@kernel.org>
2019-05-04Merge tag 'powerpc-5.1-7' of ↵Linus Torvalds1-4/+14
git://git.kernel.org/pub/scm/linux/kernel/git/powerpc/linux Pull powerpc fix from Michael Ellerman: "One regression fix. Changes we merged to STRICT_KERNEL_RWX on 32-bit were causing crashes under load on some machines depending on memory layout. Thanks to Christophe Leroy" * tag 'powerpc-5.1-7' of git://git.kernel.org/pub/scm/linux/kernel/git/powerpc/linux: powerpc/32s: Fix BATs setting with CONFIG_STRICT_KERNEL_RWX
2019-05-03Merge tag 'for-linus' of git://git.kernel.org/pub/scm/virt/kvm/kvmLinus Torvalds23-65/+192
Pull KVM fixes from Paolo Bonzini: - PPC and ARM bugfixes from submaintainers - Fix old Windows versions on AMD (recent regression) - Fix old Linux versions on processors without EPT - Fixes for LAPIC timer optimizations * tag 'for-linus' of git://git.kernel.org/pub/scm/virt/kvm/kvm: (21 commits) KVM: nVMX: Fix size checks in vmx_set_nested_state KVM: selftests: make hyperv_cpuid test pass on AMD KVM: lapic: Check for in-kernel LAPIC before deferencing apic pointer KVM: fix KVM_CLEAR_DIRTY_LOG for memory slots of unaligned size x86/kvm/mmu: reset MMU context when 32-bit guest switches PAE KVM: x86: Whitelist port 0x7e for pre-incrementing %rip Documentation: kvm: fix dirty log ioctl arch lists KVM: VMX: Move RSB stuffing to before the first RET after VM-Exit KVM: arm/arm64: Don't emulate virtual timers on userspace ioctls kvm: arm: Skip stage2 huge mappings for unaligned ipa backed by THP KVM: arm/arm64: Ensure vcpu target is unset on reset failure KVM: lapic: Convert guest TSC to host time domain if necessary KVM: lapic: Allow user to disable adaptive tuning of timer advancement KVM: lapic: Track lapic timer advance per vCPU KVM: lapic: Disable timer advancement if adaptive tuning goes haywire x86: kvm: hyper-v: deal with buggy TLB flush requests from WS2012 KVM: x86: Consider LAPIC TSC-Deadline timer expired if deadline too short KVM: PPC: Book3S: Protect memslots while validating user address KVM: PPC: Book3S HV: Perserve PSSCR FAKE_SUSPEND bit on guest exit KVM: arm/arm64: vgic-v3: Retire pending interrupts on disabling LPIs ...
2019-05-03Merge branch 'i2c/for-current-fixed' of ↵Linus Torvalds5-5/+10
ssh://gitolite.kernel.org/pub/scm/linux/kernel/git/wsa/linux Pull i2c fixes from Wolfram Sang: "I2C driver bugfixes and a MAINTAINERS update for you" * 'i2c/for-current-fixed' of ssh://gitolite.kernel.org/pub/scm/linux/kernel/git/wsa/linux: i2c: Prevent runtime suspend of adapter when Host Notify is required i2c: synquacer: fix enumeration of slave devices MAINTAINERS: friendly takeover of i2c-gpio driver i2c: designware: ratelimit 'transfer when suspended' errors i2c: imx: correct the method of getting private data in notifier_call
2019-05-03Merge tag 'drm-fixes-2019-05-03' of git://anongit.freedesktop.org/drm/drmLinus Torvalds2-0/+16
Pull drm fix from Dave Airlie: "Just a single qxl revert" * tag 'drm-fixes-2019-05-03' of git://anongit.freedesktop.org/drm/drm: Revert "drm/qxl: drop prime import/export callbacks"
2019-05-03Merge tag 'clk-fixes-for-linus' of ↵Linus Torvalds3-5/+40
ssh://gitolite.kernel.org/pub/scm/linux/kernel/git/clk/linux Pull clk fixes from Stephen Boyd: "Two fixes for the NKMP clks on Allwinner SoCs, a locking fix for clkdev where we forgot to hold a lock while iterating a list that can change, and finally a build fix that adds some stubs for clk APIs that are used by devfreq drivers on platforms without the clk APIs" * tag 'clk-fixes-for-linus' of ssh://gitolite.kernel.org/pub/scm/linux/kernel/git/clk/linux: clk: Add missing stubs for a few functions clkdev: Hold clocks_mutex while iterating clocks list clk: sunxi-ng: nkmp: Explain why zero width check is needed clk: sunxi-ng: nkmp: Avoid GENMASK(-1, 0)
2019-05-03Merge tag 'sound-5.1' of ↵Linus Torvalds4-40/+78
ssh://gitolite.kernel.org/pub/scm/linux/kernel/git/tiwai/sound Pull sound fixes from Takashi Iwai: "A few stable fixes at this round. The USB Line6 audio fixes are a bit large, but they are rather trivial and pretty much device-specific, so should be safe to apply at this late stage. Ditto for other HD-audio quirks" * tag 'sound-5.1' of ssh://gitolite.kernel.org/pub/scm/linux/kernel/git/tiwai/sound: ALSA: hda/realtek - Apply the fixup for ASUS Q325UAR ALSA: line6: use dynamic buffers ALSA: hda/realtek - Fixed Dell AIO speaker noise ALSA: hda/realtek - Add new Dell platform for headset mode
2019-05-03perf/x86/intel/pt: Remove software double buffering PMU capabilityAlexander Shishkin2-3/+1
Now that all AUX allocations are high-order by default, the software double buffering PMU capability doesn't make sense any more, get rid of it. In case some PMUs choose to opt out, we can re-introduce it. Signed-off-by: Alexander Shishkin <alexander.shishkin@linux.intel.com> Acked-by: Peter Zijlstra <a.p.zijlstra@chello.nl> Cc: Arnaldo Carvalho de Melo <acme@redhat.com> Cc: Jiri Olsa <jolsa@redhat.com> Cc: Linus Torvalds <torvalds@linux-foundation.org> Cc: Stephane Eranian <eranian@google.com> Cc: Thomas Gleixner <tglx@linutronix.de> Cc: Vince Weaver <vincent.weaver@maine.edu> Cc: adrian.hunter@intel.com Link: http://lkml.kernel.org/r/20190503085536.24119-3-alexander.shishkin@linux.intel.com Signed-off-by: Ingo Molnar <mingo@kernel.org>
2019-05-03perf/ring_buffer: Fix AUX software double bufferingAlexander Shishkin1-2/+1
This recent commit: 5768402fd9c6e87 ("perf/ring_buffer: Use high order allocations for AUX buffers optimistically") overlooked the fact that the previous one page granularity of the AUX buffer provided an implicit double buffering capability to the PMU driver, which went away when the entire buffer became one high-order page. Always make the full-trace mode AUX allocation at least two-part to preserve the previous behavior and allow the implicit double buffering to continue. Reported-by: Ammy Yi <ammy.yi@intel.com> Signed-off-by: Alexander Shishkin <alexander.shishkin@linux.intel.com> Acked-by: Peter Zijlstra <a.p.zijlstra@chello.nl> Cc: Arnaldo Carvalho de Melo <acme@redhat.com> Cc: Jiri Olsa <jolsa@redhat.com> Cc: Linus Torvalds <torvalds@linux-foundation.org> Cc: Stephane Eranian <eranian@google.com> Cc: Thomas Gleixner <tglx@linutronix.de> Cc: Vince Weaver <vincent.weaver@maine.edu> Cc: adrian.hunter@intel.com Fixes: 5768402fd9c6e87 ("perf/ring_buffer: Use high order allocations for AUX buffers optimistically") Link: http://lkml.kernel.org/r/20190503085536.24119-2-alexander.shishkin@linux.intel.com Signed-off-by: Ingo Molnar <mingo@kernel.org>
2019-05-03Merge tag 'perf-urgent-for-mingo-5.1-20190502' of ↵Ingo Molnar12-21/+154
git://git.kernel.org/pub/scm/linux/kernel/git/acme/linux into perf/urgent Pull perf/urgent fixes from Arnaldo Carvalho de Melo: tools UAPI: Arnaldo Carvalho de Melo: - Sync x86's vmx.h with the kernel. - Copy missing unistd.h headers for arc, hexagon and riscv, fixing a reported build regression on the ARC 32-bit architecture. perf bench numa: Arnaldo Carvalho de Melo: - Add define for RUSAGE_THREAD if not present, fixing the build on the ARC architecture when only zlib and libnuma are present. perf BPF: Arnaldo Carvalho de Melo: - The disassembler-four-args feature test needs -ldl on distros such as Mageia 7. Bo YU: - Fix unlocking on success in perf_env__find_btf(), detected with the coverity tool. libtraceevent: Leo Yan: - Change misleading hard coded 'trace-cmd' string in error messages. ARM hardware tracing: Leo Yan: - Always allocate memory for cs_etm_queue::prev_packet, fixing a segfault when processing CoreSight perf data. perf annotate: Thadeu Lima de Souza Cascardo: - Fix build on 32 bit for BPF. perf report: Thomas Richter: - Report OOM in status line in the GTK UI. core libs: - Remove needless asm/unistd.h that, used with sys/syscall.h ended up redefining the syscalls defines in environments such as the ARC arch when using uClibc. Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com> Signed-off-by: Ingo Molnar <mingo@kernel.org>
2019-05-03Merge tag 'drm-misc-fixes-2019-05-02' of ↵Dave Airlie2-0/+16
git://anongit.freedesktop.org/drm/drm-misc into drm-fixes - One revert for QXL for a DRI3 breakage Signed-off-by: Dave Airlie <airlied@redhat.com> From: Maxime Ripard <maxime.ripard@bootlin.com> Link: https://patchwork.freedesktop.org/patch/msgid/20190502122529.hguztj3kncaixe3d@flea
2019-05-02perf tools: Remove needless asm/unistd.h include fixing build in some placesArnaldo Carvalho de Melo1-1/+0
We were including sys/syscall.h and asm/unistd.h, since sys/syscall.h includes asm/unistd.h, sometimes this leads to the redefinition of defines, breaking the build. Noticed on ARC with uCLibc. Cc: Adrian Hunter <adrian.hunter@intel.com> Cc: Arnaldo Carvalho de Melo <arnaldo.melo@gmail.com> Cc: Arnd Bergmann <arnd@arndb.de> Cc: Jiri Olsa <jolsa@kernel.org> Cc: Namhyung Kim <namhyung@kernel.org> Cc: Rich Felker <dalias@libc.org> Cc: Vineet Gupta <Vineet.Gupta1@synopsys.com> Link: https://lkml.kernel.org/n/tip-xjpf80o64i2ko74aj2jih0qg@git.kernel.org Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
2019-05-02tools arch uapi: Copy missing unistd.h headers for arc, hexagon and riscvArnaldo Carvalho de Melo3-0/+133
Since those were introduced in: c8ce48f06503 ("asm-generic: Make time32 syscall numbers optional") But when the asm-generic/unistd.h was sync'ed with tools/ in: 1a787fc5ba18 ("tools headers uapi: Sync copy of asm-generic/unistd.h with the kernel sources") I forgot to copy the files for the architectures that define __ARCH_WANT_TIME32_SYSCALLS, so the perf build was breaking there, as reported by Vineet Gupta for the ARC architecture. After updating my ARC container to use the glibc based toolchain + cross building libnuma, zlib and elfutils, I finally managed to reproduce the problem and verify that this now is fixed and will not regress as will be tested before each pull req sent upstream. Reported-by: Vineet Gupta <Vineet.Gupta1@synopsys.com> Cc: Adrian Hunter <adrian.hunter@intel.com> Cc: Arnd Bergmann <arnd@arndb.de> Cc: Jiri Olsa <jolsa@kernel.org> CC: linux-snps-arc@lists.infradead.org Cc: Namhyung Kim <namhyung@kernel.org> Link: https://lkml.kernel.org/r/20190426193531.GC28586@kernel.org Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
2019-05-02tools build: Add -ldl to the disassembler-four-args feature testArnaldo Carvalho de Melo1-1/+1
Thomas Backlund reported that the perf build was failing on the Mageia 7 distro, that is because it uses: cat /tmp/build/perf/feature/test-disassembler-four-args.make.output /usr/bin/ld: /usr/lib64/libbfd.a(plugin.o): in function `try_load_plugin': /home/iurt/rpmbuild/BUILD/binutils-2.32/objs/bfd/../../bfd/plugin.c:243: undefined reference to `dlopen' /usr/bin/ld: /home/iurt/rpmbuild/BUILD/binutils-2.32/objs/bfd/../../bfd/plugin.c:271: undefined reference to `dlsym' /usr/bin/ld: /home/iurt/rpmbuild/BUILD/binutils-2.32/objs/bfd/../../bfd/plugin.c:256: undefined reference to `dlclose' /usr/bin/ld: /home/iurt/rpmbuild/BUILD/binutils-2.32/objs/bfd/../../bfd/plugin.c:246: undefined reference to `dlerror' as we allow dynamic linking and loading Mageia 7 uses these linker flags: $ rpm --eval %ldflags  -Wl,--as-needed -Wl,--no-undefined -Wl,-z,relro -Wl,-O1 -Wl,--build-id -Wl,--enable-new-dtags So add -ldl to this feature LDFLAGS. Reported-by: Thomas Backlund <tmb@mageia.org> Tested-by: Thomas Backlund <tmb@mageia.org> Cc: Adrian Hunter <adrian.hunter@intel.com> Cc: Jiri Olsa <jolsa@kernel.org> Cc: Namhyung Kim <namhyung@kernel.org> Cc: Song Liu <songliubraving@fb.com> Link: https://lkml.kernel.org/r/20190501173158.GC21436@kernel.org Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
2019-05-02perf cs-etm: Always allocate memory for cs_etm_queue::prev_packetLeo Yan1-5/+3
Robert Walker reported a segmentation fault is observed when process CoreSight trace data; this issue can be easily reproduced by the command 'perf report --itrace=i1000i' for decoding tracing data. If neither the 'b' flag (synthesize branches events) nor 'l' flag (synthesize last branch entries) are specified to option '--itrace', cs_etm_queue::prev_packet will not been initialised. After merging the code to support exception packets and sample flags, there introduced a number of uses of cs_etm_queue::prev_packet without checking whether it is valid, for these cases any accessing to uninitialised prev_packet will cause crash. As cs_etm_queue::prev_packet is used more widely now and it's already hard to follow which functions have been called in a context where the validity of cs_etm_queue::prev_packet has been checked, this patch always allocates memory for cs_etm_queue::prev_packet. Reported-by: Robert Walker <robert.walker@arm.com> Suggested-by: Robert Walker <robert.walker@arm.com> Signed-off-by: Leo Yan <leo.yan@linaro.org> Tested-by: Robert Walker <robert.walker@arm.com> Cc: Alexander Shishkin <alexander.shishkin@linux.intel.com> Cc: Jiri Olsa <jolsa@redhat.com> Cc: Mathieu Poirier <mathieu.poirier@linaro.org> Cc: Mike Leach <mike.leach@linaro.org> Cc: Namhyung Kim <namhyung@kernel.org> Cc: Suzuki K Poulouse <suzuki.poulose@arm.com> Cc: linux-arm-kernel@lists.infradead.org Fixes: 7100b12cf474 ("perf cs-etm: Generate branch sample for exception packet") Fixes: 24fff5eb2b93 ("perf cs-etm: Avoid stale branch samples when flush packet") Link: http://lkml.kernel.org/r/20190428083228.20246-1-leo.yan@linaro.org Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
2019-05-02perf cs-etm: Don't check cs_etm_queue::prev_packet validityLeo Yan1-5/+1
Since cs_etm_queue::prev_packet is allocated for all cases, it will never be NULL pointer; now validity checking prev_packet is pointless, remove all of them. Signed-off-by: Leo Yan <leo.yan@linaro.org> Tested-by: Robert Walker <robert.walker@arm.com> Cc: Alexander Shishkin <alexander.shishkin@linux.intel.com> Cc: Jiri Olsa <jolsa@redhat.com> Cc: Mathieu Poirier <mathieu.poirier@linaro.org> Cc: Mike Leach <mike.leach@linaro.org> Cc: Namhyung Kim <namhyung@kernel.org> Cc: Suzuki K Poulouse <suzuki.poulose@arm.com> Cc: linux-arm-kernel@lists.infradead.org Link: http://lkml.kernel.org/r/20190428083228.20246-2-leo.yan@linaro.org Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
2019-05-02perf report: Report OOM in status line in the GTK UIThomas Richter1-3/+5
An -ENOMEM error is not reported in the GTK GUI. Instead this error message pops up on the screen: [root@m35lp76 perf]# ./perf report -i perf.data.error68-1 Processing events... [974K/3M] Error:failed to process sample 0xf4198 [0x8]: failed to process type: 68 However when I use the same perf.data file with --stdio it works: [root@m35lp76 perf]# ./perf report -i perf.data.error68-1 --stdio \ | head -12 # Total Lost Samples: 0 # # Samples: 76K of event 'cycles' # Event count (approx.): 99056160000 # # Overhead Command Shared Object Symbol # ........ ............... ................. ......... # 8.81% find [kernel.kallsyms] [k] ftrace_likely_update 8.74% swapper [kernel.kallsyms] [k] ftrace_likely_update 8.34% sshd [kernel.kallsyms] [k] ftrace_likely_update 2.19% kworker/u512:1- [kernel.kallsyms] [k] ftrace_likely_update The sample precentage is a bit low..... The GUI always fails in the FINISHED_ROUND event (68) and does not indicate the reason why. When happened is the following. Perf report calls a lot of functions and down deep when a FINISHED_ROUND event is processed, these functions are called: perf_session__process_event() + perf_session__process_user_event() + process_finished_round() + ordered_events__flush() + __ordered_events__flush() + do_flush() + ordered_events__deliver_event() + perf_session__deliver_event() + machine__deliver_event() + perf_evlist__deliver_event() + process_sample_event() + hist_entry_iter_add() --> only called in GUI case!!! + hist_iter__report__callback() + symbol__inc_addr_sample() Now this functions runs out of memory and returns -ENOMEM. This is reported all the way up until function perf_session__process_event() returns to its caller, where -ENOMEM is changed to -EINVAL and processing stops: if ((skip = perf_session__process_event(session, event, head)) < 0) { pr_err("%#" PRIx64 " [%#x]: failed to process type: %d\n", head, event->header.size, event->header.type); err = -EINVAL; goto out_err; } This occurred in the FINISHED_ROUND event when it has to process some 10000 entries and ran out of memory. This patch indicates the root cause and displays it in the status line of ther perf report GUI. Output before (on GUI status line): 0xf4198 [0x8]: failed to process type: 68 Output after: 0xf4198 [0x8]: failed to process type: 68 [not enough memory] Committer notes: the 'skip' variable needs to be initialized to -EINVAL, so that when the size is less than sizeof(struct perf_event_attr) we avoid this valid compiler warning: util/session.c: In function ‘perf_session__process_events’: util/session.c:1936:7: error: ‘skip’ may be used uninitialized in this function [-Werror=maybe-uninitialized] err = skip; ~~~~^~~~~~ util/session.c:1874:6: note: ‘skip’ was declared here s64 skip; ^~~~ cc1: all warnings being treated as errors Signed-off-by: Thomas Richter <tmricht@linux.ibm.com> Reviewed-by: Hendrik Brueckner <brueckner@linux.ibm.com> Reviewed-by: Jiri Olsa <jolsa@redhat.com> Cc: Heiko Carstens <heiko.carstens@de.ibm.com> Cc: Martin Schwidefsky <schwidefsky@de.ibm.com> Link: http://lkml.kernel.org/r/20190423105303.61683-1-tmricht@linux.ibm.com Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
2019-05-02perf bench numa: Add define for RUSAGE_THREAD if not presentArnaldo Carvalho de Melo1-0/+4
While cross building perf to the ARC architecture on a fedora 30 host, we were failing with: CC /tmp/build/perf/bench/numa.o bench/numa.c: In function ‘worker_thread’: bench/numa.c:1261:12: error: ‘RUSAGE_THREAD’ undeclared (first use in this function); did you mean ‘SIGEV_THREAD’? getrusage(RUSAGE_THREAD, &rusage); ^~~~~~~~~~~~~ SIGEV_THREAD bench/numa.c:1261:12: note: each undeclared identifier is reported only once for each function it appears in [perfbuilder@60d5802468f6 perf]$ /arc_gnu_2019.03-rc1_prebuilt_uclibc_le_archs_linux_install/bin/arc-linux-gcc --version | head -1 arc-linux-gcc (ARCv2 ISA Linux uClibc toolchain 2019.03-rc1) 8.3.1 20190225 [perfbuilder@60d5802468f6 perf]$ Trying to reproduce a report by Vineet, I noticed that, with just cross-built zlib and numactl libraries, I ended up with the above failure. So, since RUSAGE_THREAD is available as a define, check for that and numactl libraries, I ended up with the above failure. So, since RUSAGE_THREAD is available as a define in the system headers, check if it is defined in the 'perf bench numa' sources and define it if not. Now it builds and I have to figure out if the problem reported by Vineet only takes place if we have libelf or some other library available. Cc: Arnd Bergmann <arnd@arndb.de> Cc: Jiri Olsa <jolsa@kernel.org> Cc: linux-snps-arc@lists.infradead.org Cc: Namhyung Kim <namhyung@kernel.org> Cc: Vineet Gupta <Vineet.Gupta1@synopsys.com> Link: https://lkml.kernel.org/n/tip-2wb4r1gir9xrevbpq7qp0amk@git.kernel.org Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>