path: root/kernel
Age | Commit message | Author | Files | Lines
2015-04-14 | Merge branch 'perf-core-for-linus' of ↵ | Linus Torvalds | 11 | -180/+1226
git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip Pull perf changes from Ingo Molnar: "Core kernel changes: - One of the more interesting features in this cycle is the ability to attach eBPF programs (user-defined, sandboxed bytecode executed by the kernel) to kprobes. This allows user-defined instrumentation on a live kernel image that can never crash, hang or interfere with the kernel negatively. (Right now it's limited to root-only, but in the future we might allow unprivileged use as well.) (Alexei Starovoitov) - Another non-trivial feature is per event clockid support: this allows, amongst other things, the selection of different clock sources for event timestamps traced via perf. This feature is sought by people who'd like to merge perf generated events with external events that were measured with different clocks: - cluster wide profiling - for system wide tracing with user-space events, - JIT profiling events etc. Matching perf tooling support is added as well, available via the -k, --clockid <clockid> parameter to perf record et al. (Peter Zijlstra) Hardware enablement kernel changes: - x86 Intel Processor Trace (PT) support: which is a hardware tracer on steroids, available on Broadwell CPUs. The hardware trace stream is directly output into the user-space ring-buffer, using the 'AUX' data format extension that was added to the perf core to support hardware constraints such as the necessity to have the tracing buffer physically contiguous. This patch-set was developed for two years and this is the result. A simple way to make use of this is to use BTS tracing, the PT driver emulates BTS output - available via the 'intel_bts' PMU. More explicit PT specific tooling support is in the works as well - will probably be ready by 4.2. (Alexander Shishkin, Peter Zijlstra) - x86 Intel Cache QoS Monitoring (CQM) support: this is a hardware feature of Intel Xeon CPUs that allows the measurement and allocation/partitioning of caches to individual workloads. 
These kernel changes expose the measurement side as a new PMU driver, which exposes various QoS related PMU events. (The partitioning change is work in progress and is planned to be merged as a cgroup extension.) (Matt Fleming, Peter Zijlstra; CPU feature detection by Peter P Waskiewicz Jr) - x86 Intel Haswell LBR call stack support: this is a new Haswell feature that allows the hardware recording of call chains, plus tooling support. To activate this feature you have to enable it via the new 'lbr' call-graph recording option: perf record --call-graph lbr perf report or: perf top --call-graph lbr This hardware feature is a lot faster than stack walk or dwarf based unwinding, but has some limitations: - It reuses the current LBR facility, so LBR call stack and branch record can not be enabled at the same time. - It is only available for user-space callchains. (Yan, Zheng) - x86 Intel Broadwell CPU support and various event constraints and event table fixes for earlier models. (Andi Kleen) - x86 Intel HT CPUs event scheduling workarounds. This is a complex CPU bug affecting the SNB,IVB,HSW families that results in counter value corruption. The mitigation code is automatically enabled and is transparent. (Maria Dimakopoulou, Stephane Eranian) The perf tooling side had a ton of changes in this cycle as well, so I'm only able to list the user visible changes here, in addition to the tooling changes outlined above: User visible changes affecting all tools: - Improve support of compressed kernel modules (Jiri Olsa) - Save DSO loading errno to better report errors (Arnaldo Carvalho de Melo) - Bash completion for subcommands (Yunlong Song) - Add 'I' event modifier for perf_event_attr.exclude_idle bit (Jiri Olsa) - Support missing -f to override perf.data file ownership. 
(Yunlong Song) - Show the first event with an invalid filter (David Ahern, Arnaldo Carvalho de Melo) User visible changes in individual tools: 'perf data': New tool for converting perf.data to other formats, initially for the CTF (Common Trace Format) from LTTng (Jiri Olsa, Sebastian Siewior) 'perf diff': Add --kallsyms option (David Ahern) 'perf list': Allow listing events with 'tracepoint' prefix (Yunlong Song) Sort the output of the command (Yunlong Song) 'perf kmem': Respect -i option (Jiri Olsa) Print big numbers using thousands' group (Namhyung Kim) Allow -v option (Namhyung Kim) Fix alignment of slab result table (Namhyung Kim) 'perf probe': Support multiple probes on different binaries on the same command line (Masami Hiramatsu) Support unnamed union/structure members data collection. (Masami Hiramatsu) Check kprobes blacklist when adding new events. (Masami Hiramatsu) 'perf record': Teach 'perf record' about perf_event_attr.clockid (Peter Zijlstra) Support recording running/enabled time (Andi Kleen) 'perf sched': Improve the performance of 'perf sched replay' on high CPU core count machines (Yunlong Song) 'perf report' and 'perf top': Allow annotating entries in callchains in the hists browser (Arnaldo Carvalho de Melo) Indicate which callchain entries are annotated in the TUI hists browser (Arnaldo Carvalho de Melo) Add pid/tid filtering to 'report' and 'script' commands (David Ahern) Consider PERF_RECORD_ events with cpumode == 0 in 'perf top', removing one cause of long term memory usage buildup, i.e. not processing PERF_RECORD_EXIT events (Arnaldo Carvalho de Melo) 'perf stat': Report unsupported events properly (Suzuki K. 
Poulose) Output running time and run/enabled ratio in CSV mode (Andi Kleen) 'perf trace': Handle legacy syscalls tracepoints (David Ahern, Arnaldo Carvalho de Melo) Only insert blank duration bracket when tracing syscalls (Arnaldo Carvalho de Melo) Filter out the trace pid when no threads are specified (Arnaldo Carvalho de Melo) Dump stack on segfaults (Arnaldo Carvalho de Melo) No need to explicitly enable evsels for workload started from perf, let it be enabled via perf_event_attr.enable_on_exec, removing some events that take place in the 'perf trace' before a workload is really started by it. (Arnaldo Carvalho de Melo) Allow mixing with tracepoints and suppressing plain syscalls. (Arnaldo Carvalho de Melo) There's also been a ton of infrastructure work done, such as the split-out of perf's build system into tools/build/ and other changes - see the shortlog and changelog for details" * 'perf-core-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip: (358 commits) perf/x86/intel/pt: Clean up the control flow in pt_pmu_hw_init() perf evlist: Fix type for references to data_head/tail perf probe: Check the orphaned -x option perf probe: Support multiple probes on different binaries perf buildid-list: Fix segfault when show DSOs with hits perf tools: Fix cross-endian analysis perf tools: Fix error path to do closedir() when synthesizing threads perf tools: Fix synthesizing fork_event.ppid for non-main thread perf tools: Add 'I' event modifier for exclude_idle bit perf report: Don't call map__kmap if map is NULL. 
perf tests: Fix attr tests perf probe: Fix ARM 32 building error perf tools: Merge all perf_event_attr print functions perf record: Add clockid parameter perf sched replay: Use replay_repeat to calculate the runavg of cpu usage instead of the default value 10 perf sched replay: Support using -f to override perf.data file ownership perf sched replay: Fix the EMFILE error caused by the limitation of the maximum open files perf sched replay: Handle the dead halt of sem_wait when create_tasks() fails for any task perf sched replay: Fix the segmentation fault problem caused by pr_err in threads perf sched replay: Realloc the memory of pid_to_task stepwise to adapt to the different pid_max configurations ...
2015-04-14 | Merge branch 'timers-nohz-for-linus' of ↵ | Linus Torvalds | 2 | -21/+40
git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip Pull NOHZ changes from Ingo Molnar: "This tree adds full dynticks support to KVM guests (support the disabling of the timer tick on the guest). The main missing piece was the recognition of guest execution as RCU extended quiescent state and related changes" * 'timers-nohz-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip: kvm,rcu,nohz: use RCU extended quiescent state when running KVM guest context_tracking: Export context_tracking_user_enter/exit context_tracking: Run vtime_user_enter/exit only when state == CONTEXT_USER context_tracking: Add stub context_tracking_is_enabled context_tracking: Generalize context tracking APIs to support user and guest context_tracking: Rename context symbols to prepare for transition state ppc: Remove unused cpp symbols in kvm headers
2015-04-14 | Merge branch 'core-rcu-for-linus' of ↵ | Linus Torvalds | 11 | -311/+743
git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip Pull RCU changes from Ingo Molnar: "The main changes in this cycle were: - changes permitting use of call_rcu() and friends very early in boot, for example, before rcu_init() is invoked. - add in-kernel API to enable and disable expediting of normal RCU grace periods. - improve RCU's handling of (hotplug-) outgoing CPUs. - NO_HZ_FULL_SYSIDLE fixes. - tiny-RCU updates to make it more tiny. - documentation updates. - miscellaneous fixes" * 'core-rcu-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip: (58 commits) cpu: Provide smpboot_thread_init() on !CONFIG_SMP kernels as well cpu: Defer smpboot kthread unparking until CPU known to scheduler rcu: Associate quiescent-state reports with grace period rcu: Yet another fix for preemption and CPU hotplug rcu: Add diagnostics to grace-period cleanup rcutorture: Default to grace-period-initialization delays rcu: Handle outgoing CPUs on exit from idle loop cpu: Make CPU-offline idle-loop transition point more precise rcu: Eliminate ->onoff_mutex from rcu_node structure rcu: Process offlining and onlining only at grace-period start rcu: Move rcu_report_unblock_qs_rnp() to common code rcu: Rework preemptible expedited bitmask handling rcu: Remove event tracing from rcu_cpu_notify(), used by offline CPUs rcutorture: Enable slow grace-period initializations rcu: Provide diagnostic option to slow down grace-period initialization rcu: Detect stalls caused by failure to propagate up rcu_node tree rcu: Eliminate empty HOTPLUG_CPU ifdef rcu: Simplify sync_rcu_preempt_exp_init() rcu: Put all orphan-callback-related code under same comment rcu: Consolidate offline-CPU callback initialization ...
2015-04-14 | Merge tag 'trace-v4.1' of ↵ | Linus Torvalds | 13 | -45/+492
git://git.kernel.org/pub/scm/linux/kernel/git/rostedt/linux-trace Pull tracing updates from Steven Rostedt: "Some clean ups and small fixes, but the biggest change is the addition of the TRACE_DEFINE_ENUM() macro that can be used by tracepoints. Tracepoints have helper functions for the TP_printk() called __print_symbolic() and __print_flags() that let a number be displayed as human-comprehensible text. What is placed in the TP_printk() is also shown in the tracepoint format file such that user space tools like perf and trace-cmd can parse the binary data and express the values too. Unfortunately, the way the TRACE_EVENT() macro works, anything placed in the TP_printk() will be shown pretty much exactly as is. The problem arises when enums are used. That's because unlike macros, enums will not be changed into their values by the C pre-processor. Thus, the enum string is exported to the format file, and this makes it useless for user space tools. The TRACE_DEFINE_ENUM() solves this by converting the enum strings in the TP_printk() format into their numeric values, and that is what is shown to user space. 
For example, the tracepoint tlb_flush currently has this in its format file: __print_symbolic(REC->reason, { TLB_FLUSH_ON_TASK_SWITCH, "flush on task switch" }, { TLB_REMOTE_SHOOTDOWN, "remote shootdown" }, { TLB_LOCAL_SHOOTDOWN, "local shootdown" }, { TLB_LOCAL_MM_SHOOTDOWN, "local mm shootdown" }) After adding: TRACE_DEFINE_ENUM(TLB_FLUSH_ON_TASK_SWITCH); TRACE_DEFINE_ENUM(TLB_REMOTE_SHOOTDOWN); TRACE_DEFINE_ENUM(TLB_LOCAL_SHOOTDOWN); TRACE_DEFINE_ENUM(TLB_LOCAL_MM_SHOOTDOWN); Its format file will contain this: __print_symbolic(REC->reason, { 0, "flush on task switch" }, { 1, "remote shootdown" }, { 2, "local shootdown" }, { 3, "local mm shootdown" })" * tag 'trace-v4.1' of git://git.kernel.org/pub/scm/linux/kernel/git/rostedt/linux-trace: (27 commits) tracing: Add enum_map file to show enums that have been mapped writeback: Export enums used by tracepoint to user space v4l: Export enums used by tracepoints to user space SUNRPC: Export enums in tracepoints to user space mm: tracing: Export enums in tracepoints to user space irq/tracing: Export enums in tracepoints to user space f2fs: Export the enums in the tracepoints to userspace net/9p/tracing: Export enums in tracepoints to userspace x86/tlb/trace: Export enums in used by tlb_flush tracepoint tracing/samples: Update the trace-event-sample.h with TRACE_DEFINE_ENUM() tracing: Allow for modules to convert their enums to values tracing: Add TRACE_DEFINE_ENUM() macro to map enums to their values tracing: Update trace-event-sample with TRACE_SYSTEM_VAR documentation tracing: Give system name a pointer brcmsmac: Move each system tracepoints to their own header iwlwifi: Move each system tracepoints to their own header mac80211: Move message tracepoints to their own header tracing: Add TRACE_SYSTEM_VAR to xhci-hcd tracing: Add TRACE_SYSTEM_VAR to kvm-s390 tracing: Add TRACE_SYSTEM_VAR to intel-sst ...
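The enum problem described in this pull request comes down to how the C preprocessor treats identifiers. The following is a minimal userspace sketch, not kernel code: FLUSH_MACRO, FLUSH_ENUM, and the two helper functions are hypothetical names chosen for illustration, and only the stringification behaviour itself is standard C.

```c
#include <string.h>

/* Two-level stringification: the argument is macro-expanded before '#' applies. */
#define STR(x)  #x
#define XSTR(x) STR(x)

/* Hypothetical names, for illustration only. */
#define FLUSH_MACRO 0        /* a macro: the preprocessor knows its value */
enum { FLUSH_ENUM = 0 };     /* an enum: only the compiler knows its value */

/* Text that would end up in a format string when a macro is used. */
const char *macro_as_text(void)
{
    return XSTR(FLUSH_MACRO);   /* expands to "0" */
}

/* Text that would end up in a format string when an enum is used. */
const char *enum_as_text(void)
{
    return XSTR(FLUSH_ENUM);    /* stays the literal name "FLUSH_ENUM" */
}
```

A user space parser reading the second string sees only a name with no value attached, which is the gap TRACE_DEFINE_ENUM() is described as closing.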
2015-04-14 | Merge tag 'trace-4.1-tracefs' of ↵ | Linus Torvalds | 8 | -142/+130
git://git.kernel.org/pub/scm/linux/kernel/git/rostedt/linux-trace Pull tracefs from Steven Rostedt: "This adds the new tracefs file system. This has been in linux-next for more than one release, as I had it ready for the 4.0 merge window, but a last minute thing that needed to go into Linux first had to be done. That was that perf hard coded the file system number when reading /sys/kernel/debugfs/tracing directory making sure that the path had the debugfs mount # before it would parse the tracing file. This broke other use cases of perf, and the check is removed. Now when mounting /sys/kernel/debug, tracefs is automatically mounted in /sys/kernel/debug/tracing such that old tools will still see that path as expected. But now system admins can mount tracefs directly and not need to mount debugfs, which can expose security issues. A new directory is created when tracefs is configured such that system admins can now mount it separately (/sys/kernel/tracing)" * tag 'trace-4.1-tracefs' of git://git.kernel.org/pub/scm/linux/kernel/git/rostedt/linux-trace: tracing: Have mkdir and rmdir be part of tracefs tracefs: Add directory /sys/kernel/tracing tracing: Automatically mount tracefs on debugfs/tracing tracing: Convert the tracing facility over to use tracefs tracefs: Add new tracefs file system tracing: Create cmdline tracer options on tracing fs init tracing: Only create tracer options files if directory exists debugfs: Provide a file creation function that also takes an initial size
2015-04-14 | Merge branch 'for-linus' of ↵ | Linus Torvalds | 1 | -52/+17
git://git.kernel.org/pub/scm/linux/kernel/git/jikos/livepatching Pull livepatching updates from Jiri Kosina: "These are mostly smaller things that got accumulated during the development cycle. The unified solution is still being worked on and is not mature enough for 4.1 yet. - s390 livepatching support, from Jiri Slaby (has Ack from s390 maintainers) - error handling simplification, from Josh Poimboeuf - two minor code cleanups from Josh Poimboeuf and Miroslav Benes" * 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/jikos/livepatching: livepatch: add support on s390 livepatch: remove unnecessary call to klp_find_object_module() livepatch: simplify disable error path livepatch: remove extern specifier from header files
2015-04-13 | Merge branch 'for-4.1' of ↵ | Linus Torvalds | 3 | -9/+18
git://git.kernel.org/pub/scm/linux/kernel/git/tj/cgroup Pull cgroup updates from Tejun Heo: "Nothing too interesting. Rik made cpuset cooperate better with isolcpus and there are several other cleanup patches" * 'for-4.1' of git://git.kernel.org/pub/scm/linux/kernel/git/tj/cgroup: cpuset, isolcpus: document relationship between cpusets & isolcpus cpusets, isolcpus: exclude isolcpus from load balancing in cpusets sched, isolcpu: make cpu_isolated_map visible outside scheduler cpuset: initialize cpuset a bit early cgroup: Use kvfree in pidlist_free() cgroup: call cgroup_subsys->bind on cgroup subsys initialization
2015-04-13 | Merge branch 'for-4.1' of git://git.kernel.org/pub/scm/linux/kernel/git/tj/wq | Linus Torvalds | 1 | -333/+514
Pull workqueue updates from Tejun Heo: "Workqueue now prints debug information at the end of sysrq-t which should be helpful when tracking down suspected workqueue stalls. It only prints out the ones with something currently going on so it shouldn't add much output in most cases" * 'for-4.1' of git://git.kernel.org/pub/scm/linux/kernel/git/tj/wq: workqueue: Reorder sysfs code percpu: Fix trivial typos in comments workqueue: dump workqueues on sysrq-t workqueue: keep track of the flushing task and pool manager workqueue: make the workqueues list RCU walkable
2015-04-13 | Merge branch 'irq-core-for-linus' of ↵ | Linus Torvalds | 3 | -4/+150
git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip Pull irq core updates from Thomas Gleixner: "Managerial summary: Core code: - final removal of IRQF_DISABLED - new state save/restore functions for virtualization support - wakeup support for stacked irqdomains - new function to solve the netpoll synchronization problem irqchips: - new driver for STi based devices - new driver for Vybrid MSCM - massive cleanup of the GIC driver by moving the GIC-addons to stacked irqdomains - the usual pile of fixes and updates to the various chip drivers" * 'irq-core-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip: (44 commits) irqchip: GICv3: Add support for irq_[get, set]_irqchip_state() irqchip: GIC: Add support for irq_[get, set]_irqchip_state() genirq: Allow the irqchip state of an IRQ to be save/restored genirq: MSI: Fix freeing of unallocated MSI irqchip: renesas-irqc: Add wake-up support irqchip: armada-370-xp: Allow using wakeup source irqchip: mips-gic: Add new functions to start/stop the GIC counter irqchip: tegra: Add Tegra210 support irqchip: digicolor: Move digicolor_set_gc to init section irqchip: renesas-irqc: Add functional clock to bindings irqchip: renesas-irqc: Add minimal runtime PM support irqchip: renesas-irqc: Add more register documentation DT: exynos: update PMU binding ARM: exynos4/5: convert pmu wakeup to stacked domains irqchip: gic: Don't complain in gic_get_cpumask() if UP system ARM: zynq: switch from gic_arch_extn to gic_set_irqchip_flags ARM: ux500: switch from gic_arch_extn to gic_set_irqchip_flags ARM: shmobile: remove use of gic_arch_extn.irq_set_wake irqchip: gic: Add an entry point to set up irqchip flags ARM: omap: convert wakeupgen to stacked domains ...
2015-04-13 | Merge branch 'for-4.1/core-noarch' into for-linus | Jiri Kosina | 1 | -52/+17
2015-04-13 | Merge branch 'timers-core-for-linus' of ↵ | Linus Torvalds | 20 | -755/+1174
git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip Pull timer updates from Ingo Molnar: "The main changes in this cycle were: - clockevents state machine cleanups and enhancements (Viresh Kumar) - clockevents broadcast notifier horror to state machine conversion and related cleanups (Thomas Gleixner, Rafael J Wysocki) - clocksource and timekeeping core updates (John Stultz) - clocksource driver updates and fixes (Ben Dooks, Dmitry Osipenko, Hans de Goede, Laurent Pinchart, Maxime Ripard, Xunlei Pang) - y2038 fixes (Xunlei Pang, John Stultz) - NMI-safe ktime_get_raw_fast() and general refactoring of the clock code, in preparation to perf's per event clock ID support (Peter Zijlstra) - generic sched/clock fixes, optimizations and cleanups (Daniel Thompson) - clockevents cpu_down() race fix (Preeti U Murthy)" * 'timers-core-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip: (94 commits) timers/PM: Drop unnecessary braces from tick_freeze() timers/PM: Fix up tick_unfreeze() timekeeping: Get rid of stale comment clockevents: Cleanup dead cpu explicitely clockevents: Make tick handover explicit clockevents: Remove broadcast oneshot control leftovers sched/idle: Use explicit broadcast oneshot control function ARM: Tegra: Use explicit broadcast oneshot control function ARM: OMAP: Use explicit broadcast oneshot control function intel_idle: Use explicit broadcast oneshot control function ACPI/idle: Use explicit broadcast control function ACPI/PAD: Use explicit broadcast oneshot control function x86/amd/idle, clockevents: Use explicit broadcast oneshot control functions clockevents: Provide explicit broadcast oneshot control functions clockevents: Remove the broadcast control leftovers ARM: OMAP: Use explicit broadcast control function intel_idle: Use explicit broadcast control function cpuidle: Use explicit broadcast control function ACPI/processor: Use explicit broadcast control function ACPI/PAD: Use explicit broadcast control function ...
2015-04-13 | Merge branch 'sched-core-for-linus' of ↵ | Linus Torvalds | 7 | -230/+612
git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip Pull scheduler changes from Ingo Molnar: "Major changes: - Reworked CPU capacity code, for better SMP load balancing on systems with asymmetric CPUs. (Vincent Guittot, Morten Rasmussen) - Reworked RT task SMP balancing to be push based instead of pull based, to reduce latencies on large CPU count systems. (Steven Rostedt) - SCHED_DEADLINE support updates and fixes. (Juri Lelli) - SCHED_DEADLINE task migration support during CPU hotplug. (Wanpeng Li) - x86 mwait-idle optimizations and fixes. (Mike Galbraith, Len Brown) - sched/numa improvements. (Rik van Riel) - various cleanups" * 'sched-core-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip: (28 commits) sched/core: Drop debugging leftover trace_printk call sched/deadline: Support DL task migration during CPU hotplug sched/core: Check for available DL bandwidth in cpuset_cpu_inactive() sched/deadline: Always enqueue on previous rq when dl_task_timer() fires sched/core: Remove unused argument from init_[rt|dl]_rq() sched/deadline: Fix rt runtime corruption when dl fails its global constraints sched/deadline: Avoid a superfluous check sched: Improve load balancing in the presence of idle CPUs sched: Optimize freq invariant accounting sched: Move CFS tasks to CPUs with higher capacity sched: Add SD_PREFER_SIBLING for SMT level sched: Remove unused struct sched_group_capacity::capacity_orig sched: Replace capacity_factor by usage sched: Calculate CPU's usage statistic and put it into struct sg_lb_stats::group_usage sched: Add struct rq::cpu_capacity_orig sched: Make scale_rt invariant with frequency sched: Make sched entity usage tracking scale-invariant sched: Remove frequency scaling from cpu_capacity sched: Track group sched_entity usage contributions sched: Add sched_avg::utilization_avg_contrib ...
2015-04-13 | Merge branch 'locking-core-for-linus' of ↵ | Linus Torvalds | 9 | -103/+119
git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip Pull core locking changes from Ingo Molnar: "Main changes: - jump label asm preparatory work for PowerPC (Anton Blanchard) - rwsem optimizations and cleanups (Davidlohr Bueso) - mutex optimizations and cleanups (Jason Low) - futex fix (Oleg Nesterov) - remove broken atomicity checks from {READ,WRITE}_ONCE() (Peter Zijlstra)" * 'locking-core-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip: powerpc, jump_label: Include linux/jump_label.h to get HAVE_JUMP_LABEL define jump_label: Allow jump labels to be used in assembly jump_label: Allow asm/jump_label.h to be included in assembly locking/mutex: Further simplify mutex_spin_on_owner() locking: Remove atomicy checks from {READ,WRITE}_ONCE locking/rtmutex: Rename argument in the rt_mutex_adjust_prio_chain() documentation as well locking/rwsem: Fix lock optimistic spinning when owner is not running locking: Remove ACCESS_ONCE() usage locking/rwsem: Check for active lock before bailing on spinning locking/rwsem: Avoid deceiving lock spinners locking/rwsem: Set lock ownership ASAP locking/rwsem: Document barrier need when waking tasks locking/futex: Check PF_KTHREAD rather than !p->mm to filter out kthreads locking/mutex: Refactor mutex_spin_on_owner() locking/mutex: In mutex_spin_on_owner(), return true when owner changes
2015-04-13 | Merge tag 'for-linus' of git://git.kernel.org/pub/scm/virt/kvm/kvm | Linus Torvalds | 1 | -0/+15
Pull KVM updates from Paolo Bonzini: "First batch of KVM changes for 4.1 The most interesting bit here is irqfd/ioeventfd support for ARM and ARM64. Summary: ARM/ARM64: fixes for live migration, irqfd and ioeventfd support (enabling vhost, too), page aging s390: interrupt handling rework, allowing to inject all local interrupts via new ioctl and to get/set the full local irq state for migration and introspection. New ioctls to access memory by virtual address, and to get/set the guest storage keys. SIMD support. MIPS: FPU and MIPS SIMD Architecture (MSA) support. Includes some patches from Ralf Baechle's MIPS tree. x86: bugfixes (notably for pvclock, the others are small) and cleanups. Another small latency improvement for the TSC deadline timer" * tag 'for-linus' of git://git.kernel.org/pub/scm/virt/kvm/kvm: (146 commits) KVM: use slowpath for cross page cached accesses kvm: mmu: lazy collapse small sptes into large sptes KVM: x86: Clear CR2 on VCPU reset KVM: x86: DR0-DR3 are not clear on reset KVM: x86: BSP in MSR_IA32_APICBASE is writable KVM: x86: simplify kvm_apic_map KVM: x86: avoid logical_map when it is invalid KVM: x86: fix mixed APIC mode broadcast KVM: x86: use MDA for interrupt matching kvm/ppc/mpic: drop unused IRQ_testbit KVM: nVMX: remove unnecessary double caching of MAXPHYADDR KVM: nVMX: checks for address bits beyond MAXPHYADDR on VM-entry KVM: x86: cache maxphyaddr CPUID leaf in struct kvm_vcpu KVM: vmx: pass error code with internal error #2 x86: vdso: fix pvclock races with task migration KVM: remove kvm_read_hva and kvm_read_hva_atomic KVM: x86: optimize delivery of TSC deadline timer interrupt KVM: x86: extract blocking logic from __vcpu_run kvm: x86: fix x86 eflags fixed bit KVM: s390: migrate vcpu interrupt state ...
2015-04-13 | cpu: Defer smpboot kthread unparking until CPU known to scheduler | Paul E. McKenney | 1 | -3/+31
Currently, smpboot_unpark_threads() is invoked before the incoming CPU has been added to the scheduler's runqueue structures. This might potentially cause the unparked kthread to run on the wrong CPU, since the correct CPU isn't fully set up yet. That causes a sporadic, hard to debug boot crash triggering on some systems, reported by Borislav Petkov, and bisected down to: 2a442c9c6453 ("x86: Use common outgoing-CPU-notification code") This patch places smpboot_unpark_threads() in a CPU hotplug notifier with priority set so that these kthreads are unparked just after the CPU has been added to the runqueues. Reported-and-tested-by: Borislav Petkov <bp@suse.de> Signed-off-by: Paul E. McKenney <paulmck@linux.vnet.ibm.com> Cc: Andrew Morton <akpm@linux-foundation.org> Cc: Linus Torvalds <torvalds@linux-foundation.org> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Thomas Gleixner <tglx@linutronix.de> Cc: linux-kernel@vger.kernel.org Signed-off-by: Ingo Molnar <mingo@kernel.org>
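The fix above relies on notifier priority to order two callbacks. The sketch below is a simplified userspace model of a priority-ordered callback chain, not the kernel's notifier implementation: struct notifier, struct chain, the priority values 10 and 9, and every function name are assumptions made for illustration.

```c
#include <stddef.h>

#define MAX_NOTIFIERS 8

/* Hypothetical, simplified stand-in for a notifier block. */
struct notifier {
    int priority;              /* higher priority runs earlier */
    void (*call)(int cpu);
};

struct chain {
    struct notifier *slots[MAX_NOTIFIERS];
    size_t count;
};

/* Insert so the array stays sorted by descending priority; -1 if full. */
int chain_register(struct chain *c, struct notifier *n)
{
    size_t i;

    if (c->count == MAX_NOTIFIERS)
        return -1;
    i = c->count;
    while (i > 0 && c->slots[i - 1]->priority < n->priority) {
        c->slots[i] = c->slots[i - 1];   /* shift lower priorities back */
        i--;
    }
    c->slots[i] = n;
    c->count++;
    return 0;
}

/* Invoke every callback in priority order. */
void chain_call(const struct chain *c, int cpu)
{
    for (size_t i = 0; i < c->count; i++)
        c->slots[i]->call(cpu);
}

static char order_log[8];
static size_t order_len;

static void add_to_runqueues(int cpu) { (void)cpu; order_log[order_len++] = 'S'; }
static void unpark_kthreads(int cpu)  { (void)cpu; order_log[order_len++] = 'U'; }

/* Registers the unpark step first, yet it still runs after the scheduler step. */
const char *online_order(void)
{
    struct notifier sched  = { .priority = 10, .call = add_to_runqueues };
    struct notifier unpark = { .priority = 9,  .call = unpark_kthreads };
    struct chain c = { .count = 0 };

    order_len = 0;
    chain_register(&c, &unpark);
    chain_register(&c, &sched);
    chain_call(&c, 1);
    order_log[order_len] = '\0';
    return order_log;
}
```

Giving the unpark callback a priority just below the scheduler's callback keeps the ordering independent of registration order, which is the property the commit message describes.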
2015-04-11 | Merge tag 'irqchip-core-4.1-3' of ↵ | Thomas Gleixner | 1 | -0/+16
git://git.infradead.org/users/jcooper/linux into irq/core irqchip core change for v4.1 (round 3) from Jason Cooper Purge the gic_arch_extn hacks and abuse by using the new stacked domains NOTE: Due to the nature of these changes, patches crossing subsystems have been kept together in their own branches. - tegra - Handle the LIC properly - omap - Convert crossbar to stacked domains - kill arm,routable-irqs in GIC binding - exynos - Convert PMU wakeup to stacked domains - shmobile, ux500, zynq (irq_set_wake branch) - Switch from abusing gic_arch_extn to using gic_set_irqchip_flags
2015-04-09 | Merge tag 'pm+acpi-4.0-rc8' of ↵ | Linus Torvalds | 1 | -20/+1
git://git.kernel.org/pub/scm/linux/kernel/git/rafael/linux-pm Pull power management and ACPI fixes from Rafael Wysocki: "These are stable-candidate fixes of some recently reported issues in the cpufreq core, cpuidle core, the ACPI cpuidle driver and the hibernate core. Specifics: - Revert a 3.17 hibernate commit that was supposed to fix an issue related to e820 reserved regions, but broke resume from hibernation on Lenovo x230 (Rafael J Wysocki). - Prevent the ACPI cpuidle driver from overwriting the name and description of the C0 state set by the core when the list of C-states changes (Thomas Schlichter). - Remove the no longer needed state_count field from struct cpuidle_device which prevents the list of C-states shown by the sysfs interface from becoming incorrect when the current number of them is different from the number of C-states on boot (Bartlomiej Zolnierkiewicz). - The cpufreq core updates the policy object of the only online CPU during system resume to make it reflect the current hardware state, but it always assumes that CPU to be CPU0 which need not be the case, so fix the code to avoid that assumption (Viresh Kumar)" * tag 'pm+acpi-4.0-rc8' of git://git.kernel.org/pub/scm/linux/kernel/git/rafael/linux-pm: Revert "PM / hibernate: avoid unsafe pages in e820 reserved regions" cpuidle: ACPI: do not overwrite name and description of C0 cpuidle: remove state_count field from struct cpuidle_device cpufreq: Schedule work for the first-online CPU on resume
2015-04-09 | Merge branches 'pm-sleep', 'pm-cpufreq' and 'pm-cpuidle' | Rafael J. Wysocki | 1 | -20/+1
* pm-sleep: Revert "PM / hibernate: avoid unsafe pages in e820 reserved regions" * pm-cpufreq: cpufreq: Schedule work for the first-online CPU on resume * pm-cpuidle: cpuidle: ACPI: do not overwrite name and description of C0 cpuidle: remove state_count field from struct cpuidle_device
2015-04-09 | locking/mutex: Further simplify mutex_spin_on_owner() | Jason Low | 1 | -10/+4
Similar to what Linus suggested for rwsem_spin_on_owner(), in mutex_spin_on_owner() instead of having while (true) and breaking out of the spin loop on lock->owner != owner, we can have the loop directly check for while (lock->owner == owner) to improve the readability of the code. It also shrinks the code a bit: text data bss dec hex filename 3721 0 0 3721 e89 mutex.o.before 3705 0 0 3705 e79 mutex.o.after Signed-off-by: Jason Low <jason.low2@hp.com> Cc: Andrew Morton <akpm@linux-foundation.org> Cc: Aswin Chandramouleeswaran <aswin@hp.com> Cc: Davidlohr Bueso <dave@stgolabs.net> Cc: Linus Torvalds <torvalds@linux-foundation.org> Cc: Paul E. McKenney <paulmck@linux.vnet.ibm.com> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Thomas Gleixner <tglx@linutronix.de> Cc: Tim Chen <tim.c.chen@linux.intel.com> Link: http://lkml.kernel.org/r/1428521960-5268-2-git-send-email-jason.low2@hp.com [ Added code generation info. ] Signed-off-by: Ingo Molnar <mingo@kernel.org>
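The two loop shapes discussed in this commit can be compared side by side in a small userspace sketch. This is not the kernel function: struct task, struct mutex_sketch, and the budget parameter (a stand-in for the kernel's rescheduling check) are hypothetical, and memory barriers and RCU protection are deliberately left out.

```c
#include <stdbool.h>

/* Hypothetical, stripped-down stand-ins for the kernel structures. */
struct task { bool on_cpu; };
struct mutex_sketch { struct task *owner; };

/* Before: infinite loop that breaks out when the owner changes. */
bool spin_before(const struct mutex_sketch *lock, const struct task *owner,
                 int budget)
{
    while (true) {
        if (lock->owner != owner)
            break;
        /* Stop spinning if the owner is not running or the budget is spent. */
        if (!owner->on_cpu || budget-- <= 0)
            return false;
    }
    return true;
}

/* After: the exit condition moves into the loop header. */
bool spin_after(const struct mutex_sketch *lock, const struct task *owner,
                int budget)
{
    while (lock->owner == owner) {
        if (!owner->on_cpu || budget-- <= 0)
            return false;
    }
    return true;
}
```

Both functions return the same result for the same inputs; the second form simply states the loop condition where a reader expects to find it.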
2015-04-08 | Merge branch 'akpm' (patches from Andrew) | Linus Torvalds | 1 | -1/+3
Merge misc fixes from Andrew Morton: "Three fixes" * emailed patches from Andrew Morton <akpm@linux-foundation.org>: mm: numa: disable change protection for vma(VM_HUGETLB) include/linux/dmapool.h: declare struct device mm: move zone lock to a different cache line than order-0 free page lists
2015-04-08 | Copy the kernel module data from user space in chunks | Linus Torvalds | 1 | -1/+18
Unlike most (all?) other copies from user space, kernel module loading is almost unlimited in size. So we do a potentially huge "copy_from_user()" when we copy the module data from user space to the kernel buffer, which can be a latency concern when preemption is disabled (or voluntary). Also, because 'copy_from_user()' clears the tail of the kernel buffer on failures, even a *failed* copy can end up wasting a lot of time. Normally neither of these is a concern in real life, but they do trigger when doing stress-testing with trinity. Running in a VM seems to add its own overhead, causing trinity module load testing to even trigger the watchdog. The simple fix is to just chunk up the module loading, so that it never tries to copy insanely big areas in one go. That bounds the latency, and also the amount of (unnecessarily, in this case) cleared memory for the failure case. Reported-by: Sasha Levin <sasha.levin@oracle.com> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
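The chunking idea can be modelled in userspace as follows. This is a sketch under stated assumptions, not the patch itself: memcpy() stands in for copy_from_user(), the optional yield callback stands in for a scheduling point, and CHUNK_SIZE is a deliberately tiny hypothetical value so the behaviour is easy to observe.

```c
#include <stddef.h>
#include <string.h>

#define CHUNK_SIZE 4u   /* hypothetical demo value; a real chunk would be far larger */

/*
 * Copy len bytes from src to dst in pieces of at most CHUNK_SIZE bytes.
 * If yield is non-NULL it is called after each piece, modelling a point
 * where other work could run. Returns the number of pieces copied.
 */
size_t copy_chunked(void *dst, const void *src, size_t len, void (*yield)(void))
{
    unsigned char *d = dst;
    const unsigned char *s = src;
    size_t pieces = 0;

    while (len > 0) {
        size_t n = len < CHUNK_SIZE ? len : CHUNK_SIZE;

        memcpy(d, s, n);
        if (yield)
            yield();
        d += n;
        s += n;
        len -= n;
        pieces++;
    }
    return pieces;
}
```

Because no single step handles more than CHUNK_SIZE bytes, the time spent between two yield points is bounded regardless of the total length.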
2015-04-08genirq: Allow the irqchip state of an IRQ to be save/restoredMarc Zyngier1-0/+91
There are a number of cases where a kernel subsystem may want to introspect the state of an interrupt at the irqchip level: - When a peripheral is shared between virtual machines, its interrupt state becomes part of the guest's state, and must be switched accordingly. KVM on arm/arm64 requires this for its guest-visible timer - Some GPIO controllers seem to require peeking into the interrupt controller they are connected to to report their internal state This seems to be a pattern that is common enough for the core code to try and support this without too many horrible hacks. Introduce a pair of accessors (irq_get_irqchip_state/irq_set_irqchip_state) to retrieve the bits that can be of interest to another subsystem: pending, active, and masked. - irq_get_irqchip_state returns the state of the interrupt according to a parameter set to IRQCHIP_STATE_PENDING, IRQCHIP_STATE_ACTIVE, IRQCHIP_STATE_MASKED or IRQCHIP_STATE_LINE_LEVEL. - irq_set_irqchip_state similarly sets the state of the interrupt. Signed-off-by: Marc Zyngier <marc.zyngier@arm.com> Reviewed-by: Bjorn Andersson <bjorn.andersson@sonymobile.com> Tested-by: Bjorn Andersson <bjorn.andersson@sonymobile.com> Cc: linux-arm-kernel@lists.infradead.org Cc: Abhijeet Dharmapurikar <adharmap@codeaurora.org> Cc: Stephen Boyd <sboyd@codeaurora.org> Cc: Phong Vo <pvo@apm.com> Cc: Linus Walleij <linus.walleij@linaro.org> Cc: Tin Huynh <tnhuynh@apm.com> Cc: Y Vo <yvo@apm.com> Cc: Toan Le <toanle@apm.com> Cc: Bjorn Andersson <bjorn@kryo.se> Cc: Jason Cooper <jason@lakedaemon.net> Cc: Arnd Bergmann <arnd@arndb.de> Link: http://lkml.kernel.org/r/1426676484-21812-2-git-send-email-marc.zyngier@arm.com Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
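A get/set accessor pair of this kind lends itself to a save/restore pattern, which the following user-space mock tries to show. Everything prefixed with toy_ is hypothetical: the state table, the function signatures, and the snapshot helpers are assumptions made for illustration and should not be read as the kernel's actual interface. Only the four state selectors mirror the names given in the commit message.

```c
#include <stdbool.h>

/* Selectors mirroring the four states named in the commit message. */
enum toy_irqchip_state {
    TOY_STATE_PENDING,
    TOY_STATE_ACTIVE,
    TOY_STATE_MASKED,
    TOY_STATE_LINE_LEVEL,
    TOY_STATE_COUNT
};

#define TOY_NR_IRQS 8

/* Mock "hardware": one flag per interrupt and state. */
static bool toy_state[TOY_NR_IRQS][TOY_STATE_COUNT];

static int toy_get_irqchip_state(unsigned int irq,
                                 enum toy_irqchip_state which, bool *state)
{
    if (irq >= TOY_NR_IRQS || (unsigned int)which >= TOY_STATE_COUNT || !state)
        return -1;
    *state = toy_state[irq][which];
    return 0;
}

static int toy_set_irqchip_state(unsigned int irq,
                                 enum toy_irqchip_state which, bool val)
{
    if (irq >= TOY_NR_IRQS || (unsigned int)which >= TOY_STATE_COUNT)
        return -1;
    toy_state[irq][which] = val;
    return 0;
}

/* Snapshot of the bits a guest switch might care about (assumption). */
struct toy_irq_snapshot {
    bool pending;
    bool active;
};

static int toy_save(unsigned int irq, struct toy_irq_snapshot *snap)
{
    if (toy_get_irqchip_state(irq, TOY_STATE_PENDING, &snap->pending))
        return -1;
    return toy_get_irqchip_state(irq, TOY_STATE_ACTIVE, &snap->active);
}

static int toy_restore(unsigned int irq, const struct toy_irq_snapshot *snap)
{
    if (toy_set_irqchip_state(irq, TOY_STATE_PENDING, snap->pending))
        return -1;
    return toy_set_irqchip_state(irq, TOY_STATE_ACTIVE, snap->active);
}
```

A caller would invoke toy_save() when switching a guest out and toy_restore() when switching it back in, which is the shared-peripheral use case the message describes.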
2015-04-08genirq: MSI: Fix freeing of unallocated MSIMarc Zyngier1-2/+9
While debugging an unrelated issue with the GICv3 ITS driver, the following trace triggered: WARNING: CPU: 1 PID: 1 at kernel/irq/irqdomain.c:1121 irq_domain_free_irqs+0x160/0x17c() NULL pointer, cannot free irq Modules linked in: CPU: 1 PID: 1 Comm: swapper/0 Tainted: G W 3.19.0-rc6+ #3690 Hardware name: FVP Base (DT) Call trace: [<ffffffc000089398>] dump_backtrace+0x0/0x13c [<ffffffc0000894e4>] show_stack+0x10/0x1c [<ffffffc00066d134>] dump_stack+0x74/0x94 [<ffffffc0000a92f8>] warn_slowpath_common+0x9c/0xd4 [<ffffffc0000a938c>] warn_slowpath_fmt+0x5c/0x80 [<ffffffc0000ee04c>] irq_domain_free_irqs+0x15c/0x17c [<ffffffc0000ef918>] msi_domain_free_irqs+0x58/0x74 [<ffffffc000386f58>] free_msi_irqs+0xb4/0x1c0 // The msi_prepare callback fails here [<ffffffc0003872c0>] pci_enable_msix+0x25c/0x3d4 [<ffffffc00038746c>] pci_enable_msix_range+0x34/0x80 [<ffffffc0003924ac>] vp_try_to_find_vqs+0xec/0x528 [<ffffffc000392954>] vp_find_vqs+0x6c/0xa8 [<ffffffc0003ee2a8>] init_vq+0x120/0x248 [<ffffffc0003eefb0>] virtblk_probe+0xb0/0x6bc [<ffffffc00038fc34>] virtio_dev_probe+0x17c/0x214 [<ffffffc0003d4a04>] driver_probe_device+0x7c/0x23c [<ffffffc0003d4cb0>] __driver_attach+0x98/0xa0 [<ffffffc0003d2c60>] bus_for_each_dev+0x60/0xb4 [<ffffffc0003d455c>] driver_attach+0x1c/0x28 [<ffffffc0003d41b0>] bus_add_driver+0x150/0x208 [<ffffffc0003d54c0>] driver_register+0x64/0x130 [<ffffffc00038f9e8>] register_virtio_driver+0x24/0x68 [<ffffffc00091320c>] init+0x70/0xac [<ffffffc0000828f0>] do_one_initcall+0x94/0x1d0 [<ffffffc0008e9b00>] kernel_init_freeable+0x144/0x1e4 [<ffffffc00066a434>] kernel_init+0xc/0xd8 ---[ end trace f9ee562a77cc7bae ]--- The ITS msi_prepare callback having failed, we end-up trying to free MSIs that have never been allocated. Oddly enough, the kernel is pretty upset about it. It turns out that this behaviour was expected before the MSI domain was introduced (and dealt with in arch_teardown_msi_irqs). The obvious fix is to detect this early enough and bail out. 
Signed-off-by: Marc Zyngier <marc.zyngier@arm.com> Reviewed-by: Jiang Liu <jiang.liu@linux.intel.com> Link: http://lkml.kernel.org/r/1422299419-6051-1-git-send-email-marc.zyngier@arm.com Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
2015-04-08Merge branch 'linus' into irq/core to get the GIC updates whichThomas Gleixner16-85/+258
conflict with pending GIC changes. Conflicts: drivers/usb/isp1760/isp1760-core.c
2015-04-08tracing: Add enum_map file to show enums that have been mappedSteven Rostedt (Red Hat)2-4/+269
Add an enum_map file in the tracing directory to see what enums have been saved to convert in the print fmt files. As this requires the enum mapping to be persistent in memory, it is only created if the new config option CONFIG_TRACE_ENUM_MAP_FILE is enabled. This is for debugging and will increase the persistent memory footprint of the kernel. Link: http://lkml.kernel.org/r/20150403013802.220157513@goodmis.org Reviewed-by: Masami Hiramatsu <masami.hiramatsu.pt@hitachi.com> Tested-by: Masami Hiramatsu <masami.hiramatsu.pt@hitachi.com> Signed-off-by: Steven Rostedt <rostedt@goodmis.org>
2015-04-08tracing: Allow for modules to convert their enums to valuesSteven Rostedt (Red Hat)3-5/+49
Update the infrastructure such that modules that declare TRACE_DEFINE_ENUM() will have those enums converted into their values in the tracepoint print fmt strings. Link: http://lkml.kernel.org/r/87vbhjp74q.fsf@rustcorp.com.au Acked-by: Rusty Russell <rusty@rustcorp.com.au> Reviewed-by: Masami Hiramatsu <masami.hiramatsu.pt@hitachi.com> Tested-by: Masami Hiramatsu <masami.hiramatsu.pt@hitachi.com> Signed-off-by: Steven Rostedt <rostedt@goodmis.org>
2015-04-08tracing: Add TRACE_DEFINE_ENUM() macro to map enums to their valuesSteven Rostedt (Red Hat)3-1/+146
Several tracepoints use the helper functions __print_symbolic() or __print_flags() and pass in enums that do the mapping between the binary data stored and the value to print. This works well for reading the ASCII trace files, but when the data is read via userspace tools such as perf and trace-cmd, the conversion of the binary value to a human string format is lost if an enum is used, as userspace does not have access to what the ENUM is. For example, the tracepoint trace_tlb_flush() has: __print_symbolic(REC->reason, { TLB_FLUSH_ON_TASK_SWITCH, "flush on task switch" }, { TLB_REMOTE_SHOOTDOWN, "remote shootdown" }, { TLB_LOCAL_SHOOTDOWN, "local shootdown" }, { TLB_LOCAL_MM_SHOOTDOWN, "local mm shootdown" }) Which maps the enum values to the strings they represent. But perf and trace-cmd do not know what value TLB_LOCAL_MM_SHOOTDOWN is, and would not be able to map it. With TRACE_DEFINE_ENUM(), developers can place these in the event header files and ftrace will convert the enums to their values. By adding: TRACE_DEFINE_ENUM(TLB_FLUSH_ON_TASK_SWITCH); TRACE_DEFINE_ENUM(TLB_REMOTE_SHOOTDOWN); TRACE_DEFINE_ENUM(TLB_LOCAL_SHOOTDOWN); TRACE_DEFINE_ENUM(TLB_LOCAL_MM_SHOOTDOWN); the format file becomes: $ cat /sys/kernel/debug/tracing/events/tlb/tlb_flush/format [...] __print_symbolic(REC->reason, { 0, "flush on task switch" }, { 1, "remote shootdown" }, { 2, "local shootdown" }, { 3, "local mm shootdown" }) The above is what userspace expects to see, and tools do not need to be modified to parse them. Link: http://lkml.kernel.org/r/20150403013802.220157513@goodmis.org Cc: Guilherme Cox <cox@computer.org> Cc: Tony Luck <tony.luck@gmail.com> Cc: Xie XiuQi <xiexiuqi@huawei.com> Acked-by: Namhyung Kim <namhyung@kernel.org> Reviewed-by: Masami Hiramatsu <masami.hiramatsu.pt@hitachi.com> Tested-by: Masami Hiramatsu <masami.hiramatsu.pt@hitachi.com> Signed-off-by: Steven Rostedt <rostedt@goodmis.org>
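The effect on the print fmt string can be approximated in user space with a small token-replacement routine. This sketch is not how ftrace implements the conversion; struct toy_enum_map and toy_replace_enums() are hypothetical names, and the routine naively replaces any matching identifier, including one that happened to appear inside a quoted string.

```c
#include <ctype.h>
#include <stdio.h>
#include <string.h>

/* Hypothetical name-to-value table, as a TRACE_DEFINE_ENUM() user might imply. */
struct toy_enum_map {
    const char *name;
    long value;
};

static int is_ident_char(int c)
{
    return isalnum(c) || c == '_';
}

/*
 * Copy fmt into out, replacing every identifier found in map with its
 * decimal value. Returns 0 on success, -1 if out is too small.
 */
static int toy_replace_enums(const char *fmt, const struct toy_enum_map *map,
                             size_t nmap, char *out, size_t outsz)
{
    size_t o = 0;

    while (*fmt) {
        if (is_ident_char((unsigned char)*fmt)) {
            size_t len = 0, i;
            int n;

            /* Measure the identifier-like token starting at fmt. */
            while (is_ident_char((unsigned char)fmt[len]))
                len++;

            /* Look for an exact, whole-token match in the map. */
            for (i = 0; i < nmap; i++)
                if (strlen(map[i].name) == len &&
                    strncmp(fmt, map[i].name, len) == 0)
                    break;

            if (i < nmap)
                n = snprintf(out + o, outsz - o, "%ld", map[i].value);
            else
                n = snprintf(out + o, outsz - o, "%.*s", (int)len, fmt);
            if (n < 0 || (size_t)n >= outsz - o)
                return -1;
            o += (size_t)n;
            fmt += len;
        } else {
            if (o + 1 >= outsz)
                return -1;
            out[o++] = *fmt++;
        }
    }
    if (o >= outsz)
        return -1;
    out[o] = '\0';
    return 0;
}
```

Matching whole tokens rather than substrings matters here: TLB_LOCAL_SHOOTDOWN must not be rewritten inside a longer identifier that merely starts with the same characters.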
2015-04-07mm: numa: disable change protection for vma(VM_HUGETLB)Naoya Horiguchi1-1/+3
Currently when a process accesses a hugetlb range protected with PROTNONE, unexpected COWs are triggered, which finally puts the hugetlb subsystem into a broken/uncontrollable state, where for example h->resv_huge_pages is subtracted too much and wraps around to a very large number, and the free hugepage pool is no longer maintainable. This patch simply stops changing protection for vma(VM_HUGETLB) to fix the problem. And this also allows us to avoid useless overhead of minor faults. Signed-off-by: Naoya Horiguchi <n-horiguchi@ah.jp.nec.com> Suggested-by: Mel Gorman <mgorman@suse.de> Cc: Hugh Dickins <hughd@google.com> Cc: "Kirill A. Shutemov" <kirill@shutemov.name> Cc: David Rientjes <rientjes@google.com> Cc: Rik van Riel <riel@redhat.com> Cc: Peter Zijlstra <a.p.zijlstra@chello.nl> Cc: Ingo Molnar <mingo@elte.hu> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2015-04-07Revert "PM / hibernate: avoid unsafe pages in e820 reserved regions"Rafael J. Wysocki1-20/+1
Commit 84c91b7ae07c (PM / hibernate: avoid unsafe pages in e820 reserved regions) is reported to make resume from hibernation on Lenovo x230 unreliable, so revert it. We will revisit the issue the commit in question was supposed to fix in the future. Link: https://bugzilla.kernel.org/show_bug.cgi?id=96111 Reported-by: rhn <kebuac.rhn@porcupinefactory.org> Cc: 3.17+ <stable@vger.kernel.org> # 3.17+ Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
2015-04-06workqueue: Reorder sysfs codeFrederic Weisbecker1-317/+318
The sysfs code usually belongs at the bottom of the file since it deals with high level objects. In the workqueue code it's misplaced such that we'll need to work around function references to allow the sysfs code to call APIs like apply_workqueue_attrs(). Let's move that block further down in the file, almost to the bottom. And declare workqueue_sysfs_unregister() just before destroy_workqueue() which references it. tj: Moved workqueue_sysfs_unregister() forward declaration where other forward declarations are. Suggested-by: Tejun Heo <tj@kernel.org> Cc: Christoph Lameter <cl@linux.com> Cc: Kevin Hilman <khilman@linaro.org> Cc: Lai Jiangshan <laijs@cn.fujitsu.com> Cc: Mike Galbraith <bitbucket@online.de> Cc: Paul E. McKenney <paulmck@linux.vnet.ibm.com> Cc: Tejun Heo <tj@kernel.org> Cc: Viresh Kumar <viresh.kumar@linaro.org> Signed-off-by: Frederic Weisbecker <fweisbec@gmail.com> Signed-off-by: Lai Jiangshan <laijs@cn.fujitsu.com> Signed-off-by: Tejun Heo <tj@kernel.org>
2015-04-03timers/PM: Drop unnecessary braces from tick_freeze()Rafael J. Wysocki1-3/+2
Some braces in tick_freeze() are not necessary, so drop them. Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com> Cc: peterz@infradead.org Cc: Thomas Gleixner <tglx@linutronix.de> Link: http://lkml.kernel.org/r/1534128.H5hN3KBFB4@vostro.rjw.lan Signed-off-by: Ingo Molnar <mingo@kernel.org>
2015-04-03timers/PM: Fix up tick_unfreeze()Rafael J. Wysocki1-1/+1
A recent conflict resolution has left tick_resume() in tick_unfreeze() which leads to an unbalanced execution of tick_resume_broadcast() every time that function runs. Fix that by replacing the tick_resume() in tick_unfreeze() with tick_resume_local() as appropriate. Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com> Cc: boris.ostrovsky@oracle.com Cc: david.vrabel@citrix.com Cc: konrad.wilk@oracle.com Cc: peterz@infradead.org Cc: Thomas Gleixner <tglx@linutronix.de> Link: http://lkml.kernel.org/r/8099075.V0LvN3pQAV@vostro.rjw.lan Signed-off-by: Ingo Molnar <mingo@kernel.org>
2015-04-03sched/core: Drop debugging leftover trace_printk callBorislav Petkov1-3/+1
Commit: 3c18d447b3b3 ("sched/core: Check for available DL bandwidth in cpuset_cpu_inactive()") forgot a trace_printk() debugging piece in and Steve's banner screamed in dmesg. Remove it. Signed-off-by: Borislav Petkov <bp@suse.de> Cc: Juri Lelli <juri.lelli@arm.com> Cc: Juri Lelli <juri.lelli@gmail.com> Cc: Peter Zijlstra (Intel) <peterz@infradead.org> Cc: Steven Rostedt <rostedt@goodmis.org> Link: http://lkml.kernel.org/r/1428050570-21041-1-git-send-email-bp@alien8.de Signed-off-by: Ingo Molnar <mingo@kernel.org>
2015-04-03timekeeping: Get rid of stale commentThomas Gleixner1-4/+0
Arch specific management of xtime/jiffies/wall_to_monotonic has been gone for quite a while. Zap the stale comment. Signed-off-by: Thomas Gleixner <tglx@linutronix.de> Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com> Acked-by: John Stultz <john.stultz@linaro.org> Cc: Peter Zijlstra <peterz@infradead.org> Cc: John Stultz <john.stultz@linaro.org> Link: http://lkml.kernel.org/r/2422730.dmO29q661S@vostro.rjw.lan Signed-off-by: Ingo Molnar <mingo@kernel.org>
2015-04-03clockevents: Cleanup dead cpu explicitelyThomas Gleixner6-60/+50
clockevents_notify() is a leftover from the early design of the clockevents facility. It's really not a notification mechanism, it's a multiplex call. We are way better off to have explicit calls instead of this monstrosity. Split out the cleanup function for a dead cpu and invoke it directly from the cpu down code. Make it conditional on CPU_HOTPLUG as well. Temporary change, will be refined in the future. Signed-off-by: Thomas Gleixner <tglx@linutronix.de> [ Rebased, added clockevents_notify() removal ] Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com> Cc: Peter Zijlstra <peterz@infradead.org> Link: http://lkml.kernel.org/r/1735025.raBZdQHM3m@vostro.rjw.lan Signed-off-by: Ingo Molnar <mingo@kernel.org>
2015-04-03clockevents: Make tick handover explicitThomas Gleixner5-12/+8
clockevents_notify() is a leftover from the early design of the clockevents facility. It's really not a notification mechanism, it's a multiplex call. We are way better off to have explicit calls instead of this monstrosity. Split out the tick_handover call and invoke it explicitly from the hotplug code. Temporary solution; will be cleaned up in later patches. Signed-off-by: Thomas Gleixner <tglx@linutronix.de> [ Rebase ] Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com> Cc: Peter Zijlstra <peterz@infradead.org> Cc: John Stultz <john.stultz@linaro.org> Cc: Linus Torvalds <torvalds@linux-foundation.org> Cc: Andrew Morton <akpm@linux-foundation.org> Link: http://lkml.kernel.org/r/1658173.RkEEILFiQZ@vostro.rjw.lan Signed-off-by: Ingo Molnar <mingo@kernel.org>
2015-04-03clockevents: Remove broadcast oneshot control leftoversRafael J. Wysocki1-7/+0
Now that all users are converted over to explicit calls into the clockevents state machine, remove the notification chain leftovers. Original-from: Thomas Gleixner <tglx@linutronix.de> Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Linus Torvalds <torvalds@linux-foundation.org> Cc: Andrew Morton <akpm@linux-foundation.org> Cc: John Stultz <john.stultz@linaro.org> Link: http://lkml.kernel.org/r/14018863.NQUzkFuafr@vostro.rjw.lan Signed-off-by: Ingo Molnar <mingo@kernel.org>
2015-04-03sched/idle: Use explicit broadcast oneshot control functionThomas Gleixner1-3/+2
Replace the clockevents_notify() call with an explicit function call. Signed-off-by: Thomas Gleixner <tglx@linutronix.de> Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com> Cc: Peter Zijlstra <peterz@infradead.org> Link: http://lkml.kernel.org/r/6422336.RMm7oUHcXh@vostro.rjw.lan Signed-off-by: Ingo Molnar <mingo@kernel.org>
2015-04-03clockevents: Provide explicit broadcast oneshot control functionsThomas Gleixner3-14/+20
clockevents_notify() is a leftover from the early design of the clockevents facility. It's really not a notification mechanism, it's a multiplex call. We are way better off to have explicit calls instead of this monstrosity. Split out the broadcast oneshot control into a separate function and provide inline helpers. Switch clockevents_notify() over. This will go away once all callers are converted. This also gets rid of the nested locking of clockevents_lock and broadcast_lock. The broadcast oneshot control functions do not require clockevents_lock. Only the managing functions (setup/shutdown/suspend/resume of the broadcast device) require clockevents_lock. Signed-off-by: Thomas Gleixner <tglx@linutronix.de> Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com> Cc: Alexandre Courbot <gnurou@gmail.com> Cc: Daniel Lezcano <daniel.lezcano@linaro.org> Cc: Len Brown <lenb@kernel.org> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Stephen Warren <swarren@wwwdotorg.org> Cc: Thierry Reding <thierry.reding@gmail.com> Cc: Tony Lindgren <tony@atomide.com> Link: http://lkml.kernel.org/r/13000649.8qZuEDV0OA@vostro.rjw.lan Signed-off-by: Ingo Molnar <mingo@kernel.org>
2015-04-03clockevents: Remove the broadcast control leftoversThomas Gleixner1-10/+0
All users converted. Remove the notify leftovers. Signed-off-by: Thomas Gleixner <tglx@linutronix.de> Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com> Cc: Peter Zijlstra <peterz@infradead.org> Link: http://lkml.kernel.org/r/2076318.76XJZ8QYP3@vostro.rjw.lan Signed-off-by: Ingo Molnar <mingo@kernel.org>
2015-04-03clockevents: Provide explicit broadcast control functionsThomas Gleixner3-38/+32
clockevents_notify() is a leftover from the early design of the clockevents facility. It's really not a notification mechanism, it's a multiplex call. We are way better off to have explicit calls instead of this monstrosity. Split out the broadcast control into a separate function and provide inline helpers. Switch clockevents_notify() over. This will go away once all callers are converted. This also gets rid of the nested locking of clockevents_lock and broadcast_lock. The broadcast control functions do not require clockevents_lock. Only the managing functions (setup/shutdown/suspend/resume of the broadcast device) require clockevents_lock. Signed-off-by: Thomas Gleixner <tglx@linutronix.de> Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com> Cc: Daniel Lezcano <daniel.lezcano@linaro.org> Cc: Len Brown <lenb@kernel.org> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Tony Lindgren <tony@atomide.com> Link: http://lkml.kernel.org/r/8086559.ttsuS0n1Xr@vostro.rjw.lan Signed-off-by: Ingo Molnar <mingo@kernel.org>
2015-04-03clocksource: Improve comment explaining clocks_calc_max_nsecs()'s 50% safety marginJohn Stultz1-2/+5
Ingo noted that the description of clocks_calc_max_nsecs()'s 50% safety margin was somewhat circular. So this patch tries to improve the comment to better explain what we mean by the 50% safety margin and why we need it. Signed-off-by: John Stultz <john.stultz@linaro.org> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Prarit Bhargava <prarit@redhat.com> Cc: Richard Cochran <richardcochran@gmail.com> Cc: Thomas Gleixner <tglx@linutronix.de> Link: http://lkml.kernel.org/r/1427945681-29972-20-git-send-email-john.stultz@linaro.org Signed-off-by: Ingo Molnar <mingo@kernel.org>
2015-04-03time, drivers/rtc: Don't bother with rtc_resume() for the nonstop clocksourceXunlei Pang1-17/+49
If a system does not provide a persistent_clock(), the time will be updated on resume by rtc_resume(). With the addition of the non-stop clocksources for suspend timing, those systems set the time on resume in timekeeping_resume(), but may not provide a valid persistent_clock(). This results in the rtc_resume() logic thinking no one has set the time and it then will over-write the suspend time again, which is not necessary and only increases clock error. So, fix this for rtc_resume(). This patch also improves the name of persistent_clock_exist to make it more grammatical. Signed-off-by: Xunlei Pang <pang.xunlei@linaro.org> Signed-off-by: John Stultz <john.stultz@linaro.org> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Thomas Gleixner <tglx@linutronix.de> Link: http://lkml.kernel.org/r/1427945681-29972-19-git-send-email-john.stultz@linaro.org Signed-off-by: Ingo Molnar <mingo@kernel.org>
2015-04-03time: Fix a bug in timekeeping_suspend() with no persistent clockXunlei Pang1-17/+19
When there's no persistent clock, normally timekeeping_suspend_time should always be zero, but this can break in timekeeping_suspend(). At T1, there was a system suspend, so old_delta was assigned T1. After some time, one time adjustment happened, and xtime got the value of T1-dt (0s<dt<2s). Then another system suspend comes soon after this adjustment, and we get a small negative delta_delta, resulting in a negative timekeeping_suspend_time. This is problematic: in timekeeping_resume(), if there is no nonstop clocksource for example, it will hit the else leg and inject the improper sleeptime, which is the wrong logic. So, we can solve this problem by only running the delta related code when a persistent clock exists. Actually the code only makes sense for persistent clock cases. Signed-off-by: Xunlei Pang <pang.xunlei@linaro.org> Signed-off-by: John Stultz <john.stultz@linaro.org> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Thomas Gleixner <tglx@linutronix.de> Link: http://lkml.kernel.org/r/1427945681-29972-18-git-send-email-john.stultz@linaro.org Signed-off-by: Ingo Molnar <mingo@kernel.org>
2015-04-03time: Don't build timekeeping_inject_sleeptime64() if no one uses itXunlei Pang1-0/+2
timekeeping_inject_sleeptime64() is only used by RTC suspend/resume, so add build dependencies on the necessary RTC related macros. Signed-off-by: Xunlei Pang <pang.xunlei@linaro.org> [ Improve commit message clarity. ] Signed-off-by: John Stultz <john.stultz@linaro.org> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Thomas Gleixner <tglx@linutronix.de> Link: http://lkml.kernel.org/r/1427945681-29972-16-git-send-email-john.stultz@linaro.org Signed-off-by: Ingo Molnar <mingo@kernel.org>
2015-04-03time: Add y2038 safe update_persistent_clock64()Xunlei Pang1-1/+12
As part of addressing in-kernel y2038 issues, this patch adds update_persistent_clock64() and replaces all the call sites of update_persistent_clock() with this function. This is a __weak implementation, which simply calls the existing y2038 unsafe update_persistent_clock(). This allows architecture specific implementations to be converted independently, and eventually y2038-unsafe update_persistent_clock() can be removed after all its architecture specific implementations have been converted to update_persistent_clock64(). Suggested-by: Arnd Bergmann <arnd@arndb.de> Signed-off-by: Xunlei Pang <pang.xunlei@linaro.org> Signed-off-by: John Stultz <john.stultz@linaro.org> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Thomas Gleixner <tglx@linutronix.de> Link: http://lkml.kernel.org/r/1427945681-29972-4-git-send-email-john.stultz@linaro.org Signed-off-by: Ingo Molnar <mingo@kernel.org>
2015-04-03time: Add y2038 safe read_persistent_clock64()Xunlei Pang1-10/+12
As part of addressing in-kernel y2038 issues, this patch adds read_persistent_clock64() and replaces all the call sites of read_persistent_clock() with this function. This is a __weak implementation, which simply calls the existing y2038 unsafe read_persistent_clock(). This allows architecture specific implementations to be converted independently, and eventually the y2038 unsafe read_persistent_clock() can be removed after all its architecture specific implementations have been converted to read_persistent_clock64(). Suggested-by: Arnd Bergmann <arnd@arndb.de> Signed-off-by: Xunlei Pang <pang.xunlei@linaro.org> Signed-off-by: John Stultz <john.stultz@linaro.org> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Thomas Gleixner <tglx@linutronix.de> Link: http://lkml.kernel.org/r/1427945681-29972-3-git-send-email-john.stultz@linaro.org Signed-off-by: Ingo Molnar <mingo@kernel.org>
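The weak-default pattern used by these y2038 conversions can be sketched in user-space C. The toy_ types and functions below are hypothetical stand-ins, not the kernel's timespec, timespec64 or clock hooks, and the sample timestamp is arbitrary. The weak attribute is a GCC/Clang extension, assumed here to behave like the kernel's __weak annotation.

```c
#include <stdint.h>

/* Hypothetical stand-ins for a legacy and a 64-bit-safe time structure. */
struct toy_timespec {
    long tv_sec;      /* overflows in 2038 where long is 32 bits wide */
    long tv_nsec;
};

struct toy_timespec64 {
    int64_t tv_sec;   /* 64-bit seconds on every architecture */
    long tv_nsec;
};

/* Legacy hook, as an unconverted architecture might provide it. */
void toy_read_persistent_clock(struct toy_timespec *ts)
{
    ts->tv_sec = 1428019200;  /* arbitrary sample value */
    ts->tv_nsec = 0;
}

/*
 * Weak default for the new interface: forward to the legacy hook and
 * widen the result. A converted architecture can supply a strong
 * definition with the same name, which the linker prefers over this one.
 */
__attribute__((weak))
void toy_read_persistent_clock64(struct toy_timespec64 *ts64)
{
    struct toy_timespec ts;

    toy_read_persistent_clock(&ts);
    ts64->tv_sec = ts.tv_sec;
    ts64->tv_nsec = ts.tv_nsec;
}
```

Callers only ever use the 64-bit entry point, so each architecture can be converted independently and the legacy hook removed once no implementation depends on it.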
2015-04-03time: Add y2038 safe read_boot_clock64()Xunlei Pang1-2/+9
As part of addressing in-kernel y2038 issues, this patch adds read_boot_clock64() and replaces all the call sites of read_boot_clock() with this function. This is a __weak implementation, which simply calls the existing y2038 unsafe read_boot_clock(). This allows architecture specific implementations to be converted independently, and eventually the y2038 unsafe read_boot_clock() can be removed after all its architecture specific implementations have been converted to read_boot_clock64(). Suggested-by: Arnd Bergmann <arnd@arndb.de> Signed-off-by: Xunlei Pang <pang.xunlei@linaro.org> Signed-off-by: John Stultz <john.stultz@linaro.org> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Thomas Gleixner <tglx@linutronix.de> Link: http://lkml.kernel.org/r/1427945681-29972-2-git-send-email-john.stultz@linaro.org Signed-off-by: Ingo Molnar <mingo@kernel.org>
2015-04-02ftrace/x86: Let dynamic trampolines call ops->func even for dynamic fopsSteven Rostedt (Red Hat)1-8/+14
Dynamically allocated trampolines call ftrace_ops_get_func to get the function which they should call. For dynamic fops (FTRACE_OPS_FL_DYNAMIC flag is set) ftrace_ops_list_func is always returned. This is reasonable for static trampolines but goes against the main advantage of dynamic ones, that is avoidance of going through the list of all registered callbacks for functions that are only being traced by a single callback. We can fix it by returning ops->func (or recursion safe version) from ftrace_ops_get_func whenever it is possible for dynamic trampolines. Note that dynamic trampolines are not allowed for dynamic fops if CONFIG_PREEMPT=y. Link: http://lkml.kernel.org/r/alpine.LNX.2.00.1501291023000.25445@pobox.suse.cz Link: http://lkml.kernel.org/r/1424357773-13536-1-git-send-email-mbenes@suse.cz Reported-by: Miroslav Benes <mbenes@suse.cz> Signed-off-by: Steven Rostedt <rostedt@goodmis.org>
2015-04-02timer: Further simplify the SMP and HOTPLUG logicPeter Zijlstra1-8/+15
Remove one CONFIG_HOTPLUG_CPU #ifdef in trade for introducing one CONFIG_SMP #ifdef. The CONFIG_SMP ifdef avoids declaring the per-CPU __tvec_bases storage on UP systems since they already have boot_tvec_bases. Also (re)add a runtime check on the base alignment -- for the paranoid amongst us :-) Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org> Acked-by: Viresh Kumar <viresh.kumar@linaro.org> Cc: Thomas Gleixner <tglx@linutronix.de> Link: http://lkml.kernel.org/r/fdd2d35e169bdc554ffa3fe77f77716298c75ada.1427814611.git.viresh.kumar@linaro.org Signed-off-by: Ingo Molnar <mingo@kernel.org>