From 7f1fc268c47491fd5e63548f6415fc8604e13003 Mon Sep 17 00:00:00 2001 From: Konrad Rzeszutek Wilk Date: Sun, 5 May 2013 09:30:09 -0400 Subject: xen/vcpu/pvhvm: Fix vcpu hotplugging hanging. If a user did: echo 0 > /sys/devices/system/cpu/cpu1/online echo 1 > /sys/devices/system/cpu/cpu1/online we would (this a build with DEBUG enabled) get to: smpboot: ++++++++++++++++++++=_---CPU UP 1 .. snip.. smpboot: Stack at about ffff880074c0ff44 smpboot: CPU1: has booted. and hang. The RCU mechanism would kick in an try to IPI the CPU1 but the IPIs (and all other interrupts) would never arrive at the CPU1. At first glance at least. A bit digging in the hypervisor trace shows that (using xenanalyze): [vla] d4v1 vec 243 injecting 0.043163027 --|x d4v1 intr_window vec 243 src 5(vector) intr f3 ] 0.043163639 --|x d4v1 vmentry cycles 1468 ] 0.043164913 --|x d4v1 vmexit exit_reason PENDING_INTERRUPT eip ffffffff81673254 0.043164913 --|x d4v1 inj_virq vec 243 real [vla] d4v1 vec 243 injecting 0.043164913 --|x d4v1 intr_window vec 243 src 5(vector) intr f3 ] 0.043165526 --|x d4v1 vmentry cycles 1472 ] 0.043166800 --|x d4v1 vmexit exit_reason PENDING_INTERRUPT eip ffffffff81673254 0.043166800 --|x d4v1 inj_virq vec 243 real [vla] d4v1 vec 243 injecting there is a pending event (subsequent debugging shows it is the IPI from the VCPU0 when smpboot.c on VCPU1 has done "set_cpu_online(smp_processor_id(), true)") and the guest VCPU1 is interrupted with the callback IPI (0xf3 aka 243) which ends up calling __xen_evtchn_do_upcall. The __xen_evtchn_do_upcall seems to do *something* but not acknowledge the pending events. And the moment the guest does a 'cli' (that is the ffffffff81673254 in the log above) the hypervisor is invoked again to inject the IPI (0xf3) to tell the guest it has pending interrupts. This repeats itself forever. The culprit was the per_cpu(xen_vcpu, cpu) pointer. At the bootup we set each per_cpu(xen_vcpu, cpu) to point to the shared_info->vcpu_info[vcpu] but later on use the VCPUOP_register_vcpu_info to register per-CPU structures (xen_vcpu_setup). This is used to allow events for more than 32 VCPUs and for performance optimizations reasons. When the user performs the VCPU hotplug we end up calling the the xen_vcpu_setup once more. We make the hypercall which returns -EINVAL as it does not allow multiple registration calls (and already has re-assigned where the events are being set). We pick the fallback case and set per_cpu(xen_vcpu, cpu) to point to the shared_info->vcpu_info[vcpu] (which is a good fallback during bootup). However the hypervisor is still setting events in the register per-cpu structure (per_cpu(xen_vcpu_info, cpu)). As such when the events are set by the hypervisor (such as timer one), and when we iterate in __xen_evtchn_do_upcall we end up reading stale events from the shared_info->vcpu_info[vcpu] instead of the per_cpu(xen_vcpu_info, cpu) structures. Hence we never acknowledge the events that the hypervisor has set and the hypervisor keeps on reminding us to ack the events which we never do. The fix is simple. Don't on the second time when xen_vcpu_setup is called over-write the per_cpu(xen_vcpu, cpu) if it points to per_cpu(xen_vcpu_info). Acked-by: Stefano Stabellini CC: stable@vger.kernel.org Signed-off-by: Konrad Rzeszutek Wilk --- arch/x86/xen/enlighten.c | 15 +++++++++++++++ 1 file changed, 15 insertions(+) (limited to 'arch') diff --git a/arch/x86/xen/enlighten.c b/arch/x86/xen/enlighten.c index ddbd54a9b845..94a81f41e8a2 100644 --- a/arch/x86/xen/enlighten.c +++ b/arch/x86/xen/enlighten.c @@ -157,6 +157,21 @@ static void xen_vcpu_setup(int cpu) BUG_ON(HYPERVISOR_shared_info == &xen_dummy_shared_info); + /* + * This path is called twice on PVHVM - first during bootup via + * smp_init -> xen_hvm_cpu_notify, and then if the VCPU is being + * hotplugged: cpu_up -> xen_hvm_cpu_notify. + * As we can only do the VCPUOP_register_vcpu_info once lets + * not over-write its result. + * + * For PV it is called during restore (xen_vcpu_restore) and bootup + * (xen_setup_vcpu_info_placement). The hotplug mechanism does not + * use this function. + */ + if (xen_hvm_domain()) { + if (per_cpu(xen_vcpu, cpu) == &per_cpu(xen_vcpu_info, cpu)) + return; + } if (cpu < MAX_VIRT_CPUS) per_cpu(xen_vcpu,cpu) = &HYPERVISOR_shared_info->vcpu_info[cpu]; -- cgit v1.2.3 From a520996ae2e2792e1f90b74e67c974120c8a3b83 Mon Sep 17 00:00:00 2001 From: Konrad Rzeszutek Wilk Date: Sun, 5 May 2013 08:51:47 -0400 Subject: xen/vcpu: Document the xen_vcpu_info and xen_vcpu They are important structures and it is not clear at first look what they are for. The xen_vcpu is a pointer. By default it points to the shared_info structure (at the CPU offset location). However if the VCPUOP_register_vcpu_info hypercall is implemented we can make the xen_vcpu pointer point to a per-CPU location. Acked-by: Stefano Stabellini [v1: Added comments from Ian Campbell] Signed-off-by: Konrad Rzeszutek Wilk --- arch/x86/xen/enlighten.c | 29 ++++++++++++++++++++++++++++- 1 file changed, 28 insertions(+), 1 deletion(-) (limited to 'arch') diff --git a/arch/x86/xen/enlighten.c b/arch/x86/xen/enlighten.c index 94a81f41e8a2..a2babdb13a26 100644 --- a/arch/x86/xen/enlighten.c +++ b/arch/x86/xen/enlighten.c @@ -85,7 +85,29 @@ EXPORT_SYMBOL_GPL(hypercall_page); +/* + * Pointer to the xen_vcpu_info structure or + * &HYPERVISOR_shared_info->vcpu_info[cpu]. See xen_hvm_init_shared_info + * and xen_vcpu_setup for details. By default it points to share_info->vcpu_info + * but if the hypervisor supports VCPUOP_register_vcpu_info then it can point + * to xen_vcpu_info. The pointer is used in __xen_evtchn_do_upcall to + * acknowledge pending events. + * Also more subtly it is used by the patched version of irq enable/disable + * e.g. xen_irq_enable_direct and xen_iret in PV mode. + * + * The desire to be able to do those mask/unmask operations as a single + * instruction by using the per-cpu offset held in %gs is the real reason + * vcpu info is in a per-cpu pointer and the original reason for this + * hypercall. + * + */ DEFINE_PER_CPU(struct vcpu_info *, xen_vcpu); + +/* + * Per CPU pages used if hypervisor supports VCPUOP_register_vcpu_info + * hypercall. This can be used both in PV and PVHVM mode. The structure + * overrides the default per_cpu(xen_vcpu, cpu) value. + */ DEFINE_PER_CPU(struct vcpu_info, xen_vcpu_info); enum xen_domain_type xen_domain_type = XEN_NATIVE; @@ -187,7 +209,12 @@ static void xen_vcpu_setup(int cpu) /* Check to see if the hypervisor will put the vcpu_info structure where we want it, which allows direct access via - a percpu-variable. */ + a percpu-variable. + N.B. This hypercall can _only_ be called once per CPU. Subsequent + calls will error out with -EINVAL. This is due to the fact that + hypervisor has no unregister variant and this hypercall does not + allow to over-write info.mfn and info.offset. + */ err = HYPERVISOR_vcpu_op(VCPUOP_register_vcpu_info, cpu, &info); if (err) { -- cgit v1.2.3 From d5b17dbff83d63fb6bf35daec21c8ebfb8d695b5 Mon Sep 17 00:00:00 2001 From: Konrad Rzeszutek Wilk Date: Sun, 5 May 2013 09:24:07 -0400 Subject: xen/smp/pvhvm: Don't point per_cpu(xen_vpcu, 33 and larger) to shared_info As it will point to some data, but not event channel data (the shared_info has an array limited to 32). This means that for PVHVM guests with more than 32 VCPUs without the usage of VCPUOP_register_info any interrupts to VCPUs larger than 32 would have gone unnoticed during early bootup. That is OK, as during early bootup, in smp_init we end up calling the hotplug mechanism (xen_hvm_cpu_notify) which makes the VCPUOP_register_vcpu_info call for all VCPUs and we can receive interrupts on VCPUs 33 and further. This is just a cleanup. Acked-by: Stefano Stabellini Signed-off-by: Konrad Rzeszutek Wilk --- arch/x86/xen/enlighten.c | 3 +++ 1 file changed, 3 insertions(+) (limited to 'arch') diff --git a/arch/x86/xen/enlighten.c b/arch/x86/xen/enlighten.c index a2babdb13a26..31e19f97fb5a 100644 --- a/arch/x86/xen/enlighten.c +++ b/arch/x86/xen/enlighten.c @@ -1646,6 +1646,9 @@ void __ref xen_hvm_init_shared_info(void) * online but xen_hvm_init_shared_info is run at resume time too and * in that case multiple vcpus might be online. */ for_each_online_cpu(cpu) { + /* Leave it to be NULL. */ + if (cpu >= MAX_VIRT_CPUS) + continue; per_cpu(xen_vcpu, cpu) = &HYPERVISOR_shared_info->vcpu_info[cpu]; } } -- cgit v1.2.3 From cb91f8f44caf346654fe9b17b2b359b95f98c049 Mon Sep 17 00:00:00 2001 From: Konrad Rzeszutek Wilk Date: Mon, 6 May 2013 08:33:15 -0400 Subject: xen/spinlock: Fix check from greater than to be also be greater or equal to. During review of git commit cb9c6f15f318aa3aeb62fe525aa5c6dcf6eee159 ("xen/spinlock: Check against default value of -1 for IRQ line.") Stefano pointed out a bug in the patch. Unfortunatly due to vacation timing the fix was not applied and this patch fixes it up. Acked-by: Stefano Stabellini Signed-off-by: Konrad Rzeszutek Wilk --- arch/x86/xen/spinlock.c | 2 +- 1 file changed, 1 insertion(+), 1 deletion(-) (limited to 'arch') diff --git a/arch/x86/xen/spinlock.c b/arch/x86/xen/spinlock.c index 8b54603ce816..3002ec1bb71a 100644 --- a/arch/x86/xen/spinlock.c +++ b/arch/x86/xen/spinlock.c @@ -364,7 +364,7 @@ void __cpuinit xen_init_lock_cpu(int cpu) int irq; const char *name; - WARN(per_cpu(lock_kicker_irq, cpu) > 0, "spinlock on CPU%d exists on IRQ%d!\n", + WARN(per_cpu(lock_kicker_irq, cpu) >= 0, "spinlock on CPU%d exists on IRQ%d!\n", cpu, per_cpu(lock_kicker_irq, cpu)); /* -- cgit v1.2.3 From 4ea9b9aca90cfc71e6872ed3522356755162932c Mon Sep 17 00:00:00 2001 From: Zhenzhong Duan Date: Fri, 29 Mar 2013 09:14:00 +0800 Subject: xen: mask x2APIC feature in PV On x2apic enabled pvm, doing sysrq+l, got NULL pointer dereference as below. SysRq : Show backtrace of all active CPUs BUG: unable to handle kernel NULL pointer dereference at (null) IP: [] memcpy+0xb/0x120 Call Trace: [] ? __x2apic_send_IPI_mask+0x73/0x160 [] x2apic_send_IPI_all+0x1e/0x20 [] arch_trigger_all_cpu_backtrace+0x6c/0xb0 [] ? _raw_spin_lock_irqsave+0x34/0x50 [] sysrq_handle_showallcpus+0xe/0x10 [] __handle_sysrq+0x7d/0x140 [] ? __handle_sysrq+0x140/0x140 [] write_sysrq_trigger+0x57/0x60 [] proc_reg_write+0x86/0xc0 [] vfs_write+0xce/0x190 [] sys_write+0x55/0x90 [] system_call_fastpath+0x16/0x1b That's because apic points to apic_x2apic_cluster or apic_x2apic_phys but the basic element like cpumask isn't initialized. Mask x2APIC feature in pvm to avoid overwrite of apic pointer, update commit message per Konrad's suggestion. Signed-off-by: Zhenzhong Duan Tested-by: Tamon Shiose Signed-off-by: Konrad Rzeszutek Wilk --- arch/x86/xen/enlighten.c | 3 +++ 1 file changed, 3 insertions(+) (limited to 'arch') diff --git a/arch/x86/xen/enlighten.c b/arch/x86/xen/enlighten.c index 31e19f97fb5a..4bd0066a1ef0 100644 --- a/arch/x86/xen/enlighten.c +++ b/arch/x86/xen/enlighten.c @@ -429,6 +429,9 @@ static void __init xen_init_cpuid_mask(void) cpuid_leaf1_edx_mask &= ~((1 << X86_FEATURE_APIC) | /* disable local APIC */ (1 << X86_FEATURE_ACPI)); /* disable ACPI */ + + cpuid_leaf1_ecx_mask &= ~(1 << (X86_FEATURE_X2APIC % 32)); + ax = 1; cx = 0; xen_cpuid(&ax, &bx, &cx, &dx); -- cgit v1.2.3 From 4be6bfe2af829efa6da91ca33f9f2fcb3e37bc89 Mon Sep 17 00:00:00 2001 From: Bjorn Helgaas Date: Mon, 22 Apr 2013 17:12:21 -0600 Subject: xen/pci: Use PCI_MSIX_TABLE_BIR, not PCI_MSIX_FLAGS_BIRMASK PCI_MSIX_FLAGS_BIRMASK is mis-named because the BIR mask is in the Table Offset register, not the flags ("Message Control" per spec) register. Acked-by: Jan Beulich Signed-off-by: Bjorn Helgaas CC: Konrad Rzeszutek Wilk Signed-off-by: Konrad Rzeszutek Wilk --- arch/x86/pci/xen.c | 2 +- 1 file changed, 1 insertion(+), 1 deletion(-) (limited to 'arch') diff --git a/arch/x86/pci/xen.c b/arch/x86/pci/xen.c index 4a9be6ddf054..82c62d689da8 100644 --- a/arch/x86/pci/xen.c +++ b/arch/x86/pci/xen.c @@ -299,7 +299,7 @@ static int xen_initdom_setup_msi_irqs(struct pci_dev *dev, int nvec, int type) pci_read_config_dword(dev, pos + PCI_MSIX_TABLE, &table_offset); - bir = (u8)(table_offset & PCI_MSIX_FLAGS_BIRMASK); + bir = (u8)(table_offset & PCI_MSIX_TABLE_BIR); map_irq.table_base = pci_resource_start(dev, bir); map_irq.entry_nr = msidesc->msi_attrib.entry_nr; -- cgit v1.2.3 From 7c86617dde0015112de566a4619a9b06871580c1 Mon Sep 17 00:00:00 2001 From: Bjorn Helgaas Date: Mon, 22 Apr 2013 17:12:28 -0600 Subject: xen/pci: Used cached MSI-X capability offset We now cache the MSI-X capability offset in the struct pci_dev, so no need to find the capability again. Acked-by: Jan Beulich Signed-off-by: Bjorn Helgaas CC: Konrad Rzeszutek Wilk Signed-off-by: Konrad Rzeszutek Wilk --- arch/x86/pci/xen.c | 3 +-- 1 file changed, 1 insertion(+), 2 deletions(-) (limited to 'arch') diff --git a/arch/x86/pci/xen.c b/arch/x86/pci/xen.c index 82c62d689da8..48e8461057ba 100644 --- a/arch/x86/pci/xen.c +++ b/arch/x86/pci/xen.c @@ -295,8 +295,7 @@ static int xen_initdom_setup_msi_irqs(struct pci_dev *dev, int nvec, int type) int pos; u32 table_offset, bir; - pos = pci_find_capability(dev, PCI_CAP_ID_MSIX); - + pos = dev->msix_cap; pci_read_config_dword(dev, pos + PCI_MSIX_TABLE, &table_offset); bir = (u8)(table_offset & PCI_MSIX_TABLE_BIR); -- cgit v1.2.3