summaryrefslogtreecommitdiffstats
path: root/Documentation
AgeCommit message (Collapse)AuthorFilesLines
2011-07-24KVM: MMU: remove bypass_guest_pfXiao Guangrong1-4/+0
The idea is from Avi: | Maybe it's time to kill off bypass_guest_pf=1. It's not as effective as | it used to be, since unsync pages always use shadow_trap_nonpresent_pte, | and since we convert between the two nonpresent_ptes during sync and unsync. Signed-off-by: Xiao Guangrong <xiaoguangrong@cn.fujitsu.com> Signed-off-by: Avi Kivity <avi@redhat.com>
2011-07-24KVM guest: KVM Steal time registrationGlauber Costa1-0/+4
This patch implements the kvm bits of the steal time infrastructure. The most important part of it, is the steal time clock. It is an continuous clock that shows the accumulated amount of steal time since vcpu creation. It is supposed to survive cpu offlining/onlining. [marcelo: fix build with CONFIG_KVM_GUEST=n] Signed-off-by: Glauber Costa <glommer@redhat.com> Acked-by: Rik van Riel <riel@redhat.com> Tested-by: Eric B Munson <emunson@mgebm.net> CC: Jeremy Fitzhardinge <jeremy.fitzhardinge@citrix.com> CC: Peter Zijlstra <peterz@infradead.org> CC: Avi Kivity <avi@redhat.com> CC: Anthony Liguori <aliguori@us.ibm.com> Signed-off-by: Avi Kivity <avi@redhat.com> Signed-off-by: Marcelo Tosatti <mtosatti@redhat.com>
2011-07-12KVM: KVM Steal time guest/host interfaceGlauber Costa1-0/+34
To implement steal time, we need the hypervisor to pass the guest information about how much time was spent running other processes outside the VM. This is per-vcpu, and using the kvmclock structure for that is an abuse we decided not to make. In this patchset, I am introducing a new msr, KVM_MSR_STEAL_TIME, that holds the memory area address containing information about steal time This patch contains the headers for it. I am keeping it separate to facilitate backports to people who wants to backport the kernel part but not the hypervisor, or the other way around. Signed-off-by: Glauber Costa <glommer@redhat.com> Acked-by: Rik van Riel <riel@redhat.com> Tested-by: Eric B Munson <emunson@mgebm.net> CC: Jeremy Fitzhardinge <jeremy.fitzhardinge@citrix.com> CC: Peter Zijlstra <peterz@infradead.org> CC: Anthony Liguori <aliguori@us.ibm.com> Signed-off-by: Avi Kivity <avi@redhat.com>
2011-07-12KVM: PPC: Allocate RMAs (Real Mode Areas) at boot for use by guestsPaul Mackerras1-0/+32
This adds infrastructure which will be needed to allow book3s_hv KVM to run on older POWER processors, including PPC970, which don't support the Virtual Real Mode Area (VRMA) facility, but only the Real Mode Offset (RMO) facility. These processors require a physically contiguous, aligned area of memory for each guest. When the guest does an access in real mode (MMU off), the address is compared against a limit value, and if it is lower, the address is ORed with an offset value (from the Real Mode Offset Register (RMOR)) and the result becomes the real address for the access. The size of the RMA has to be one of a set of supported values, which usually includes 64MB, 128MB, 256MB and some larger powers of 2. Since we are unlikely to be able to allocate 64MB or more of physically contiguous memory after the kernel has been running for a while, we allocate a pool of RMAs at boot time using the bootmem allocator. The size and number of the RMAs can be set using the kvm_rma_size=xx and kvm_rma_count=xx kernel command line options. KVM exports a new capability, KVM_CAP_PPC_RMA, to signal the availability of the pool of preallocated RMAs. The capability value is 1 if the processor can use an RMA but doesn't require one (because it supports the VRMA facility), or 2 if the processor requires an RMA for each guest. This adds a new ioctl, KVM_ALLOCATE_RMA, which allocates an RMA from the pool and returns a file descriptor which can be used to map the RMA. It also returns the size of the RMA in the argument structure. Having an RMA means we will get multiple KMV_SET_USER_MEMORY_REGION ioctl calls from userspace. To cope with this, we now preallocate the kvm->arch.ram_pginfo array when the VM is created with a size sufficient for up to 64GB of guest memory. Subsequently we will get rid of this array and use memory associated with each memslot instead. This moves most of the code that translates the user addresses into host pfns (page frame numbers) out of kvmppc_prepare_vrma up one level to kvmppc_core_prepare_memory_region. Also, instead of having to look up the VMA for each page in order to check the page size, we now check that the pages we get are compound pages of 16MB. However, if we are adding memory that is mapped to an RMA, we don't bother with calling get_user_pages_fast and instead just offset from the base pfn for the RMA. Typically the RMA gets added after vcpus are created, which makes it inconvenient to have the LPCR (logical partition control register) value in the vcpu->arch struct, since the LPCR controls whether the processor uses RMA or VRMA for the guest. This moves the LPCR value into the kvm->arch struct and arranges for the MER (mediated external request) bit, which is the only bit that varies between vcpus, to be set in assembly code when going into the guest if there is a pending external interrupt request. Signed-off-by: Paul Mackerras <paulus@samba.org> Signed-off-by: Alexander Graf <agraf@suse.de>
2011-07-12KVM: PPC: Allow book3s_hv guests to use SMT processor modesPaul Mackerras1-0/+13
This lifts the restriction that book3s_hv guests can only run one hardware thread per core, and allows them to use up to 4 threads per core on POWER7. The host still has to run single-threaded. This capability is advertised to qemu through a new KVM_CAP_PPC_SMT capability. The return value of the ioctl querying this capability is the number of vcpus per virtual CPU core (vcore), currently 4. To use this, the host kernel should be booted with all threads active, and then all the secondary threads should be offlined. This will put the secondary threads into nap mode. KVM will then wake them from nap mode and use them for running guest code (while they are still offline). To wake the secondary threads, we send them an IPI using a new xics_wake_cpu() function, implemented in arch/powerpc/sysdev/xics/icp-native.c. In other words, at this stage we assume that the platform has a XICS interrupt controller and we are using icp-native.c to drive it. Since the woken thread will need to acknowledge and clear the IPI, we also export the base physical address of the XICS registers using kvmppc_set_xics_phys() for use in the low-level KVM book3s code. When a vcpu is created, it is assigned to a virtual CPU core. The vcore number is obtained by dividing the vcpu number by the number of threads per core in the host. This number is exported to userspace via the KVM_CAP_PPC_SMT capability. If qemu wishes to run the guest in single-threaded mode, it should make all vcpu numbers be multiples of the number of threads per core. We distinguish three states of a vcpu: runnable (i.e., ready to execute the guest), blocked (that is, idle), and busy in host. We currently implement a policy that the vcore can run only when all its threads are runnable or blocked. This way, if a vcpu needs to execute elsewhere in the kernel or in qemu, it can do so without being starved of CPU by the other vcpus. When a vcore starts to run, it executes in the context of one of the vcpu threads. The other vcpu threads all go to sleep and stay asleep until something happens requiring the vcpu thread to return to qemu, or to wake up to run the vcore (this can happen when another vcpu thread goes from busy in host state to blocked). It can happen that a vcpu goes from blocked to runnable state (e.g. because of an interrupt), and the vcore it belongs to is already running. In that case it can start to run immediately as long as the none of the vcpus in the vcore have started to exit the guest. We send the next free thread in the vcore an IPI to get it to start to execute the guest. It synchronizes with the other threads via the vcore->entry_exit_count field to make sure that it doesn't go into the guest if the other vcpus are exiting by the time that it is ready to actually enter the guest. Note that there is no fixed relationship between the hardware thread number and the vcpu number. Hardware threads are assigned to vcpus as they become runnable, so we will always use the lower-numbered hardware threads in preference to higher-numbered threads if not all the vcpus in the vcore are runnable, regardless of which vcpus are runnable. Signed-off-by: Paul Mackerras <paulus@samba.org> Signed-off-by: Alexander Graf <agraf@suse.de>
2011-07-12KVM: PPC: Accelerate H_PUT_TCE by implementing it in real modeDavid Gibson1-0/+35
This improves I/O performance for guests using the PAPR paravirtualization interface by making the H_PUT_TCE hcall faster, by implementing it in real mode. H_PUT_TCE is used for updating virtual IOMMU tables, and is used both for virtual I/O and for real I/O in the PAPR interface. Since this moves the IOMMU tables into the kernel, we define a new KVM_CREATE_SPAPR_TCE ioctl to allow qemu to create the tables. The ioctl returns a file descriptor which can be used to mmap the newly created table. The qemu driver models use them in the same way as userspace managed tables, but they can be updated directly by the guest with a real-mode H_PUT_TCE implementation, reducing the number of host/guest context switches during guest IO. There are certain circumstances where it is useful for userland qemu to write to the TCE table even if the kernel H_PUT_TCE path is used most of the time. Specifically, allowing this will avoid awkwardness when we need to reset the table. More importantly, we will in the future need to write the table in order to restore its state after a checkpoint resume or migration. Signed-off-by: David Gibson <david@gibson.dropbear.id.au> Signed-off-by: Paul Mackerras <paulus@samba.org> Signed-off-by: Alexander Graf <agraf@suse.de>
2011-07-12KVM: PPC: Add support for Book3S processors in hypervisor modePaul Mackerras1-0/+17
This adds support for KVM running on 64-bit Book 3S processors, specifically POWER7, in hypervisor mode. Using hypervisor mode means that the guest can use the processor's supervisor mode. That means that the guest can execute privileged instructions and access privileged registers itself without trapping to the host. This gives excellent performance, but does mean that KVM cannot emulate a processor architecture other than the one that the hardware implements. This code assumes that the guest is running paravirtualized using the PAPR (Power Architecture Platform Requirements) interface, which is the interface that IBM's PowerVM hypervisor uses. That means that existing Linux distributions that run on IBM pSeries machines will also run under KVM without modification. In order to communicate the PAPR hypercalls to qemu, this adds a new KVM_EXIT_PAPR_HCALL exit code to include/linux/kvm.h. Currently the choice between book3s_hv support and book3s_pr support (i.e. the existing code, which runs the guest in user mode) has to be made at kernel configuration time, so a given kernel binary can only do one or the other. This new book3s_hv code doesn't support MMIO emulation at present. Since we are running paravirtualized guests, this isn't a serious restriction. With the guest running in supervisor mode, most exceptions go straight to the guest. We will never get data or instruction storage or segment interrupts, alignment interrupts, decrementer interrupts, program interrupts, single-step interrupts, etc., coming to the hypervisor from the guest. Therefore this introduces a new KVMTEST_NONHV macro for the exception entry path so that we don't have to do the KVM test on entry to those exception handlers. We do however get hypervisor decrementer, hypervisor data storage, hypervisor instruction storage, and hypervisor emulation assist interrupts, so we have to handle those. In hypervisor mode, real-mode accesses can access all of RAM, not just a limited amount. Therefore we put all the guest state in the vcpu.arch and use the shadow_vcpu in the PACA only for temporary scratch space. We allocate the vcpu with kzalloc rather than vzalloc, and we don't use anything in the kvmppc_vcpu_book3s struct, so we don't allocate it. We don't have a shared page with the guest, but we still need a kvm_vcpu_arch_shared struct to store the values of various registers, so we include one in the vcpu_arch struct. The POWER7 processor has a restriction that all threads in a core have to be in the same partition. MMU-on kernel code counts as a partition (partition 0), so we have to do a partition switch on every entry to and exit from the guest. At present we require the host and guest to run in single-thread mode because of this hardware restriction. This code allocates a hashed page table for the guest and initializes it with HPTEs for the guest's Virtual Real Memory Area (VRMA). We require that the guest memory is allocated using 16MB huge pages, in order to simplify the low-level memory management. This also means that we can get away without tracking paging activity in the host for now, since huge pages can't be paged or swapped. This also adds a few new exports needed by the book3s_hv code. Signed-off-by: Paul Mackerras <paulus@samba.org> Signed-off-by: Alexander Graf <agraf@suse.de>
2011-07-12KVM: PPC: e500: enable magic pageScott Wood1-3/+5
This is a shared page used for paravirtualization. It is always present in the guest kernel's effective address space at the address indicated by the hypercall that enables it. The physical address specified by the hypercall is not used, as e500 does not have real mode. Signed-off-by: Scott Wood <scottwood@freescale.com> Signed-off-by: Alexander Graf <agraf@suse.de>
2011-07-12KVM: MMU: Adjust shadow paging to work when SMEP=1 and CR0.WP=0Avi Kivity1-0/+18
When CR0.WP=0, we sometimes map user pages as kernel pages (to allow the kernel to write to them). Unfortunately this also allows the kernel to fetch from these pages, even if CR4.SMEP is set. Adjust for this by also setting NX on the spte in these circumstances. Signed-off-by: Avi Kivity <avi@redhat.com>
2011-07-12KVM: Fix KVM_ASSIGN_SET_MSIX_ENTRY documentationJan Kiszka1-2/+4
The documented behavior did not match the implemented one (which also never changed). Signed-off-by: Jan Kiszka <jan.kiszka@siemens.com> Signed-off-by: Avi Kivity <avi@redhat.com>
2011-07-12KVM: Clarify KVM_ASSIGN_PCI_DEVICE documentationJan Kiszka1-6/+1
Neither host_irq nor the guest_msi struct are used anymore today. Tag the former, drop the latter to avoid confusion. Signed-off-by: Jan Kiszka <jan.kiszka@siemens.com> Signed-off-by: Avi Kivity <avi@redhat.com>
2011-07-12KVM: Fixup documentation section numberingJan Kiszka1-1/+1
Signed-off-by: Jan Kiszka <jan.kiszka@siemens.com> Signed-off-by: Marcelo Tosatti <mtosatti@redhat.com>
2011-07-12KVM: Document KVM_IOEVENTFDSasha Levin1-0/+30
Document KVM_IOEVENTFD that can be used to receive notifications of PIO/MMIO events without triggering an exit. Signed-off-by: Sasha Levin <levinsasha928@gmail.com> Signed-off-by: Marcelo Tosatti <mtosatti@redhat.com>
2011-07-12KVM: nVMX: DocumentationNadav Har'El1-0/+251
This patch includes a brief introduction to the nested vmx feature in the Documentation/kvm directory. The document also includes a copy of the vmcs12 structure, as requested by Avi Kivity. [marcelo: move to Documentation/virtual/kvm] Signed-off-by: Nadav Har'El <nyh@il.ibm.com> Signed-off-by: Marcelo Tosatti <mtosatti@redhat.com>
2011-07-12KVM: Document KVM_GET_LAPIC, KVM_SET_LAPIC ioctlAvi Kivity1-0/+32
Signed-off-by: Avi Kivity <avi@redhat.com> Signed-off-by: Marcelo Tosatti <mtosatti@redhat.com>
2011-07-06Documentation: fix cgroup blkio throttle filenamesAndrea Righi1-6/+6
All the blkio.throttle.* file names are incorrectly reported without ".throttle" in the documentation. Fix it. Signed-off-by: Andrea Righi <andrea@betterlinux.com> Signed-off-by: Randy Dunlap <randy.dunlap@oracle.com> Acked-by: Vivek Goyal <vgoyal@redhat.com> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2011-07-06Documentation: update CodingStyle memory allocatorsJesper Juhl1-2/+2
The list of available general purpose memory allocators in Documentation/CodingStyle chapter 14 is incomplete. This patch adds the missing vzalloc() to the list. Signed-off-by: Jesper Juhl <jj@chaosbits.net> Signed-off-by: Randy Dunlap <randy.dunlap@oracle.com> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2011-07-03Merge branch 'pm-fixes' of ↵Linus Torvalds1-5/+21
git://git.kernel.org/pub/scm/linux/kernel/git/rafael/suspend-2.6 * 'pm-fixes' of git://git.kernel.org/pub/scm/linux/kernel/git/rafael/suspend-2.6: PM / Runtime: Update documentation regarding driver removal PM: Documentation: fix typo: pm_runtime_idle_sync() doesn't exist.
2011-07-03Merge branch 'hwmon-for-linus' of ↵Linus Torvalds2-2/+10
git://git.kernel.org/pub/scm/linux/kernel/git/jdelvare/staging * 'hwmon-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/jdelvare/staging: hwmon: (k10temp) Update documentation for Fam12h hwmon-vid: Fix typo in VIA CPU name hwmon: (f71882fg) Add support for the F71869A hwmon: Use <> rather than () around my e-mail address hwmon: (emc6w201) Properly handle all errors
2011-07-03hwmon: (k10temp) Update documentation for Fam12hClemens Ladisch1-2/+6
Add some CPU series IDs and links to the Fam12h datasheets. Signed-off-by: Clemens Ladisch <clemens@ladisch.de> Signed-off-by: Jean Delvare <khali@linux-fr.org>
2011-07-03hwmon: (f71882fg) Add support for the F71869AHans de Goede1-0/+4
The F71869A is almost the same as the F71869F/E, except that it has the normal number of temp and pwm zones for a F71882FG derived chip, rather then the limited number of the F71869F/E. Signed-off-by: Hans de Goede <hdegoede@redhat.com> Tested-by: Max Baldwin <archerseven@gmail.com> Acked-by: Guenter Roeck <guenter.roeck@ericsson.com> Signed-off-by: Jean Delvare <khali@linux-fr.org>
2011-07-02PM / Runtime: Update documentation regarding driver removalRafael J. Wysocki1-5/+21
Commit e1866b33b1e89f077b7132daae3dfd9a594e9a1a (PM / Runtime: Rework runtime PM handling during driver removal) forgot to update the documentation in Documentation/power/runtime_pm.txt to match the new code in drivers/base/dd.c. Update that documentation to match the code it describes. Signed-off-by: Rafael J. Wysocki <rjw@sisk.pl> Reviewed-by: Kevin Hilman <khilman@ti.com>
2011-07-02PM: Documentation: fix typo: pm_runtime_idle_sync() doesn't exist.Kevin Hilman1-1/+1
Replace reference to pm_runtime_idle_sync() in the driver core with pm_runtime_put_sync() which is used in the code. Signed-off-by: Kevin Hilman <khilman@ti.com> Signed-off-by: Rafael J. Wysocki <rjw@sisk.pl>
2011-06-28Merge branch 'usb-linus' of ↵Linus Torvalds1-1/+8
git://git.kernel.org/pub/scm/linux/kernel/git/gregkh/usb-2.6 * 'usb-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/gregkh/usb-2.6: MAINTAINERS: add myself as maintainer of USB/IP usb: r8a66597-hcd: fix cannot detect low/full speed device USB: ehci-ath79: fix a NULL pointer dereference USB: Add new FT232H chip to drivers/usb/serial/ftdi_sio.c usb/isp1760: Fix bug preventing the unlinking of control urbs USB: Fix up URB error codes to reflect implementation. xhci: Always set urb->status to zero for isoc endpoints. xhci: Add reset on resume quirk for asrock p67 host xHCI 1.0: Incompatible Device Error USB: don't let errors prevent system sleep USB: don't let the hub driver prevent system sleep USB: change maintainership of ohci-hcd and ehci-hcd xHCI 1.0: Force Stopped Event(FSE) xhci: Don't warn about zeroed bMaxBurst descriptor field. USB: Free bandwidth when usb_disable_device is called. xhci: Reject double add of active endpoints. USB: TI 3410/5052 USB Serial Driver: Fix mem leak when firmware is too big. usb: musb: gadget: clear TXPKTRDY flag when set FLUSHFIFO usb: musb: host: compare status for negative error values
2011-06-21PM / Domains: Update documentationRafael J. Wysocki1-27/+14
Commit 4d27e9dcff00a6425d779b065ec8892e4f391661 (PM: Make power domain callbacks take precedence over subsystem ones) forgot to update the device power management documentation to take changes made by it into account. Correct that mistake. Signed-off-by: Rafael J. Wysocki <rjw@sisk.pl>
2011-06-21PM: Update documentation regarding sysdevsRafael J. Wysocki1-26/+0
The part of Documentation/power/devices.txt regarding sysdevs is not valid any more after commit 2e711c04dbbf7a7732a3f7073b1fc285d12b369d (PM: Remove sysdev suspend, resume and shutdown operations), so remove it. Signed-off-by: Rafael J. Wysocki <rjw@sisk.pl>
2011-06-21PM / Runtime: Update doc: usage count no longer incremented across system PMKevin Hilman1-5/+0
Commit e8665002477f0278f84f898145b1f141ba26ee26 (PM: Allow pm_runtime_suspend() to succeed during system suspend) removed usage count increment across system PM. Update doc to reflect this. Signed-off-by: Kevin Hilman <khilman@ti.com> Signed-off-by: Rafael J. Wysocki <rjw@sisk.pl>
2011-06-19Merge branch 'core-urgent-for-linus' of ↵Linus Torvalds1-0/+1
git://git.kernel.org/pub/scm/linux/kernel/git/tip/linux-2.6-tip * 'core-urgent-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/linux-2.6-tip: rcu: Move RCU_BOOST #ifdefs to header file rcu: use softirq instead of kthreads except when RCU_BOOST=y rcu: Use softirq to address performance regression rcu: Simplify curing of load woes
2011-06-17USB: Fix up URB error codes to reflect implementation.Sarah Sharp1-1/+8
Documentation/usb/error-codes.txt mentions that urb->status can be set to -EXDEV, if the isochronous transfer was not fully completed. However, in practice, EHCI, UHCI, and OHCI all only set -EXDEV in the individual frame status, never in the URB status. Those host controller actually always pass in a zero status to usb_hcd_giveback_urb, and rely on the core to set the appropriate status value. The xHCI driver ran into issues with the uvcvideo driver when it tried to set -EXDEV in urb->status, because the driver refused to submit URBs, and the userspace camera application's video froze. Clean up the documentation to reflect the actual implementation. Signed-off-by: Sarah Sharp <sarah.a.sharp@linux.intel.com> Acked-by: Alan Stern <stern@rowland.harvard.edu>
2011-06-15Documentation: fix cgroup typos and formattingJörg Sommer3-13/+13
Fix format and spelling. Signed-off-by: Jörg Sommer <joerg@alea.gnuu.de> Acked-by: Paul Menage <menage@google.com> Signed-off-by: Randy Dunlap <randy.dunlap@oracle.com> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2011-06-15Documentation: update cgroupfs mount pointJörg Sommer11-94/+109
According to commit 676db4af0430 ("cgroupfs: create /sys/fs/cgroup to mount cgroupfs on") the canonical mountpoint for the cgroup filesystem is /sys/fs/cgroup. Hence, this should be used in the documentation. Signed-off-by: Jörg Sommer <joerg@alea.gnuu.de> Acked-by: Paul Menage <menage@google.com> Signed-off-by: Randy Dunlap <randy.dunlap@oracle.com> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2011-06-15Documentation: update kmemleak supported archsMaxin B. John1-1/+3
Instead of listing the architectures that are supported by kmemleak in Documentation/kmemleak.txt, just refer people to the list of supported architecutures in lib/Kconfig.debug so that Documentation/kmemleak.txt does not need more updates for this. Signed-off-by: Maxin B. John <maxin.john@gmail.com> Signed-off-by: Randy Dunlap <randy.dunlap@oracle.com> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2011-06-15Documentation: update printk-formats.txtAndrew Murray1-2/+117
This patch updates the incomplete documentation concerning the printk extended format specifiers. Signed-off-by: Andrew Murray <amurray@mpc-data.co.uk> Signed-off-by: Randy Dunlap <randy.dunlap@oracle.com> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2011-06-15Documentation/feature-removal-schedule.txt: remove ns_cgroup from ↵akpm@linux-foundation.org1-17/+0
feature-removal-schedule.txt Commit a77aea92010acf ("cgroup: remove the ns_cgroup") removed the ns_cgroup but it forgot to remove the related doc in feature-removal-schedule.txt. Signed-off-by: WANG Cong <xiyou.wangcong@gmail.com> Cc: Daniel Lezcano <daniel.lezcano@free.fr> Cc: Serge E. Hallyn <serge.hallyn@canonical.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2011-06-15memcg: add documentation for the memory.numastat APIYing Han1-0/+19
[akpm@linux-foundation.org: rework text, fit it into 80-cols] Signed-off-by: Ying Han <yinghan@google.com> Reviewed-by: KOSAKI Motohiro <kosaki.motohiro@jp.fujitsu.com> Acked-by: Balbir Singh <bsingharora@gmail.com> Acked-by: KAMEZAWA Hiroyuki <kamezawa.hiroyu@jp.fujitsu.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2011-06-15backlight: new driver for the ADP8870 backlight devicesMichael Hennerich1-0/+56
Signed-off-by: Michael Hennerich <michael.hennerich@analog.com> Signed-off-by: Mike Frysinger <vapier@gentoo.org> Cc: Richard Purdie <rpurdie@rpsys.net> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2011-06-14rcu: Use softirq to address performance regressionShaohua Li1-0/+1
Commit a26ac2455ffcf3(rcu: move TREE_RCU from softirq to kthread) introduced performance regression. In an AIM7 test, this commit degraded performance by about 40%. The commit runs rcu callbacks in a kthread instead of softirq. We observed high rate of context switch which is caused by this. Out test system has 64 CPUs and HZ is 1000, so we saw more than 64k context switch per second which is caused by RCU's per-CPU kthread. A trace showed that most of the time the RCU per-CPU kthread doesn't actually handle any callbacks, but instead just does a very small amount of work handling grace periods. This means that RCU's per-CPU kthreads are making the scheduler do quite a bit of work in order to allow a very small amount of RCU-related processing to be done. Alex Shi's analysis determined that this slowdown is due to lock contention within the scheduler. Unfortunately, as Peter Zijlstra points out, the scheduler's real-time semantics require global action, which means that this contention is inherent in real-time scheduling. (Yes, perhaps someone will come up with a workaround -- otherwise, -rt is not going to do well on large SMP systems -- but this patch will work around this issue in the meantime. And "the meantime" might well be forever.) This patch therefore re-introduces softirq processing to RCU, but only for core RCU work. RCU callbacks are still executed in kthread context, so that only a small amount of RCU work runs in softirq context in the common case. This should minimize ksoftirqd execution, allowing us to skip boosting of ksoftirqd for CONFIG_RCU_BOOST=y kernels. Signed-off-by: Shaohua Li <shaohua.li@intel.com> Tested-by: "Alex,Shi" <alex.shi@intel.com> Signed-off-by: Paul E. McKenney <paulmck@linux.vnet.ibm.com>
2011-06-14Merge branch 'for-linus' of git://neil.brown.name/mdLinus Torvalds1-1/+1
* 'for-linus' of git://neil.brown.name/md: md/raid5: remove unusual use of bio_iovec_idx() md/raid5: fix FUA request handling in ops_run_io() md/raid5: fix raid5_set_bi_hw_segments md:Documentation/md.txt - fix typo md/bitmap: remove unused fields from struct bitmap md/bitmap: use proper accessor macro md: check ->hot_remove_disk when removing disk md: Using poll /proc/mdstat can monitor the events of adding a spare disks MD: use is_power_of_2 macro MD: raid5 do not set fullsync MD: support initial bitmap creation in-kernel MD: add sync_super to mddev_t struct MD: raid1 changes to allow use by device mapper MD: move thread wakeups into resume MD: possible typo MD: no sync IO while suspended MD: no integrity register if no gendisk
2011-06-09md:Documentation/md.txt - fix typoNeilBrown1-1/+1
Reported-by: CoolCold <coolthecold@gmail.com> Signed-off-by: NeilBrown <neilb@suse.de>
2011-06-07usb-storage: redo incorrect readsAlan Stern1-0/+2
Some USB mass-storage devices have bugs that cause them not to handle the first READ(10) command they receive correctly. The Corsair Padlock v2 returns completely bogus data for its first read (possibly it returns the data in encrypted form even though the device is supposed to be unlocked). The Feiya SD/SDHC card reader fails to complete the first READ(10) command after it is plugged in or after a new card is inserted, returning a status code that indicates it thinks the command was invalid, which prevents the kernel from retrying the read. Since the first read of a new device or a new medium is for the partition sector, the kernel is unable to retrieve the device's partition table. Users have to manually issue an "hdparm -z" or "blockdev --rereadpt" command before they can access the device. This patch (as1470) works around the problem. It adds a new quirk flag, US_FL_INVALID_READ10, indicating that the first READ(10) should always be retried immediately, as should any failing READ(10) commands (provided the preceding READ(10) command succeeded, to avoid getting stuck in a loop). The patch also adds appropriate unusual_devs entries containing the new flag. Signed-off-by: Alan Stern <stern@rowland.harvard.edu> Tested-by: Sven Geggus <sven-usbst@geggus.net> Tested-by: Paul Hartman <paul.hartman+linux@gmail.com> CC: Matthew Dharm <mdharm-usb@one-eyed-alien.net> CC: <stable@kernel.org> Signed-off-by: Greg Kroah-Hartman <gregkh@suse.de>
2011-06-02Merge git://git.infradead.org/iommu-2.6Linus Torvalds1-1/+4
* git://git.infradead.org/iommu-2.6: intel-iommu: Fix off-by-one in RMRR setup intel-iommu: Add domain check in domain_remove_one_dev_info intel-iommu: Remove Host Bridge devices from identity mapping intel-iommu: Use coherent DMA mask when requested intel-iommu: Dont cache iova above 32bit intel-iommu: Speed up processing of the identity_mapping function intel-iommu: Check for identity mapping candidate using system dma mask intel-iommu: Only unlink device domains from iommu intel-iommu: Enable super page (2MiB, 1GiB, etc.) support intel-iommu: Flush unmaps at domain_exit intel-iommu: Remove obsolete comment from detect_intel_iommu intel-iommu: fix VT-d PMR disable for TXT on S3 resume
2011-06-01intel-iommu: Enable super page (2MiB, 1GiB, etc.) supportYouquan Song1-1/+4
There are no externally-visible changes with this. In the loop in the internal __domain_mapping() function, we simply detect if we are mapping: - size >= 2MiB, and - virtual address aligned to 2MiB, and - physical address aligned to 2MiB, and - on hardware that supports superpages. (and likewise for larger superpages). We automatically use a superpage for such mappings. We never have to worry about *breaking* superpages, since we trust that we will always *unmap* the same range that was mapped. So all we need to do is ensure that dma_pte_clear_range() will also cope with superpages. Adjust pfn_to_dma_pte() to take a superpage 'level' as an argument, so it can return a PTE at the appropriate level rather than always extending the page tables all the way down to level 1. Again, this is simplified by the fact that we should never encounter existing small pages when we're creating a mapping; any old mapping that used the same virtual range will have been entirely removed and its obsolete page tables freed. Provide an 'intel_iommu=sp_off' argument on the command line as a chicken bit. Not that it should ever be required. == The original commit seen in the iommu-2.6.git was Youquan's implementation (and completion) of my own half-baked code which I'd typed into an email. Followed by half a dozen subsequent 'fixes'. I've taken the unusual step of rewriting history and collapsing the original commits in order to keep the main history simpler, and make life easier for the people who are going to have to backport this to older kernels. And also so I can give it a more coherent commit comment which (hopefully) gives a better explanation of what's going on. The original sequence of commits leading to identical code was: Youquan Song (3): intel-iommu: super page support intel-iommu: Fix superpage alignment calculation error intel-iommu: Fix superpage level calculation error in dma_pfn_level_pte() David Woodhouse (4): intel-iommu: Precalculate superpage support for dmar_domain intel-iommu: Fix hardware_largepage_caps() intel-iommu: Fix inappropriate use of superpages in __domain_mapping() intel-iommu: Fix phys_pfn in __domain_mapping for sglist pages Signed-off-by: Youquan Song <youquan.song@intel.com> Signed-off-by: David Woodhouse <David.Woodhouse@intel.com>
2011-05-30lguest: remove support for VIRTIO_F_NOTIFY_ON_EMPTY.Rusty Russell1-19/+1
No virtio device does this any more, so no need to clutter lguest with it. Signed-off-by: Rusty Russell <rusty@rustcorp.com.au>
2011-05-30lguest: fix up compilation after moveRusty Russell2-2/+2
ed16648eb5b86917f0b90bdcdbc857202da72f90 "Move kvm, uml, and lguest subdirectories" broke the lguest example launcher. Signed-off-by: Rusty Russell <rusty@rustcorp.com.au>
2011-05-29Merge branch 'for_linus' of ↵Linus Torvalds1-184/+0
git://git.kernel.org/pub/scm/linux/kernel/git/mjg59/platform-drivers-x86 * 'for_linus' of git://git.kernel.org/pub/scm/linux/kernel/git/mjg59/platform-drivers-x86: (43 commits) acer-wmi: support integer return type from WMI methods msi-laptop: fix section mismatch in reference from the function load_scm_model_init acer-wmi: support to set communication device state by new wmid method acer-wmi: allow 64-bits return buffer from WMI methods acer-wmi: check the existence of internal 3G device when set capability platform/x86:delete two unused variables support wlan hotkey on Acer Travelmate 5735Z platform-x86: intel_mid_thermal: Fix memory leak platform/x86: Fix Makefile for intel_mid_powerbtn platform/x86: Simplify intel_mid_powerbtn acer-wmi: Delete out-of-date documentation acerhdf: Clean up includes acerhdf: Drop pointless dependency on THERMAL_HWMON acer-wmi: Update MAINTAINERS wmi: Orphan ACPI-WMI driver tc1100-wmi: Orphan driver acer-wmi: does not allow negative number set to initial device state platform/oaktrail: ACPI EC Extra driver for Oaktrail thinkpad_acpi: Convert printks to pr_<level> thinkpad_acpi: Correct !CONFIG_THINKPAD_ACPI_VIDEO warning ...
2011-05-29Merge branch 'x86-platform-next' into x86-platformMatthew Garrett1-184/+0
2011-05-29Merge branch 'release' of ↵Linus Torvalds1-0/+5
git://git.kernel.org/pub/scm/linux/kernel/git/lenb/linux-acpi-2.6 * 'release' of git://git.kernel.org/pub/scm/linux/kernel/git/lenb/linux-acpi-2.6: ACPI EC: remove redundant code ACPI: Add D3 cold state ACPI: processor: fix processor_physically_present in UP kernel ACPI: Split out custom_method functionality into an own driver ACPI: Cleanup custom_method debug stuff ACPI EC: enable MSI workaround for Quanta laptops ACPICA: Update to version 20110413 ACPICA: Execute an orphan _REG method under the EC device ACPICA: Move ACPI_NUM_PREDEFINED_REGIONS to a more appropriate place ACPICA: Update internal address SpaceID for DataTable regions ACPICA: Add more methods eligible for NULL package element removal ACPICA: Split all internal Global Lock functions to new file - evglock ACPI: EC: add another DMI check for ASUS hardware ACPI EC: remove dead code ACPICA: Fix code divergence of global lock handling ACPICA: Use acpi_os_create_lock interface ACPI: osl, add acpi_os_create_lock interface ACPI:Fix goto flows in thermal-sys
2011-05-29Merge branch 'idle-release' of ↵Linus Torvalds1-0/+36
git://git.kernel.org/pub/scm/linux/kernel/git/lenb/linux-idle-2.6 * 'idle-release' of git://git.kernel.org/pub/scm/linux/kernel/git/lenb/linux-idle-2.6: x86 idle: deprecate mwait_idle() and "idle=mwait" cmdline param x86 idle: deprecate "no-hlt" cmdline param x86 idle APM: deprecate CONFIG_APM_CPU_IDLE x86 idle floppy: deprecate disable_hlt() x86 idle: EXPORT_SYMBOL(default_idle, pm_idle) only when APM demands it x86 idle: clarify AMD erratum 400 workaround idle governor: Avoid lock acquisition to read pm_qos before entering idle cpuidle: menu: fixed wrapping timers at 4.294 seconds
2011-05-29x86 idle: deprecate mwait_idle() and "idle=mwait" cmdline paramLen Brown1-0/+7
mwait_idle() is a C1-only idle loop intended to be more efficient than HLT on SMP hardware that supports it. But mwait_idle() has been replaced by the more general mwait_idle_with_hints(), which handles both C1 and deeper C-states. ACPI uses only mwait_idle_with_hints(), and never uses mwait_idle(). Deprecate mwait_idle() and the "idle=mwait" cmdline param to simplify the x86 idle code. After this change, kernels configured with (!CONFIG_ACPI=n && !CONFIG_INTEL_IDLE=n) when run on hardware that support MWAIT will simply use HLT. If MWAIT is desired on those systems, cpuidle and the cpuidle drivers above can be used. cc: x86@kernel.org cc: stable@kernel.org # .39.x Signed-off-by: Len Brown <len.brown@intel.com>
2011-05-29x86 idle: deprecate "no-hlt" cmdline paramLen Brown1-0/+11
We'd rather that modern machines not check if HLT works on every entry into idle, for the benefit of machines that had marginal electricals 15-years ago. If those machines are still running the upstream kernel, they can use "idle=poll". The only difference will be that they'll now invoke HLT in machine_hlt(). cc: x86@kernel.org # .39.x Signed-off-by: Len Brown <len.brown@intel.com>