diff options
author | Linus Torvalds <torvalds@linux-foundation.org> | 2019-11-30 10:33:14 -0800 |
---|---|---|
committer | Linus Torvalds <torvalds@linux-foundation.org> | 2019-11-30 10:33:14 -0800 |
commit | aa32f1169148beb90d71494e2f2a1999ba7b5366 (patch) | |
tree | ec8c434bff07bf0beb2df08629089824927f62f9 /Documentation | |
parent | d5bb349dbbe27537e90a03b9597deeb07723a86d (diff) | |
parent | 93f4e735b6d98ee4b7a1252d81e815a983e359f2 (diff) | |
download | linux-aa32f1169148beb90d71494e2f2a1999ba7b5366.tar.bz2 |
Merge tag 'for-linus-hmm' of git://git.kernel.org/pub/scm/linux/kernel/git/rdma/rdma
Pull hmm updates from Jason Gunthorpe:
"This is another round of bug fixing and cleanup. This time the focus
is on the driver pattern to use mmu notifiers to monitor a VA range.
This code is lifted out of many drivers and hmm_mirror directly into
the mmu_notifier core and written using the best ideas from all the
driver implementations.
This removes many bugs from the drivers and has a very pleasing
diffstat. More drivers can still be converted, but that is for another
cycle.
- A shared branch with RDMA reworking the RDMA ODP implementation
- New mmu_interval_notifier API. This is focused on the use case of
monitoring a VA and simplifies the process for drivers
- A common seq-count locking scheme built into the
mmu_interval_notifier API usable by drivers that call
get_user_pages() or hmm_range_fault() with the VA range
- Conversion of mlx5 ODP, hfi1, radeon, nouveau, AMD GPU, and Xen
GntDev drivers to the new API. This deletes a lot of wonky driver
code.
- Two improvements for hmm_range_fault(), from testing done by Ralph"
* tag 'for-linus-hmm' of git://git.kernel.org/pub/scm/linux/kernel/git/rdma/rdma:
mm/hmm: remove hmm_range_dma_map and hmm_range_dma_unmap
mm/hmm: make full use of walk_page_range()
xen/gntdev: use mmu_interval_notifier_insert
mm/hmm: remove hmm_mirror and related
drm/amdgpu: Use mmu_interval_notifier instead of hmm_mirror
drm/amdgpu: Use mmu_interval_insert instead of hmm_mirror
drm/amdgpu: Call find_vma under mmap_sem
nouveau: use mmu_interval_notifier instead of hmm_mirror
nouveau: use mmu_notifier directly for invalidate_range_start
drm/radeon: use mmu_interval_notifier_insert
RDMA/hfi1: Use mmu_interval_notifier_insert for user_exp_rcv
RDMA/odp: Use mmu_interval_notifier_insert()
mm/hmm: define the pre-processor related parts of hmm.h even if disabled
mm/hmm: allow hmm_range to be used with a mmu_interval_notifier or hmm_mirror
mm/mmu_notifier: add an interval tree notifier
mm/mmu_notifier: define the header pre-processor parts even if disabled
mm/hmm: allow snapshot of the special zero page
Diffstat (limited to 'Documentation')
-rw-r--r-- | Documentation/vm/hmm.rst | 105 |
1 files changed, 24 insertions, 81 deletions
diff --git a/Documentation/vm/hmm.rst b/Documentation/vm/hmm.rst index 0a5960beccf7..893a8ba0e9fe 100644 --- a/Documentation/vm/hmm.rst +++ b/Documentation/vm/hmm.rst @@ -147,49 +147,16 @@ Address space mirroring implementation and API Address space mirroring's main objective is to allow duplication of a range of CPU page table into a device page table; HMM helps keep both synchronized. A device driver that wants to mirror a process address space must start with the -registration of an hmm_mirror struct:: - - int hmm_mirror_register(struct hmm_mirror *mirror, - struct mm_struct *mm); - -The mirror struct has a set of callbacks that are used -to propagate CPU page tables:: - - struct hmm_mirror_ops { - /* release() - release hmm_mirror - * - * @mirror: pointer to struct hmm_mirror - * - * This is called when the mm_struct is being released. The callback - * must ensure that all access to any pages obtained from this mirror - * is halted before the callback returns. All future access should - * fault. - */ - void (*release)(struct hmm_mirror *mirror); - - /* sync_cpu_device_pagetables() - synchronize page tables - * - * @mirror: pointer to struct hmm_mirror - * @update: update information (see struct mmu_notifier_range) - * Return: -EAGAIN if update.blockable false and callback need to - * block, 0 otherwise. - * - * This callback ultimately originates from mmu_notifiers when the CPU - * page table is updated. The device driver must update its page table - * in response to this callback. The update argument tells what action - * to perform. - * - * The device driver must not return from this callback until the device - * page tables are completely updated (TLBs flushed, etc); this is a - * synchronous call. - */ - int (*sync_cpu_device_pagetables)(struct hmm_mirror *mirror, - const struct hmm_update *update); - }; - -The device driver must perform the update action to the range (mark range -read only, or fully unmap, etc.). The device must complete the update before -the driver callback returns. +registration of a mmu_interval_notifier:: + + mni->ops = &driver_ops; + int mmu_interval_notifier_insert(struct mmu_interval_notifier *mni, + unsigned long start, unsigned long length, + struct mm_struct *mm); + +During the driver_ops->invalidate() callback the device driver must perform +the update action to the range (mark range read only, or fully unmap, +etc.). The device must complete the update before the driver callback returns. When the device driver wants to populate a range of virtual addresses, it can use:: @@ -216,70 +183,46 @@ The usage pattern is:: struct hmm_range range; ... + range.notifier = &mni; range.start = ...; range.end = ...; range.pfns = ...; range.flags = ...; range.values = ...; range.pfn_shift = ...; - hmm_range_register(&range, mirror); - /* - * Just wait for range to be valid, safe to ignore return value as we - * will use the return value of hmm_range_fault() below under the - * mmap_sem to ascertain the validity of the range. - */ - hmm_range_wait_until_valid(&range, TIMEOUT_IN_MSEC); + if (!mmget_not_zero(mni->notifier.mm)) + return -EFAULT; again: + range.notifier_seq = mmu_interval_read_begin(&mni); down_read(&mm->mmap_sem); ret = hmm_range_fault(&range, HMM_RANGE_SNAPSHOT); if (ret) { up_read(&mm->mmap_sem); - if (ret == -EBUSY) { - /* - * No need to check hmm_range_wait_until_valid() return value - * on retry we will get proper error with hmm_range_fault() - */ - hmm_range_wait_until_valid(&range, TIMEOUT_IN_MSEC); - goto again; - } - hmm_range_unregister(&range); + if (ret == -EBUSY) + goto again; return ret; } + up_read(&mm->mmap_sem); + take_lock(driver->update); - if (!hmm_range_valid(&range)) { + if (mmu_interval_read_retry(&ni, range.notifier_seq) { release_lock(driver->update); - up_read(&mm->mmap_sem); goto again; } - // Use pfns array content to update device page table + /* Use pfns array content to update device page table, + * under the update lock */ - hmm_range_unregister(&range); release_lock(driver->update); - up_read(&mm->mmap_sem); return 0; } The driver->update lock is the same lock that the driver takes inside its -sync_cpu_device_pagetables() callback. That lock must be held before calling -hmm_range_valid() to avoid any race with a concurrent CPU page table update. - -HMM implements all this on top of the mmu_notifier API because we wanted a -simpler API and also to be able to perform optimizations latter on like doing -concurrent device updates in multi-devices scenario. - -HMM also serves as an impedance mismatch between how CPU page table updates -are done (by CPU write to the page table and TLB flushes) and how devices -update their own page table. Device updates are a multi-step process. First, -appropriate commands are written to a buffer, then this buffer is scheduled for -execution on the device. It is only once the device has executed commands in -the buffer that the update is done. Creating and scheduling the update command -buffer can happen concurrently for multiple devices. Waiting for each device to -report commands as executed is serialized (there is no point in doing this -concurrently). - +invalidate() callback. That lock must be held before calling +mmu_interval_read_retry() to avoid any race with a concurrent CPU page table +update. Leverage default_flags and pfn_flags_mask ========================================= |