fbdev: Don't sort deferred-I/O pages by default

Fbdev's deferred I/O sorts all dirty pages by default, which incurs a significant overhead. Make the sorting step optional and update the few drivers that require it. Use a FIFO list by default. Most fbdev drivers with deferred I/O build a bounding rectangle around the dirty pages or simply flush the whole screen. The only two affected DRM drivers, generic fbdev and vmwgfx, both use a bounding rectangle. In those cases, the exact order of the pages doesn't matter. The other drivers look at the page index or handle pages one-by-one. The patch sets the sort_pagelist flag for those, even though some of them would probably work correctly without sorting. Driver maintainers should update their driver accordingly. Sorting pages by memory offset for deferred I/O performs an implicit bubble-sort step on the list of dirty pages. The algorithm goes through the list of dirty pages and inserts each new page according to its index field. Even worse, list traversal always starts at the first entry. As video memory is most likely updated scanline by scanline, the algorithm traverses through the complete list for each updated page. For example, with 1024x768x32bpp each page covers exactly one scanline. Writing a single screen update from top to bottom requires updating 768 pages. With an average list length of 384 entries, a screen update creates (768 * 384 =) 294912 compare operation. Fix this by making the sorting step opt-in and update the few drivers that require it. All other drivers work with unsorted page lists. Pages are appended to the list. Therefore, in the common case of writing the framebuffer top to bottom, pages are still sorted by offset, which may have a positive effect on performance. Playing a video [1] in mplayer's benchmark mode shows the difference (i7-4790, FullHD, simpledrm, kernel with debugging). mplayer -benchmark -nosound -vo fbdev ./big_buck_bunny_720p_stereo.ogg With sorted page lists: BENCHMARKs: VC: 32.960s VO: 73.068s A: 0.000s Sys: 2.413s = 108.441s BENCHMARK%: VC: 30.3947% VO: 67.3802% A: 0.0000% Sys: 2.2251% = 100.0000% With unsorted page lists: BENCHMARKs: VC: 31.005s VO: 42.889s A: 0.000s Sys: 2.256s = 76.150s BENCHMARK%: VC: 40.7156% VO: 56.3219% A: 0.0000% Sys: 2.9625% = 100.0000% VC shows the overhead of video decoding, VO shows the overhead of the video output. Using unsorted page lists reduces the benchmark's run time by ~32s/~25%. v2: * Make sorted pagelists the special case (Sam) * Comment on drivers' use of pagelist (Sam) * Warn about the overhead in comment Signed-off-by: Thomas Zimmermann <tzimmermann@suse.de> Acked-by: Sam Ravnborg <sam@ravnborg.org> Acked-by: Andy Shevchenko <andriy.shevchenko@linux.intel.com> Link: https://download.blender.org/peach/bigbuckbunny_movies/big_buck_bunny_720p_stereo.ogg # [1] Link: https://patchwork.freedesktop.org/patch/msgid/20220211094640.21632-3-tzimmermann@suse.de
author: Thomas Zimmermann <tzimmermann@suse.de> 2022-02-11 10:46:40 +0100
committer: Thomas Zimmermann <tzimmermann@suse.de> 2022-02-16 16:41:45 +0100
commit: 8c30e2d81bfddc5ab9f6b04db1c0f7d6ca7bdf46 (patch)
tree: 9752a799b56b8ff3ccea139e9aab0484bbb6db4e /drivers/video/fbdev/core
parent: 105a940416fc622406653b6fe54732897642dfbc (diff)
download: linux-8c30e2d81bfddc5ab9f6b04db1c0f7d6ca7bdf46.tar.bz2
1 files changed, 17 insertions, 7 deletions
diff --git a/drivers/video/fbdev/core/fb_defio.c b/drivers/video/fbdev/core/fb_defio.c
index 169a81c8172b..98b0f23bf5e2 100644
--- a/drivers/video/fbdev/core/fb_defio.c
+++ b/drivers/video/fbdev/core/fb_defio.c
@@ -96,7 +96,7 @@ static vm_fault_t fb_deferred_io_mkwrite(struct vm_fault *vmf)
 	struct page *page = vmf->page;
 	struct fb_info *info = vmf->vma->vm_private_data;
 	struct fb_deferred_io *fbdefio = info->fbdefio;
-	struct page *cur;
+	struct list_head *pos = &fbdefio->pagelist;
 
 	/* this is a callback we get when userspace first tries to
 	write to the page. we schedule a workqueue. that workqueue
@@ -137,14 +137,24 @@ static vm_fault_t fb_deferred_io_mkwrite(struct vm_fault *vmf)
 	if (!list_empty(&page->lru))
 		goto page_already_added;
 
-	/* we loop through the pagelist before adding in order
-	to keep the pagelist sorted */
-	list_for_each_entry(cur, &fbdefio->pagelist, lru) {
-		if (cur->index > page->index)
-			break;
+	if (unlikely(fbdefio->sort_pagelist)) {
+		/*
+		 * We loop through the pagelist before adding in order to
+		 * keep the pagelist sorted. This has significant overhead
+		 * of O(n^2) with n being the number of written pages. If
+		 * possible, drivers should try to work with unsorted page
+		 * lists instead.
+		 */
+		struct page *cur;
+
+		list_for_each_entry(cur, &fbdefio->pagelist, lru) {
+			if (cur->index > page->index)
+				break;
+		}
+		pos = &cur->lru;
 	}
 
-	list_add_tail(&page->lru, &cur->lru);
+	list_add_tail(&page->lru, pos);
 
 page_already_added:
 	mutex_unlock(&fbdefio->lock);
author	Thomas Zimmermann <tzimmermann@suse.de>	2022-02-11 10:46:40 +0100
committer	Thomas Zimmermann <tzimmermann@suse.de>	2022-02-16 16:41:45 +0100
commit	8c30e2d81bfddc5ab9f6b04db1c0f7d6ca7bdf46 (patch)
tree	9752a799b56b8ff3ccea139e9aab0484bbb6db4e /drivers/video/fbdev/core
parent	105a940416fc622406653b6fe54732897642dfbc (diff)
download	linux-8c30e2d81bfddc5ab9f6b04db1c0f7d6ca7bdf46.tar.bz2