summaryrefslogtreecommitdiffstats
path: root/drivers/block
AgeCommit message (Collapse)AuthorFilesLines
2016-05-20zram: introduce per-device debug_stat sysfs nodeSergey Senozhatsky2-0/+22
debug_stat sysfs is read-only and represents various debugging data that zram developers may need. This file is not meant to be used by anyone else: its content is not documented and will change any time w/o any notice. Therefore, the output of debug_stat file contains a version string. To avoid any confusion, we will increase the version number every time we modify the output. At the moment this file exports only one value -- the number of re-compressions, IOW, the number of times compression fast path has failed. This stat is temporary any will be useful in case if any per-cpu compression streams regressions will be reported. Link: http://lkml.kernel.org/r/20160513230834.GB26763@bbox Link: http://lkml.kernel.org/r/20160511134553.12655-1-sergey.senozhatsky@gmail.com Signed-off-by: Sergey Senozhatsky <sergey.senozhatsky@gmail.com> Signed-off-by: Minchan Kim <minchan@kernel.org> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2016-05-20zram: remove max_comp_streams internalsSergey Senozhatsky3-40/+11
Remove the internal part of max_comp_streams interface, since we switched to per-cpu streams. We will keep RW max_comp_streams attr around, because: a) we may (silently) switch back to idle compression streams list and don't want to disturb user space b) max_comp_streams attr must wait for the next 'lay off cycle'; we give user space 2 years to adjust before we remove/downgrade the attr, and there are already several attrs scheduled for removal in 4.11, so it's too late for max_comp_streams. This slightly change a user visible behaviour: - First, reading from max_comp_stream file now will always return the number of online CPUs. - Second, writing to max_comp_stream will not take any effect. Link: http://lkml.kernel.org/r/20160503165546.25201-1-sergey.senozhatsky@gmail.com Signed-off-by: Sergey Senozhatsky <sergey.senozhatsky@gmail.com> Cc: Minchan Kim <minchan@kernel.org> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2016-05-20zram: user per-cpu compression streamsSergey Senozhatsky3-231/+116
Remove idle streams list and keep compression streams in per-cpu data. This removes two contented spin_lock()/spin_unlock() calls from write path and also prevent write OP from being preempted while holding the compression stream, which can cause slow downs. For instance, let's assume that we have N cpus and N-2 max_comp_streams.TASK1 owns the last idle stream, TASK2-TASK3 come in with the write requests: TASK1 TASK2 TASK3 zram_bvec_write() spin_lock find stream spin_unlock compress <<preempted>> zram_bvec_write() spin_lock find stream spin_unlock no_stream schedule zram_bvec_write() spin_lock find_stream spin_unlock no_stream schedule spin_lock release stream spin_unlock wake up TASK2 not only TASK2 and TASK3 will not get the stream, TASK1 will be preempted in the middle of its operation; while we would prefer it to finish compression and release the stream. Test environment: x86_64, 4 CPU box, 3G zram, lzo The following fio tests were executed: read, randread, write, randwrite, rw, randrw with the increasing number of jobs from 1 to 10. 4 streams 8 streams per-cpu =========================================================== jobs1 READ: 2520.1MB/s 2566.5MB/s 2491.5MB/s READ: 2102.7MB/s 2104.2MB/s 2091.3MB/s WRITE: 1355.1MB/s 1320.2MB/s 1378.9MB/s WRITE: 1103.5MB/s 1097.2MB/s 1122.5MB/s READ: 434013KB/s 435153KB/s 439961KB/s WRITE: 433969KB/s 435109KB/s 439917KB/s READ: 403166KB/s 405139KB/s 403373KB/s WRITE: 403223KB/s 405197KB/s 403430KB/s jobs2 READ: 7958.6MB/s 8105.6MB/s 8073.7MB/s READ: 6864.9MB/s 6989.8MB/s 7021.8MB/s WRITE: 2438.1MB/s 2346.9MB/s 3400.2MB/s WRITE: 1994.2MB/s 1990.3MB/s 2941.2MB/s READ: 981504KB/s 973906KB/s 1018.8MB/s WRITE: 981659KB/s 974060KB/s 1018.1MB/s READ: 937021KB/s 938976KB/s 987250KB/s WRITE: 934878KB/s 936830KB/s 984993KB/s jobs3 READ: 13280MB/s 13553MB/s 13553MB/s READ: 11534MB/s 11785MB/s 11755MB/s WRITE: 3456.9MB/s 3469.9MB/s 4810.3MB/s WRITE: 3029.6MB/s 3031.6MB/s 4264.8MB/s READ: 1363.8MB/s 1362.6MB/s 1448.9MB/s WRITE: 1361.9MB/s 1360.7MB/s 1446.9MB/s READ: 1309.4MB/s 1310.6MB/s 1397.5MB/s WRITE: 1307.4MB/s 1308.5MB/s 1395.3MB/s jobs4 READ: 20244MB/s 20177MB/s 20344MB/s READ: 17886MB/s 17913MB/s 17835MB/s WRITE: 4071.6MB/s 4046.1MB/s 6370.2MB/s WRITE: 3608.9MB/s 3576.3MB/s 5785.4MB/s READ: 1824.3MB/s 1821.6MB/s 1997.5MB/s WRITE: 1819.8MB/s 1817.4MB/s 1992.5MB/s READ: 1765.7MB/s 1768.3MB/s 1937.3MB/s WRITE: 1767.5MB/s 1769.1MB/s 1939.2MB/s jobs5 READ: 18663MB/s 18986MB/s 18823MB/s READ: 16659MB/s 16605MB/s 16954MB/s WRITE: 3912.4MB/s 3888.7MB/s 6126.9MB/s WRITE: 3506.4MB/s 3442.5MB/s 5519.3MB/s READ: 1798.2MB/s 1746.5MB/s 1935.8MB/s WRITE: 1792.7MB/s 1740.7MB/s 1929.1MB/s READ: 1727.6MB/s 1658.2MB/s 1917.3MB/s WRITE: 1726.5MB/s 1657.2MB/s 1916.6MB/s jobs6 READ: 21017MB/s 20922MB/s 21162MB/s READ: 19022MB/s 19140MB/s 18770MB/s WRITE: 3968.2MB/s 4037.7MB/s 6620.8MB/s WRITE: 3643.5MB/s 3590.2MB/s 6027.5MB/s READ: 1871.8MB/s 1880.5MB/s 2049.9MB/s WRITE: 1867.8MB/s 1877.2MB/s 2046.2MB/s READ: 1755.8MB/s 1710.3MB/s 1964.7MB/s WRITE: 1750.5MB/s 1705.9MB/s 1958.8MB/s jobs7 READ: 21103MB/s 20677MB/s 21482MB/s READ: 18522MB/s 18379MB/s 19443MB/s WRITE: 4022.5MB/s 4067.4MB/s 6755.9MB/s WRITE: 3691.7MB/s 3695.5MB/s 5925.6MB/s READ: 1841.5MB/s 1933.9MB/s 2090.5MB/s WRITE: 1842.7MB/s 1935.3MB/s 2091.9MB/s READ: 1832.4MB/s 1856.4MB/s 1971.5MB/s WRITE: 1822.3MB/s 1846.2MB/s 1960.6MB/s jobs8 READ: 20463MB/s 20194MB/s 20862MB/s READ: 18178MB/s 17978MB/s 18299MB/s WRITE: 4085.9MB/s 4060.2MB/s 7023.8MB/s WRITE: 3776.3MB/s 3737.9MB/s 6278.2MB/s READ: 1957.6MB/s 1944.4MB/s 2109.5MB/s WRITE: 1959.2MB/s 1946.2MB/s 2111.4MB/s READ: 1900.6MB/s 1885.7MB/s 2082.1MB/s WRITE: 1896.2MB/s 1881.4MB/s 2078.3MB/s jobs9 READ: 19692MB/s 19734MB/s 19334MB/s READ: 17678MB/s 18249MB/s 17666MB/s WRITE: 4004.7MB/s 4064.8MB/s 6990.7MB/s WRITE: 3724.7MB/s 3772.1MB/s 6193.6MB/s READ: 1953.7MB/s 1967.3MB/s 2105.6MB/s WRITE: 1953.4MB/s 1966.7MB/s 2104.1MB/s READ: 1860.4MB/s 1897.4MB/s 2068.5MB/s WRITE: 1858.9MB/s 1895.9MB/s 2066.8MB/s jobs10 READ: 19730MB/s 19579MB/s 19492MB/s READ: 18028MB/s 18018MB/s 18221MB/s WRITE: 4027.3MB/s 4090.6MB/s 7020.1MB/s WRITE: 3810.5MB/s 3846.8MB/s 6426.8MB/s READ: 1956.1MB/s 1994.6MB/s 2145.2MB/s WRITE: 1955.9MB/s 1993.5MB/s 2144.8MB/s READ: 1852.8MB/s 1911.6MB/s 2075.8MB/s WRITE: 1855.7MB/s 1914.6MB/s 2078.1MB/s perf stat 4 streams 8 streams per-cpu ==================================================================================================================== jobs1 stalled-cycles-frontend 23,174,811,209 ( 38.21%) 23,220,254,188 ( 38.25%) 23,061,406,918 ( 38.34%) stalled-cycles-backend 11,514,174,638 ( 18.98%) 11,696,722,657 ( 19.27%) 11,370,852,810 ( 18.90%) instructions 73,925,005,782 ( 1.22) 73,903,177,632 ( 1.22) 73,507,201,037 ( 1.22) branches 14,455,124,835 ( 756.063) 14,455,184,779 ( 755.281) 14,378,599,509 ( 758.546) branch-misses 69,801,336 ( 0.48%) 80,225,529 ( 0.55%) 72,044,726 ( 0.50%) jobs2 stalled-cycles-frontend 49,912,741,782 ( 46.11%) 50,101,189,290 ( 45.95%) 32,874,195,633 ( 35.11%) stalled-cycles-backend 27,080,366,230 ( 25.02%) 27,949,970,232 ( 25.63%) 16,461,222,706 ( 17.58%) instructions 122,831,629,690 ( 1.13) 122,919,846,419 ( 1.13) 121,924,786,775 ( 1.30) branches 23,725,889,239 ( 692.663) 23,733,547,140 ( 688.062) 23,553,950,311 ( 794.794) branch-misses 90,733,041 ( 0.38%) 96,320,895 ( 0.41%) 84,561,092 ( 0.36%) jobs3 stalled-cycles-frontend 66,437,834,608 ( 45.58%) 63,534,923,344 ( 43.69%) 42,101,478,505 ( 33.19%) stalled-cycles-backend 34,940,799,661 ( 23.97%) 34,774,043,148 ( 23.91%) 21,163,324,388 ( 16.68%) instructions 171,692,121,862 ( 1.18) 171,775,373,044 ( 1.18) 170,353,542,261 ( 1.34) branches 32,968,962,622 ( 628.723) 32,987,739,894 ( 630.512) 32,729,463,918 ( 717.027) branch-misses 111,522,732 ( 0.34%) 110,472,894 ( 0.33%) 99,791,291 ( 0.30%) jobs4 stalled-cycles-frontend 98,741,701,675 ( 49.72%) 94,797,349,965 ( 47.59%) 54,535,655,381 ( 33.53%) stalled-cycles-backend 54,642,609,615 ( 27.51%) 55,233,554,408 ( 27.73%) 27,882,323,541 ( 17.14%) instructions 220,884,807,851 ( 1.11) 220,930,887,273 ( 1.11) 218,926,845,851 ( 1.35) branches 42,354,518,180 ( 592.105) 42,362,770,587 ( 590.452) 41,955,552,870 ( 716.154) branch-misses 138,093,449 ( 0.33%) 131,295,286 ( 0.31%) 121,794,771 ( 0.29%) jobs5 stalled-cycles-frontend 116,219,747,212 ( 48.14%) 110,310,397,012 ( 46.29%) 66,373,082,723 ( 33.70%) stalled-cycles-backend 66,325,434,776 ( 27.48%) 64,157,087,914 ( 26.92%) 32,999,097,299 ( 16.76%) instructions 270,615,008,466 ( 1.12) 270,546,409,525 ( 1.14) 268,439,910,948 ( 1.36) branches 51,834,046,557 ( 599.108) 51,811,867,722 ( 608.883) 51,412,576,077 ( 729.213) branch-misses 158,197,086 ( 0.31%) 142,639,805 ( 0.28%) 133,425,455 ( 0.26%) jobs6 stalled-cycles-frontend 138,009,414,492 ( 48.23%) 139,063,571,254 ( 48.80%) 75,278,568,278 ( 32.80%) stalled-cycles-backend 79,211,949,650 ( 27.68%) 79,077,241,028 ( 27.75%) 37,735,797,899 ( 16.44%) instructions 319,763,993,731 ( 1.12) 319,937,782,834 ( 1.12) 316,663,600,784 ( 1.38) branches 61,219,433,294 ( 595.056) 61,250,355,540 ( 598.215) 60,523,446,617 ( 733.706) branch-misses 169,257,123 ( 0.28%) 154,898,028 ( 0.25%) 141,180,587 ( 0.23%) jobs7 stalled-cycles-frontend 162,974,812,119 ( 49.20%) 159,290,061,987 ( 48.43%) 88,046,641,169 ( 33.21%) stalled-cycles-backend 92,223,151,661 ( 27.84%) 91,667,904,406 ( 27.87%) 44,068,454,971 ( 16.62%) instructions 369,516,432,430 ( 1.12) 369,361,799,063 ( 1.12) 365,290,380,661 ( 1.38) branches 70,795,673,950 ( 594.220) 70,743,136,124 ( 597.876) 69,803,996,038 ( 732.822) branch-misses 181,708,327 ( 0.26%) 165,767,821 ( 0.23%) 150,109,797 ( 0.22%) jobs8 stalled-cycles-frontend 185,000,017,027 ( 49.30%) 182,334,345,473 ( 48.37%) 99,980,147,041 ( 33.26%) stalled-cycles-backend 105,753,516,186 ( 28.18%) 107,937,830,322 ( 28.63%) 51,404,177,181 ( 17.10%) instructions 418,153,161,055 ( 1.11) 418,308,565,828 ( 1.11) 413,653,475,581 ( 1.38) branches 80,035,882,398 ( 592.296) 80,063,204,510 ( 589.843) 79,024,105,589 ( 730.530) branch-misses 199,764,528 ( 0.25%) 177,936,926 ( 0.22%) 160,525,449 ( 0.20%) jobs9 stalled-cycles-frontend 210,941,799,094 ( 49.63%) 204,714,679,254 ( 48.55%) 114,251,113,756 ( 33.96%) stalled-cycles-backend 122,640,849,067 ( 28.85%) 122,188,553,256 ( 28.98%) 58,360,041,127 ( 17.35%) instructions 468,151,025,415 ( 1.10) 467,354,869,323 ( 1.11) 462,665,165,216 ( 1.38) branches 89,657,067,510 ( 585.628) 89,411,550,407 ( 588.990) 88,360,523,943 ( 730.151) branch-misses 218,292,301 ( 0.24%) 191,701,247 ( 0.21%) 178,535,678 ( 0.20%) jobs10 stalled-cycles-frontend 233,595,958,008 ( 49.81%) 227,540,615,689 ( 49.11%) 160,341,979,938 ( 43.07%) stalled-cycles-backend 136,153,676,021 ( 29.03%) 133,635,240,742 ( 28.84%) 65,909,135,465 ( 17.70%) instructions 517,001,168,497 ( 1.10) 516,210,976,158 ( 1.11) 511,374,038,613 ( 1.37) branches 98,911,641,329 ( 585.796) 98,700,069,712 ( 591.583) 97,646,761,028 ( 728.712) branch-misses 232,341,823 ( 0.23%) 199,256,308 ( 0.20%) 183,135,268 ( 0.19%) per-cpu streams tend to cause significantly less stalled cycles; execute less branches and hit less branch-misses. perf stat reported execution time 4 streams 8 streams per-cpu ==================================================================== jobs1 seconds elapsed 20.909073870 20.875670495 20.817838540 jobs2 seconds elapsed 18.529488399 18.720566469 16.356103108 jobs3 seconds elapsed 18.991159531 18.991340812 16.766216066 jobs4 seconds elapsed 19.560643828 19.551323547 16.246621715 jobs5 seconds elapsed 24.746498464 25.221646740 20.696112444 jobs6 seconds elapsed 28.258181828 28.289765505 22.885688857 jobs7 seconds elapsed 32.632490241 31.909125381 26.272753738 jobs8 seconds elapsed 35.651403851 36.027596308 29.108024711 jobs9 seconds elapsed 40.569362365 40.024227989 32.898204012 jobs10 seconds elapsed 44.673112304 43.874898137 35.632952191 Please see Link: http://marc.info/?l=linux-kernel&m=146166970727530 Link: http://marc.info/?l=linux-kernel&m=146174716719650 for more test results (under low memory conditions). Signed-off-by: Sergey Senozhatsky <sergey.senozhatsky@gmail.com> Suggested-by: Minchan Kim <minchan@kernel.org> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2016-05-20zsmalloc: require GFP in zs_malloc()Sergey Senozhatsky1-2/+2
Pass GFP flags to zs_malloc() instead of using a fixed mask supplied to zs_create_pool(), so we can be more flexible, but, more importantly, we need this to switch zram to per-cpu compression streams -- zram will try to allocate handle with preemption disabled in a fast path and switch to a slow path (using different gfp mask) if the fast one has failed. Apart from that, this also align zs_malloc() interface with zspool/zbud. [sergey.senozhatsky@gmail.com: pass GFP flags to zs_malloc() instead of using a fixed mask] Link: http://lkml.kernel.org/r/20160429150942.GA637@swordfish Link: http://lkml.kernel.org/r/20160429150942.GA637@swordfish Signed-off-by: Sergey Senozhatsky <sergey.senozhatsky@gmail.com> Acked-by: Minchan Kim <minchan@kernel.org> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2016-05-19mm: rename _count, field of the struct page, to _refcountJoonsoo Kim1-1/+1
Many developers already know that field for reference count of the struct page is _count and atomic type. They would try to handle it directly and this could break the purpose of page reference count tracepoint. To prevent direct _count modification, this patch rename it to _refcount and add warning message on the code. After that, developer who need to handle reference count will find that field should not be accessed directly. [akpm@linux-foundation.org: fix comments, per Vlastimil] [akpm@linux-foundation.org: Documentation/vm/transhuge.txt too] [sfr@canb.auug.org.au: sync ethernet driver changes] Signed-off-by: Joonsoo Kim <iamjoonsoo.kim@lge.com> Signed-off-by: Stephen Rothwell <sfr@canb.auug.org.au> Cc: Vlastimil Babka <vbabka@suse.cz> Cc: Hugh Dickins <hughd@google.com> Cc: Johannes Berg <johannes@sipsolutions.net> Cc: "David S. Miller" <davem@davemloft.net> Cc: Sunil Goutham <sgoutham@cavium.com> Cc: Chris Metcalf <cmetcalf@mellanox.com> Cc: Manish Chopra <manish.chopra@qlogic.com> Cc: Yuval Mintz <yuval.mintz@qlogic.com> Cc: Tariq Toukan <tariqt@mellanox.com> Cc: Saeed Mahameed <saeedm@mellanox.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2016-05-17Merge git://git.kernel.org/pub/scm/linux/kernel/git/davem/net-nextLinus Torvalds1-12/+16
Pull networking updates from David Miller: "Highlights: 1) Support SPI based w5100 devices, from Akinobu Mita. 2) Partial Segmentation Offload, from Alexander Duyck. 3) Add GMAC4 support to stmmac driver, from Alexandre TORGUE. 4) Allow cls_flower stats offload, from Amir Vadai. 5) Implement bpf blinding, from Daniel Borkmann. 6) Optimize _ASYNC_ bit twiddling on sockets, unless the socket is actually using FASYNC these atomics are superfluous. From Eric Dumazet. 7) Run TCP more preemptibly, also from Eric Dumazet. 8) Support LED blinking, EEPROM dumps, and rxvlan offloading in mlx5e driver, from Gal Pressman. 9) Allow creating ppp devices via rtnetlink, from Guillaume Nault. 10) Improve BPF usage documentation, from Jesper Dangaard Brouer. 11) Support tunneling offloads in qed, from Manish Chopra. 12) aRFS offloading in mlx5e, from Maor Gottlieb. 13) Add RFS and RPS support to SCTP protocol, from Marcelo Ricardo Leitner. 14) Add MSG_EOR support to TCP, this allows controlling packet coalescing on application record boundaries for more accurate socket timestamp sampling. From Martin KaFai Lau. 15) Fix alignment of 64-bit netlink attributes across the board, from Nicolas Dichtel. 16) Per-vlan stats in bridging, from Nikolay Aleksandrov. 17) Several conversions of drivers to ethtool ksettings, from Philippe Reynes. 18) Checksum neutral ILA in ipv6, from Tom Herbert. 19) Factorize all of the various marvell dsa drivers into one, from Vivien Didelot 20) Add VF support to qed driver, from Yuval Mintz" * git://git.kernel.org/pub/scm/linux/kernel/git/davem/net-next: (1649 commits) Revert "phy dp83867: Fix compilation with CONFIG_OF_MDIO=m" Revert "phy dp83867: Make rgmii parameters optional" r8169: default to 64-bit DMA on recent PCIe chips phy dp83867: Make rgmii parameters optional phy dp83867: Fix compilation with CONFIG_OF_MDIO=m bpf: arm64: remove callee-save registers use for tmp registers asix: Fix offset calculation in asix_rx_fixup() causing slow transmissions switchdev: pass pointer to fib_info instead of copy net_sched: close another race condition in tcf_mirred_release() tipc: fix nametable publication field in nl compat drivers: net: Don't print unpopulated net_device name qed: add support for dcbx. ravb: Add missing free_irq() calls to ravb_close() qed: Remove a stray tab net: ethernet: fec-mpc52xx: use phy_ethtool_{get|set}_link_ksettings net: ethernet: fec-mpc52xx: use phydev from struct net_device bpf, doc: fix typo on bpf_asm descriptions stmmac: hardware TX COE doesn't work when force_thresh_dma_mode is set net: ethernet: fs-enet: use phy_ethtool_{get|set}_link_ksettings net: ethernet: fs-enet: use phydev from struct net_device ...
2016-05-17Merge branch 'for-4.7/drivers' of git://git.kernel.dk/linux-blockLinus Torvalds10-81/+15
Pull block driver updates from Jens Axboe: "On top of the core pull request, this is the drivers pull request for this merge window. This contains: - Switch drivers to the new write back cache API, and kill off the flush flags. From me. - Kill the discard support for the STEC pci-e flash driver. It's trivially broken, and apparently unmaintained, so it's safer to just remove it. From Jeff Moyer. - A set of lightnvm updates from the usual suspects (Matias/Javier, and Simon), and fixes from Arnd, Jeff Mahoney, Sagi, and Wenwei Tao. - A set of updates for NVMe: - Turn the controller state management into a proper state machine. From Christoph. - Shuffling of code in preparation for NVMe-over-fabrics, also from Christoph. - Cleanup of the command prep part from Ming Lin. - Rewrite of the discard support from Ming Lin. - Deadlock fix for namespace removal from Ming Lin. - Use the now exported blk-mq tag helper for IO termination. From Sagi. - Various little fixes from Christoph, Guilherme, Keith, Ming Lin, Wang Sheng-Hui. - Convert mtip32xx to use the now exported blk-mq tag iter function, from Keith" * 'for-4.7/drivers' of git://git.kernel.dk/linux-block: (74 commits) lightnvm: reserved space calculation incorrect lightnvm: rename nr_pages to nr_ppas on nvm_rq lightnvm: add is_cached entry to struct ppa_addr lightnvm: expose gennvm_mark_blk to targets lightnvm: remove mgt targets on mgt removal lightnvm: pass dma address to hardware rather than pointer lightnvm: do not assume sequential lun alloc. nvme/lightnvm: Log using the ctrl named device lightnvm: rename dma helper functions lightnvm: enable metadata to be sent to device lightnvm: do not free unused metadata on rrpc lightnvm: fix out of bound ppa lun id on bb tbl lightnvm: refactor set_bb_tbl for accepting ppa list lightnvm: move responsibility for bad blk mgmt to target lightnvm: make nvm_set_rqd_ppalist() aware of vblks lightnvm: remove struct factory_blks lightnvm: refactor device ops->get_bb_tbl() lightnvm: introduce nvm_for_each_lun_ppa() macro lightnvm: refactor dev->online_target to global nvm_targets lightnvm: rename nvm_targets to nvm_tgt_type ...
2016-05-17Merge branch 'for-4.7/core' of git://git.kernel.dk/linux-blockLinus Torvalds1-1/+1
Pull core block layer updates from Jens Axboe: "This is the core block IO changes for this merge window. Nothing earth shattering in here, it's mostly just fixes. In detail: - Fix for a long standing issue where wrong ordering in blk-mq caused order_to_size() to spew a warning. From Bart. - Async discard support from Christoph. Basically just splitting our sync interface into a submit + wait part. - Add a cleaner interface for flagging whether a device has a write back cache or not. We've previously overloaded blk_queue_flush() with this, but let's make it more explicit. Drivers cleaned up and updated in the drivers pull request. From me. - Fix for a double check for whether IO accounting is enabled or not. From Michael Callahan. - Fix for the async discard from Mike Snitzer, reinstating the early EOPNOTSUPP return if the device doesn't support discards. - Also from Mike, export bio_inc_remaining() so dm can drop it's private copy of it. - From Ming Lin, add support for passing in an offset for request payloads. - Tag function export from Sagi, which will be used in NVMe in the drivers pull. - Two blktrace related fixes from Shaohua. - Propagate NOMERGE flag when making a request from a bio, also from Shaohua. - An optimization to not parse cgroup paths in blk-throttle, if we don't need to. From Shaohua" * 'for-4.7/core' of git://git.kernel.dk/linux-block: blk-mq: fix undefined behaviour in order_to_size() blk-throttle: don't parse cgroup path if trace isn't enabled blktrace: add missed mask name blktrace: delete garbage for message trace block: make bio_inc_remaining() interface accessible again block: reinstate early return of -EOPNOTSUPP from blkdev_issue_discard block: Minor blk_account_io_start usage cleanup block: add __blkdev_issue_discard block: remove struct bio_batch block: copy NOMERGE flag from bio to request block: add ability to flag write back caching on a device blk-mq: Export tagset iter function block: add offset in blk_add_request_payload() writeback: Fix performance regression in wb_over_bg_thresh()
2016-05-10block/drbd: align properly u64 in nl messagesNicolas Dichtel1-12/+16
The attribute 0 is never used in drbd, so let's use it as pad attribute in netlink messages. This minimizes the patch. Note that this patch is only compile-tested. Signed-off-by: Nicolas Dichtel <nicolas.dichtel@6wind.com> Signed-off-by: Lars Ellenberg <lars.ellenberg@linbit.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2016-04-28rbd: report unsupported features to syslogIlya Dryomov1-3/+6
... instead of just returning an error. Signed-off-by: Ilya Dryomov <idryomov@gmail.com> Reviewed-by: Josh Durgin <jdurgin@redhat.com>
2016-04-28rbd: fix rbd map vs notify racesIlya Dryomov1-24/+19
A while ago, commit 9875201e1049 ("rbd: fix use-after free of rbd_dev->disk") fixed rbd unmap vs notify race by introducing an exported wrapper for flushing notifies and sticking it into do_rbd_remove(). A similar problem exists on the rbd map path, though: the watch is registered in rbd_dev_image_probe(), while the disk is set up quite a few steps later, in rbd_dev_device_setup(). Nothing prevents a notify from coming in and crashing on a NULL rbd_dev->disk: BUG: unable to handle kernel NULL pointer dereference at 0000000000000050 Call Trace: [<ffffffffa0508344>] rbd_watch_cb+0x34/0x180 [rbd] [<ffffffffa04bd290>] do_event_work+0x40/0xb0 [libceph] [<ffffffff8109d5db>] process_one_work+0x17b/0x470 [<ffffffff8109e3ab>] worker_thread+0x11b/0x400 [<ffffffff8109e290>] ? rescuer_thread+0x400/0x400 [<ffffffff810a5acf>] kthread+0xcf/0xe0 [<ffffffff810b41b3>] ? finish_task_switch+0x53/0x170 [<ffffffff810a5a00>] ? kthread_create_on_node+0x140/0x140 [<ffffffff81645dd8>] ret_from_fork+0x58/0x90 [<ffffffff810a5a00>] ? kthread_create_on_node+0x140/0x140 RIP [<ffffffffa050828a>] rbd_dev_refresh+0xfa/0x180 [rbd] If an error occurs during rbd map, we have to error out, potentially tearing down a watch. Just like on rbd unmap, notifies have to be flushed, otherwise rbd_watch_cb() may end up trying to read in the image header after rbd_dev_image_release() has run: Assertion failure in rbd_dev_header_info() at line 4722: rbd_assert(rbd_image_format_valid(rbd_dev->image_format)); Call Trace: [<ffffffff81cccee0>] ? rbd_parent_request_create+0x150/0x150 [<ffffffff81cd4e59>] rbd_dev_refresh+0x59/0x390 [<ffffffff81cd5229>] rbd_watch_cb+0x69/0x290 [<ffffffff81fde9bf>] do_event_work+0x10f/0x1c0 [<ffffffff81107799>] process_one_work+0x689/0x1a80 [<ffffffff811076f7>] ? process_one_work+0x5e7/0x1a80 [<ffffffff81132065>] ? finish_task_switch+0x225/0x640 [<ffffffff81107110>] ? pwq_dec_nr_in_flight+0x2b0/0x2b0 [<ffffffff81108c69>] worker_thread+0xd9/0x1320 [<ffffffff81108b90>] ? process_one_work+0x1a80/0x1a80 [<ffffffff8111b02d>] kthread+0x21d/0x2e0 [<ffffffff8111ae10>] ? kthread_stop+0x550/0x550 [<ffffffff82022802>] ret_from_fork+0x22/0x40 [<ffffffff8111ae10>] ? kthread_stop+0x550/0x550 RIP [<ffffffff81ccd8f9>] rbd_dev_header_info+0xa19/0x1e30 To fix this, a) check if RBD_DEV_FLAG_EXISTS is set before calling revalidate_disk(), b) move ceph_osdc_flush_notifies() call into rbd_dev_header_unwatch_sync() to cover rbd map error paths and c) turn header read-in into a critical section. The latter also happens to take care of rbd map foo@bar vs rbd snap rm foo@bar race. Fixes: http://tracker.ceph.com/issues/15490 Signed-off-by: Ilya Dryomov <idryomov@gmail.com> Reviewed-by: Josh Durgin <jdurgin@redhat.com>
2016-04-25skd: remove broken discard supportJeff Moyer1-58/+1
Simply creating a file system on an skd device, followed by mount and fstrim will result in errors in the logs and then a BUG(). Let's remove discard support from that driver. As far as I can tell, it hasn't worked right since it was merged. This patch also has a side-effect of cleaning up an unintentional shadowed declaration inside of skd_end_request. I tested to ensure that I can still do I/O to the device using xfstests ./check -g quick. I didn't do anything more extensive than that, though. Signed-off-by: Jeff Moyer <jmoyer@redhat.com> Signed-off-by: Jens Axboe <axboe@fb.com>
2016-04-15Merge branch 'for-linus' of git://git.kernel.dk/linux-blockLinus Torvalds1-0/+6
Pull block fixes from Jens Axboe: "A few fixes for the current series. This contains: - Two fixes for NVMe: One fixes a reset race that can be triggered by repeated insert/removal of the module. The other fixes an issue on some platforms, where we get probe timeouts since legacy interrupts isn't working. This used not to be a problem since we had the worker thread poll for completions, but since that was killed off, it means those poor souls can't successfully probe their NVMe device. Use a proper IRQ check and probe (msi-x -> msi ->legacy), like most other drivers to work around this. Both from Keith. - A loop corruption issue with offset in iters, from Ming Lei. - A fix for not having the partition stat per cpu ref count initialized before sending out the KOBJ_ADD, which could cause user space to access the counter prior to initialization. Also from Ming Lei. - A fix for using the wrong congestion state, from Kaixu Xia" * 'for-linus' of git://git.kernel.dk/linux-block: block: loop: fix filesystem corruption in case of aio/dio NVMe: Always use MSI/MSI-x interrupts NVMe: Fix reset/remove race writeback: fix the wrong congested state variable definition block: partition: initialize percpuref before sending out KOBJ_ADD
2016-04-15block: loop: fix filesystem corruption in case of aio/dioMing Lei1-0/+6
Starting from commit e36f620428(block: split bios to max possible length), block core starts to split bio in the middle of bvec. Unfortunately loop dio/aio doesn't consider this situation, and always treat 'iter.iov_offset' as zero. Then filesystem corruption is observed. This patch figures out the offset of the base bvevc via 'bio->bi_iter.bi_bvec_done' and fixes the issue by passing the offset to iov iterator. Fixes: e36f6204288088f (block: split bios to max possible length) Cc: Keith Busch <keith.busch@intel.com> Cc: Al Viro <viro@zeniv.linux.org.uk> Cc: stable@vger.kernel.org (4.5) Signed-off-by: Ming Lei <ming.lei@canonical.com> Signed-off-by: Jens Axboe <axboe@fb.com>
2016-04-13block: kill off q->flush_flagsJens Axboe1-1/+1
Now that we converted everything to the newer block write cache interface, kill off the queue flush_flags and queueable flush entries. Signed-off-by: Jens Axboe <axboe@fb.com>
2016-04-12xen-blkfront: switch to using blk_queue_write_cache()Jens Axboe1-1/+2
Signed-off-by: Jens Axboe <axboe@fb.com> Reviewed-by: Christoph Hellwig <hch@lst.de>
2016-04-12virtio_blk: switch to using blk_queue_write_cache()Jens Axboe1-5/+1
Signed-off-by: Jens Axboe <axboe@fb.com> Reviewed-by: Christoph Hellwig <hch@lst.de>
2016-04-12ps3disk: switch to using blk_queue_write_cache()Jens Axboe1-1/+1
Signed-off-by: Jens Axboe <axboe@fb.com> Reviewed-by: Christoph Hellwig <hch@lst.de>
2016-04-12skd_main: switch to using blk_queue_write_cache()Jens Axboe1-1/+1
Signed-off-by: Jens Axboe <axboe@fb.com> Reviewed-by: Christoph Hellwig <hch@lst.de>
2016-04-12osdblk: switch to using blk_queue_write_cache()Jens Axboe1-1/+1
Signed-off-by: Jens Axboe <axboe@fb.com> Reviewed-by: Christoph Hellwig <hch@lst.de>
2016-04-12nbd: switch to using blk_queue_write_cache()Jens Axboe1-2/+2
Signed-off-by: Jens Axboe <axboe@fb.com> Reviewed-by: Christoph Hellwig <hch@lst.de>
2016-04-12mtip32xx: remove call to blk_queue_flush()Jens Axboe1-6/+0
The driver calls it with 0 for flags, since it doesn't have a writeback cache. Just remove the call, as it's a no-op right now. Signed-off-by: Jens Axboe <axboe@fb.com> Reviewed-by: Christoph Hellwig <hch@lst.de>
2016-04-12loop: switch to using blk_queue_write_cache()Jens Axboe1-1/+1
Signed-off-by: Jens Axboe <axboe@fb.com> Reviewed-by: Christoph Hellwig <hch@lst.de>
2016-04-12drbd: switch to using blk_queue_write_cache()Jens Axboe1-1/+1
Signed-off-by: Jens Axboe <axboe@fb.com> Reviewed-by: Christoph Hellwig <hch@lst.de>
2016-04-12mtip32xx: Convert to use blk_mq_tagset_busy_iterKeith Busch1-3/+3
Only a single tags array anyway. Signed-off-by: Keith Busch <keith.busch@intel.com> Reviewed-by: Johannes Thumshirn <jthumshirn@suse.de> Signed-off-by: Sagi Grimberg <sagig@mellanox.com> Reviewed-by: Christoph Hellwig <hch@lst.de> Signed-off-by: Jens Axboe <axboe@fb.com>
2016-04-12block: add offset in blk_add_request_payload()Ming Lin1-1/+1
We could kmalloc() the payload, so need the offset in page. Signed-off-by: Ming Lin <ming.l@ssi.samsung.com> Reviewed-by: Christoph Hellwig <hch@lst.de> Signed-off-by: Jens Axboe <axboe@fb.com>
2016-04-07Merge branch 'for-linus' of ↵Linus Torvalds1-3/+3
git://git.kernel.org/pub/scm/linux/kernel/git/sage/ceph-client Pull Ceph fix from Sage Weil: "This just fixes a few remaining memory allocations in RBD to use GFP_NOIO instead of GFP_ATOMIC" * 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/sage/ceph-client: rbd: use GFP_NOIO consistently for request allocations
2016-04-05rbd: use GFP_NOIO consistently for request allocationsDavid Disseldorp1-3/+3
As of 5a60e87603c4c533492c515b7f62578189b03c9c, RBD object request allocations are made via rbd_obj_request_create() with GFP_NOIO. However, subsequent OSD request allocations in rbd_osd_req_create*() use GFP_ATOMIC. With heavy page cache usage (e.g. OSDs running on same host as krbd client), rbd_osd_req_create() order-1 GFP_ATOMIC allocations have been observed to fail, where direct reclaim would have allowed GFP_NOIO allocations to succeed. Cc: stable@vger.kernel.org # 3.18+ Suggested-by: Vlastimil Babka <vbabka@suse.cz> Suggested-by: Neil Brown <neilb@suse.com> Signed-off-by: David Disseldorp <ddiss@suse.de> Signed-off-by: Ilya Dryomov <idryomov@gmail.com>
2016-04-04mm, fs: remove remaining PAGE_CACHE_* and page_cache_{get,release} usageKirill A. Shutemov1-2/+2
Mostly direct substitution with occasional adjustment or removing outdated comments. Signed-off-by: Kirill A. Shutemov <kirill.shutemov@linux.intel.com> Acked-by: Michal Hocko <mhocko@suse.com> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2016-04-04mm, fs: get rid of PAGE_CACHE_* and page_cache_{get,release} macrosKirill A. Shutemov3-3/+3
PAGE_CACHE_{SIZE,SHIFT,MASK,ALIGN} macros were introduced *long* time ago with promise that one day it will be possible to implement page cache with bigger chunks than PAGE_SIZE. This promise never materialized. And unlikely will. We have many places where PAGE_CACHE_SIZE assumed to be equal to PAGE_SIZE. And it's constant source of confusion on whether PAGE_CACHE_* or PAGE_* constant should be used in a particular case, especially on the border between fs and mm. Global switching to PAGE_CACHE_SIZE != PAGE_SIZE would cause to much breakage to be doable. Let's stop pretending that pages in page cache are special. They are not. The changes are pretty straight-forward: - <foo> << (PAGE_CACHE_SHIFT - PAGE_SHIFT) -> <foo>; - <foo> >> (PAGE_CACHE_SHIFT - PAGE_SHIFT) -> <foo>; - PAGE_CACHE_{SIZE,SHIFT,MASK,ALIGN} -> PAGE_{SIZE,SHIFT,MASK,ALIGN}; - page_cache_get() -> get_page(); - page_cache_release() -> put_page(); This patch contains automated changes generated with coccinelle using script below. For some reason, coccinelle doesn't patch header files. I've called spatch for them manually. The only adjustment after coccinelle is revert of changes to PAGE_CAHCE_ALIGN definition: we are going to drop it later. There are few places in the code where coccinelle didn't reach. I'll fix them manually in a separate patch. Comments and documentation also will be addressed with the separate patch. virtual patch @@ expression E; @@ - E << (PAGE_CACHE_SHIFT - PAGE_SHIFT) + E @@ expression E; @@ - E >> (PAGE_CACHE_SHIFT - PAGE_SHIFT) + E @@ @@ - PAGE_CACHE_SHIFT + PAGE_SHIFT @@ @@ - PAGE_CACHE_SIZE + PAGE_SIZE @@ @@ - PAGE_CACHE_MASK + PAGE_MASK @@ expression E; @@ - PAGE_CACHE_ALIGN(E) + PAGE_ALIGN(E) @@ expression E; @@ - page_cache_get(E) + get_page(E) @@ expression E; @@ - page_cache_release(E) + put_page(E) Signed-off-by: Kirill A. Shutemov <kirill.shutemov@linux.intel.com> Acked-by: Michal Hocko <mhocko@suse.com> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2016-03-26Merge branch 'for-linus' of ↵Linus Torvalds1-11/+3
git://git.kernel.org/pub/scm/linux/kernel/git/sage/ceph-client Pull Ceph updates from Sage Weil: "There is quite a bit here, including some overdue refactoring and cleanup on the mon_client and osd_client code from Ilya, scattered writeback support for CephFS and a pile of bug fixes from Zheng, and a few random cleanups and fixes from others" [ I already decided not to pull this because of it having been rebased recently, but ended up changing my mind after all. Next time I'll really hold people to it. Oh well. - Linus ] * 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/sage/ceph-client: (34 commits) libceph: use KMEM_CACHE macro ceph: use kmem_cache_zalloc rbd: use KMEM_CACHE macro ceph: use lookup request to revalidate dentry ceph: kill ceph_get_dentry_parent_inode() ceph: fix security xattr deadlock ceph: don't request vxattrs from MDS ceph: fix mounting same fs multiple times ceph: remove unnecessary NULL check ceph: avoid updating directory inode's i_size accidentally ceph: fix race during filling readdir cache libceph: use sizeof_footer() more ceph: kill ceph_empty_snapc ceph: fix a wrong comparison ceph: replace CURRENT_TIME by current_fs_time() ceph: scattered page writeback libceph: add helper that duplicates last extent operation libceph: enable large, variable-sized OSD requests libceph: osdc->req_mempool should be backed by a slab pool libceph: make r_request msg_size calculation clearer ...
2016-03-25rbd: use KMEM_CACHE macroGeliang Tang1-8/+2
Use KMEM_CACHE() instead of kmem_cache_create() to simplify the code. Signed-off-by: Geliang Tang <geliangtang@163.com> Signed-off-by: Ilya Dryomov <idryomov@gmail.com>
2016-03-25libceph: enable large, variable-sized OSD requestsIlya Dryomov1-2/+0
Turn r_ops into a flexible array member to enable large, consisting of up to 16 ops, OSD requests. The use case is scattered writeback in cephfs and, as far as the kernel client is concerned, 16 is just a made up number. r_ops had size 3 for copyup+hint+write, but copyup is really a special case - it can only happen once. ceph_osd_request_cache is therefore stuffed with num_ops=2 requests, anything bigger than that is allocated with kmalloc(). req_mempool is backed by ceph_osd_request_cache, which means either num_ops=1 or num_ops=2 for use_mempool=true - all existing users (ceph_writepages_start(), ceph_osdc_writepages()) are fine with that. Signed-off-by: Ilya Dryomov <idryomov@gmail.com>
2016-03-25libceph: move r_reply_op_{len,result} into struct ceph_osd_req_opYan, Zheng1-1/+1
This avoids defining large array of r_reply_op_{len,result} in in struct ceph_osd_request. Signed-off-by: Yan, Zheng <zyan@redhat.com> Signed-off-by: Ilya Dryomov <idryomov@gmail.com>
2016-03-24Merge branch 'for-linus' of git://git.kernel.dk/linux-blockLinus Torvalds2-3/+4
Pull block fixes from Jens Axboe: "Final round of fixes for this merge window - some of this has come up after the initial pull request, and some of it was put in a post-merge branch before the merge window. This contains: - Fix for a bad check for an error on dma mapping in the mtip32xx driver, from Alexey Khoroshilov. - A set of fixes for lightnvm, from Javier, Matias, and Wenwei. - An NVMe completion record corruption fix from Marta, ensuring that we read things in the right order. - Two writeback fixes from Tejun, marked for stable@ as well. - A blk-mq sw queue iterator fix from Thomas, fixing an oops for sparse CPU maps. They hit this in the hot plug/unplug rework" * 'for-linus' of git://git.kernel.dk/linux-block: nvme: avoid cqe corruption when update at the same time as read writeback, cgroup: fix use of the wrong bdi_writeback which mismatches the inode writeback, cgroup: fix premature wb_put() in locked_inode_to_wb_and_lock_list() blk-mq: Use proper cpumask iterator mtip32xx: fix checks for dma mapping errors lightnvm: do not load L2P table if not supported lightnvm: do not reserve lun on l2p loading nvme: lightnvm: return ppa completion status lightnvm: add a bitmap of luns lightnvm: specify target's logical address area null_blk: add lightnvm null_blk device to the nullb_list
2016-03-20Merge tag 'for_linus' of git://git.kernel.org/pub/scm/linux/kernel/git/mst/vhostLinus Torvalds1-3/+8
Pull virtio/vhost updates from Michael Tsirkin: "New features, performance improvements, cleanups: - basic polling support for vhost - rework virtio to optionally use DMA API, fixing it on Xen - balloon stats gained a new entry - using the new napi_alloc_skb speeds up virtio net - virtio blk stats can now be read while another VCPU is busy inflating or deflating the balloon plus misc cleanups in various places" * tag 'for_linus' of git://git.kernel.org/pub/scm/linux/kernel/git/mst/vhost: virtio_net: replace netdev_alloc_skb_ip_align() with napi_alloc_skb() vhost_net: basic polling support vhost: introduce vhost_vq_avail_empty() vhost: introduce vhost_has_work() virtio_balloon: Allow to resize and update the balloon stats in parallel virtio_balloon: Use a workqueue instead of "vballoon" kthread virtio/s390: size of SET_IND payload virtio/s390: use dev_to_virtio vhost: rename vhost_init_used() vhost: rename cross-endian helpers virtio_blk: VIRTIO_BLK_F_WCE->VIRTIO_BLK_F_FLUSH vring: Use the DMA API on Xen virtio_pci: Use the DMA API if enabled virtio_mmio: Use the DMA API if enabled virtio: Add improved queue allocation API virtio_ring: Support DMA APIs vring: Introduce vring_use_dma_api() s390/dma: Allow per device dma ops alpha/dma: use common noop dma ops dma: Provide simple noop dma ops
2016-03-18Merge branch 'akpm' (patches from Andrew)Linus Torvalds1-2/+2
Merge second patch-bomb from Andrew Morton: - a couple of hotfixes - the rest of MM - a new timer slack control in procfs - a couple of procfs fixes - a few misc things - some printk tweaks - lib/ updates, notably to radix-tree. - add my and Nick Piggin's old userspace radix-tree test harness to tools/testing/radix-tree/. Matthew said it was a godsend during the radix-tree work he did. - a few code-size improvements, switching to __always_inline where gcc screwed up. - partially implement character sets in sscanf * emailed patches from Andrew Morton <akpm@linux-foundation.org>: (118 commits) sscanf: implement basic character sets lib/bug.c: use common WARN helper param: convert some "on"/"off" users to strtobool lib: add "on"/"off" support to kstrtobool lib: update single-char callers of strtobool() lib: move strtobool() to kstrtobool() include/linux/unaligned: force inlining of byteswap operations include/uapi/linux/byteorder, swab: force inlining of some byteswap operations include/asm-generic/atomic-long.h: force inlining of some atomic_long operations usb: common: convert to use match_string() helper ide: hpt366: convert to use match_string() helper ata: hpt366: convert to use match_string() helper power: ab8500: convert to use match_string() helper power: charger_manager: convert to use match_string() helper drm/edid: convert to use match_string() helper pinctrl: convert to use match_string() helper device property: convert to use match_string() helper lib/string: introduce match_string() helper radix-tree tests: add test for radix_tree_iter_next radix-tree tests: add regression3 test ...
2016-03-18mtip32xx: fix checks for dma mapping errorsAlexey Khoroshilov1-2/+2
exec_drive_taskfile() checks for dma mapping errors by comparison returned address with zero, while pci_dma_mapping_error() should be used. Found by Linux Driver Verification project (linuxtesting.org). Signed-off-by: Alexey Khoroshilov <khoroshilov@ispras.ru> Signed-off-by: Jens Axboe <axboe@fb.com>
2016-03-18null_blk: add lightnvm null_blk device to the nullb_listWenwei Tao1-1/+2
After register null_blk devices into lightnvm, we forget to add these devices to the the nullb_list, makes them invisible to the null_blk driver. Signed-off-by: Wenwei Tao <ww.tao0320@gmail.com> Fixes: a514379b0c77 ("null_blk: oops when initializing without lightnvm") Signed-off-by: Jens Axboe <axboe@fb.com>
2016-03-18Merge branch 'for-4.6/drivers' of git://git.kernel.dk/linux-blockLinus Torvalds12-2628/+404
Pull block driver updates from Jens Axboe: "This is the block driver pull request for this merge window. It sits on top of for-4.6/core, that was just sent out. This contains: - A set of fixes for lightnvm. One from Alan, fixing an overflow, and the rest from the usual suspects, Javier and Matias. - A set of fixes for nbd from Markus and Dan, and a fixup from Arnd for correct usage of the signed 64-bit divider. - A set of bug fixes for the Micron mtip32xx, from Asai. - A fix for the brd discard handling from Bart. - Update the maintainers entry for cciss, since that hardware has transferred ownership. - Three bug fixes for bcache from Eric Wheeler. - Set of fixes for xen-blk{back,front} from Jan and Konrad. - Removal of the cpqarray driver. It has been disabled in Kconfig since 2013, and we were initially scheduled to remove it in 3.15. - Various updates and fixes for NVMe, with the most important being: - Removal of the per-device NVMe thread, replacing that with a watchdog timer instead. From Christoph. - Exposing the namespace WWID through sysfs, from Keith. - Set of cleanups from Ming Lin. - Logging the controller device name instead of the underlying PCI device name, from Sagi. - And a bunch of fixes and optimizations from the usual suspects in this area" * 'for-4.6/drivers' of git://git.kernel.dk/linux-block: (49 commits) NVMe: Expose ns wwid through single sysfs entry drivers:block: cpqarray clean up brd: Fix discard request processing cpqarray: remove it from the kernel cciss: update MAINTAINERS NVMe: Remove unused sq_head read in completion path bcache: fix cache_set_flush() NULL pointer dereference on OOM bcache: cleaned up error handling around register_cache() bcache: fix race of writeback thread starting before complete initialization NVMe: Create discard zero quirk white list nbd: use correct div_s64 helper mtip32xx: remove unneeded variable in mtip_cmd_timeout() lightnvm: generalize rrpc ppa calculations lightnvm: remove struct nvm_dev->total_blocks lightnvm: rename ->nr_pages to ->nr_sects lightnvm: update closed list outside of intr context xen/blback: Fit the important information of the thread in 17 characters lightnvm: fold get bb tbl when using dual/quad plane mode lightnvm: fix up nonsensical configure overrun checking xen-blkback: advertise indirect segment support earlier ...
2016-03-17mm: introduce page reference manipulation functionsJoonsoo Kim1-2/+2
The success of CMA allocation largely depends on the success of migration and key factor of it is page reference count. Until now, page reference is manipulated by direct calling atomic functions so we cannot follow up who and where manipulate it. Then, it is hard to find actual reason of CMA allocation failure. CMA allocation should be guaranteed to succeed so finding offending place is really important. In this patch, call sites where page reference is manipulated are converted to introduced wrapper function. This is preparation step to add tracepoint to each page reference manipulation function. With this facility, we can easily find reason of CMA allocation failure. There is no functional change in this patch. In addition, this patch also converts reference read sites. It will help a second step that renames page._count to something else and prevents later attempt to direct access to it (Suggested by Andrew). Signed-off-by: Joonsoo Kim <iamjoonsoo.kim@lge.com> Acked-by: Michal Nazarewicz <mina86@mina86.com> Acked-by: Vlastimil Babka <vbabka@suse.cz> Cc: Minchan Kim <minchan@kernel.org> Cc: Mel Gorman <mgorman@techsingularity.net> Cc: "Kirill A. Shutemov" <kirill.shutemov@linux.intel.com> Cc: Sergey Senozhatsky <sergey.senozhatsky.work@gmail.com> Cc: Steven Rostedt <rostedt@goodmis.org> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2016-03-17Merge branch 'linus' of ↵Linus Torvalds6-110/+128
git://git.kernel.org/pub/scm/linux/kernel/git/herbert/crypto-2.6 Pull crypto update from Herbert Xu: "Here is the crypto update for 4.6: API: - Convert remaining crypto_hash users to shash or ahash, also convert blkcipher/ablkcipher users to skcipher. - Remove crypto_hash interface. - Remove crypto_pcomp interface. - Add crypto engine for async cipher drivers. - Add akcipher documentation. - Add skcipher documentation. Algorithms: - Rename crypto/crc32 to avoid name clash with lib/crc32. - Fix bug in keywrap where we zero the wrong pointer. Drivers: - Support T5/M5, T7/M7 SPARC CPUs in n2 hwrng driver. - Add PIC32 hwrng driver. - Support BCM6368 in bcm63xx hwrng driver. - Pack structs for 32-bit compat users in qat. - Use crypto engine in omap-aes. - Add support for sama5d2x SoCs in atmel-sha. - Make atmel-sha available again. - Make sahara hashing available again. - Make ccp hashing available again. - Make sha1-mb available again. - Add support for multiple devices in ccp. - Improve DMA performance in caam. - Add hashing support to rockchip" * 'linus' of git://git.kernel.org/pub/scm/linux/kernel/git/herbert/crypto-2.6: (116 commits) crypto: qat - remove redundant arbiter configuration crypto: ux500 - fix checks of error code returned by devm_ioremap_resource() crypto: atmel - fix checks of error code returned by devm_ioremap_resource() crypto: qat - Change the definition of icp_qat_uof_regtype hwrng: exynos - use __maybe_unused to hide pm functions crypto: ccp - Add abstraction for device-specific calls crypto: ccp - CCP versioning support crypto: ccp - Support for multiple CCPs crypto: ccp - Remove check for x86 family and model crypto: ccp - memset request context to zero during import lib/mpi: use "static inline" instead of "extern inline" lib/mpi: avoid assembler warning hwrng: bcm63xx - fix non device tree compatibility crypto: testmgr - allow rfc3686 aes-ctr variants in fips mode. crypto: qat - The AE id should be less than the maximal AE number lib/mpi: Endianness fix crypto: rockchip - add hash support for crypto engine in rk3288 crypto: xts - fix compile errors crypto: doc - add skcipher API documentation crypto: doc - update AEAD AD handling ...
2016-03-15paride: make 'verbose' parameter an 'int' againArnd Bergmann2-4/+4
gcc-6.0 found an ancient bug in the paride driver, which had a "module_param(verbose, bool, 0);" since before 2.6.12, but actually uses it to accept '0', '1' or '2' as arguments: drivers/block/paride/pd.c: In function 'pd_init_dev_parms': drivers/block/paride/pd.c:298:29: warning: comparison of constant '1' with boolean expression is always false [-Wbool-compare] #define DBMSG(msg) ((verbose>1)?(msg):NULL) In 2012, Rusty did a cleanup patch that also changed the type of the variable to 'bool', which introduced what is now a gcc warning. This changes the type back to 'int' and adapts the module_param() line instead, so it should work as documented in case anyone ever cares about running the ancient driver with debugging. Fixes: 90ab5ee94171 ("module_param: make bool parameters really bool (drivers & misc)") Signed-off-by: Arnd Bergmann <arnd@arndb.de> Rusty Russell <rusty@rustcorp.com.au> Cc: Tim Waugh <tim@cyberelk.net> Cc: Sudip Mukherjee <sudipm.mukherjee@gmail.com> Cc: Jens Axboe <axboe@fb.com> Cc: Greg Kroah-Hartman <gregkh@linuxfoundation.org> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2016-03-15drivers:block: cpqarray clean upValentin Rothberg1-1/+0
Commit d436641439e0 ("cpqarray: remove it from the kernel") removes the Kconfig option BLK_CPQ_DA and cpqarray. Remove the dead build rule in the Makefile. Signed-off-by: Valentin Rothberg <valentinrothberg@gmail.com> Signed-off-by: Jens Axboe <axboe@fb.com>
2016-03-15brd: Fix discard request processingBart Van Assche1-1/+1
Avoid that discard requests with size => PAGE_SIZE fail with -EIO. Refuse discard requests if the discard size is not a multiple of the page size. Fixes: 2dbe54957636 ("brd: Refuse improperly aligned discard requests") Signed-off-by: Bart Van Assche <bart.vanassche@sandisk.com> Reviewed-by: Jan Kara <jack@suse.com> Cc: Christoph Hellwig <hch@lst.de> Cc: Robert Elliot <elliott@hp.com> Cc: stable <stable@vger.kernel.org> # v4.4+ Signed-off-by: Jens Axboe <axboe@fb.com>
2016-03-14cpqarray: remove it from the kernelJens Axboe5-2392/+0
We disabled the ability to enable this driver back in October of 2013, we should be able to safely remove it at this point. The initial goal was to remove it in 3.15, so now is the time. Signed-off-by: Jens Axboe <axboe@fb.com>
2016-03-04nbd: use correct div_s64 helperArnd Bergmann1-2/+1
The do_div() macro now checks its arguments for the correct type, and refuses anything other than u64, so we get a warning about nbd_ioctl passing in an loff_t: drivers/block/nbd.c: In function '__nbd_ioctl': drivers/block/nbd.c:757:77: error: comparison of distinct pointer types lacks a cast [-Werror] This changes the nbd code to use div_s64() instead, which takes a signed argument. Signed-off-by: Arnd Bergmann <arnd@arndb.de> Fixes: 37091fdd831f ("nbd: Create size change events for userspace") Signed-off-by: Jens Axboe <axboe@fb.com>
2016-03-04mtip32xx: remove unneeded variable in mtip_cmd_timeout()Jens Axboe1-2/+1
We always return BLK_EH_RESET_TIMER, so no point in storing that in an integer. Signed-off-by: Jens Axboe <axboe@fb.com>
2016-03-03xen/blback: Fit the important information of the thread in 17 charactersKonrad Rzeszutek Wilk1-4/+3
The processes names are truncated to 17, while we had the length of the process as name 20 - which meant that while we filled it out with various details - the last 3 characters (which had the queue number) never surfaced to the user-space. To simplify this and be able to fit the device name, domain id, and the queue number we remove the 'blkback' from the name. Prior to this patch the device name is "blkback.<domid>.<name>" for example: blkback.8.xvda, blkback.11.hda. With the multiqueue block backend we add "-%d" for the queue. But sadly this is already way past the limit so it gets stripped. Possible solution had been identified by Ian: http://lists.xenproject.org/archives/html/xen-devel/2015-05/msg03516.html " If you are pressed for space then the "xvd" is probably a bit redundant in a string which starts blkbk. The guest may not even call the device xvdN (iirc BSD has another prefix) any how, so having blkback say so seems of limited use anyway. Since this seems to not include a partition number how does this work in the split partition scheme? (i.e. one where the guest is given xvda1 and xvda2 rather than xvda with a partition table) [It will be 'blkback.8.xvda1', and 'blkback.11.xvda2'] Perhaps something derived from one of the schemes in http://xenbits.xen.org/docs/unstable/misc/vbd-interface.txt might be a better fit? After a bit of discussion (see http://lists.xenproject.org/archives/html/xen-devel/2015-12/msg01588.html) we settled on dropping the "blback" part. This will make it possible to have the <domid>.<name>-<queue>: [1.xvda-0] [1.xvda-1] And we enough space to make it go up to: [32100.xvdfg9-5] Acked-by: Roger Pau Monné <roger.pau@citrix.com> Reported-by: Jan Beulich <jbeulich@suse.com> Signed-off-by: Konrad Rzeszutek Wilk <konrad.wilk@oracle.com>
2016-03-03xen-blkback: advertise indirect segment support earlierJan Beulich1-5/+8
There's no reason to defer this until the connect phase, and in fact there are frontend implementations expecting this to be available earlier. Move it into the probe function. Acked-by: Roger Pau Monné <roger.pau@citrix.com> Signed-off-by: Jan Beulich <jbeulich@suse.com> Cc: Bob Liu <bob.liu@oracle.com> Signed-off-by: Konrad Rzeszutek Wilk <konrad.wilk@oracle.com>