Documentation/admin-guide/device-mapper/dm-integrity.rst


1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
17
18
19
20
21
22
23
24
25
26
27
28
29
30
31
32
33
34
35
36
37
38
39
40
41
42
43
44
45
46
47
48
49
50
51
52
53
54
55
56
57
58
59
60
61
62
63
64
65
66
67
68
69
70
71
72
73
74
75
76
77
78
79
80
81
82
83
84
85
86
87
88
89
90
91
92
93
94
95
96
97
98
99
100
101
102
103
104
105
106
107
108
109
110
111
112
113
114
115
116
117
118
119
120
121
122
123
124
125
126
127
128
129
130
131
132
133
134
135
136
137
138
139
140
141
142
143
144
145
146
147
148
149
150
151
152
153
154
155
156
157
158
159
160
161
162
163
164
165
166
167
168
169
170
171
172
173
174
175
176
177
178
179
180
181
182
183
184
185
186
187
188
189
190
191
192
193
194
195
196
197
198
199
200
201
202
203
204
205
206
207
208
209
210
211
212
213
214
215
216
217
218
219
220
221
222
223
224
225
226
227
228
229
230
231
232
233
234
235
236
237
238
239
240
241
242
243
244
245
246
247
248
249
250
251
252
253
254
255
256
257
258
259

============
dm-integrity
============

The dm-integrity target emulates a block device that has additional
per-sector tags that can be used for storing integrity information.

A general problem with storing integrity tags with every sector is that
writing the sector and the integrity tag must be atomic - i.e. in case of
crash, either both sector and integrity tag or none of them is written.

To guarantee write atomicity, the dm-integrity target uses journal, it
writes sector data and integrity tags into a journal, commits the journal
and then copies the data and integrity tags to their respective location.

The dm-integrity target can be used with the dm-crypt target - in this
situation the dm-crypt target creates the integrity data and passes them
to the dm-integrity target via bio_integrity_payload attached to the bio.
In this mode, the dm-crypt and dm-integrity targets provide authenticated
disk encryption - if the attacker modifies the encrypted device, an I/O
error is returned instead of random data.

The dm-integrity target can also be used as a standalone target, in this
mode it calculates and verifies the integrity tag internally. In this
mode, the dm-integrity target can be used to detect silent data
corruption on the disk or in the I/O path.

There's an alternate mode of operation where dm-integrity uses bitmap
instead of a journal. If a bit in the bitmap is 1, the corresponding
region's data and integrity tags are not synchronized - if the machine
crashes, the unsynchronized regions will be recalculated. The bitmap mode
is faster than the journal mode, because we don't have to write the data
twice, but it is also less reliable, because if data corruption happens
when the machine crashes, it may not be detected.

When loading the target for the first time, the kernel driver will format
the device. But it will only format the device if the superblock contains
zeroes. If the superblock is neither valid nor zeroed, the dm-integrity
target can't be loaded.

To use the target for the first time:

1. overwrite the superblock with zeroes
2. load the dm-integrity target with one-sector size, the kernel driver
   will format the device
3. unload the dm-integrity target
4. read the "provided_data_sectors" value from the superblock
5. load the dm-integrity target with the the target size
   "provided_data_sectors"
6. if you want to use dm-integrity with dm-crypt, load the dm-crypt target
   with the size "provided_data_sectors"


Target arguments:

1. the underlying block device

2. the number of reserved sector at the beginning of the device - the
   dm-integrity won't read of write these sectors

3. the size of the integrity tag (if "-" is used, the size is taken from
   the internal-hash algorithm)

4. mode:

	D - direct writes (without journal)
		in this mode, journaling is
		not used and data sectors and integrity tags are written
		separately. In case of crash, it is possible that the data
		and integrity tag doesn't match.
	J - journaled writes
		data and integrity tags are written to the
		journal and atomicity is guaranteed. In case of crash,
		either both data and tag or none of them are written. The
		journaled mode degrades write throughput twice because the
		data have to be written twice.
	B - bitmap mode - data and metadata are written without any
		synchronization, the driver maintains a bitmap of dirty
		regions where data and metadata don't match. This mode can
		only be used with internal hash.
	R - recovery mode - in this mode, journal is not replayed,
		checksums are not checked and writes to the device are not
		allowed. This mode is useful for data recovery if the
		device cannot be activated in any of the other standard
		modes.

5. the number of additional arguments

Additional arguments:

journal_sectors:number
	The size of journal, this argument is used only if formatting the
	device. If the device is already formatted, the value from the
	superblock is used.

interleave_sectors:number
	The number of interleaved sectors. This values is rounded down to
	a power of two. If the device is already formatted, the value from
	the superblock is used.

meta_device:device
	Don't interleave the data and metadata on on device. Use a
	separate device for metadata.

buffer_sectors:number
	The number of sectors in one buffer. The value is rounded down to
	a power of two.

	The tag area is accessed using buffers, the buffer size is
	configurable. The large buffer size means that the I/O size will
	be larger, but there could be less I/Os issued.

journal_watermark:number
	The journal watermark in percents. When the size of the journal
	exceeds this watermark, the thread that flushes the journal will
	be started.

commit_time:number
	Commit time in milliseconds. When this time passes, the journal is
	written. The journal is also written immediatelly if the FLUSH
	request is received.

internal_hash:algorithm(:key)	(the key is optional)
	Use internal hash or crc.
	When this argument is used, the dm-integrity target won't accept
	integrity tags from the upper target, but it will automatically
	generate and verify the integrity tags.

	You can use a crc algorithm (such as crc32), then integrity target
	will protect the data against accidental corruption.
	You can also use a hmac algorithm (for example
	"hmac(sha256):0123456789abcdef"), in this mode it will provide
	cryptographic authentication of the data without encryption.

	When this argument is not used, the integrity tags are accepted
	from an upper layer target, such as dm-crypt. The upper layer
	target should check the validity of the integrity tags.

recalculate
	Recalculate the integrity tags automatically. It is only valid
	when using internal hash.

journal_crypt:algorithm(:key)	(the key is optional)
	Encrypt the journal using given algorithm to make sure that the
	attacker can't read the journal. You can use a block cipher here
	(such as "cbc(aes)") or a stream cipher (for example "chacha20",
	"salsa20", "ctr(aes)" or "ecb(arc4)").

	The journal contains history of last writes to the block device,
	an attacker reading the journal could see the last sector nubmers
	that were written. From the sector numbers, the attacker can infer
	the size of files that were written. To protect against this
	situation, you can encrypt the journal.

journal_mac:algorithm(:key)	(the key is optional)
	Protect sector numbers in the journal from accidental or malicious
	modification. To protect against accidental modification, use a
	crc algorithm, to protect against malicious modification, use a
	hmac algorithm with a key.

	This option is not needed when using internal-hash because in this
	mode, the integrity of journal entries is checked when replaying
	the journal. Thus, modified sector number would be detected at
	this stage.

block_size:number
	The size of a data block in bytes.  The larger the block size the
	less overhead there is for per-block integrity metadata.
	Supported values are 512, 1024, 2048 and 4096 bytes.  If not
	specified the default block size is 512 bytes.

sectors_per_bit:number
	In the bitmap mode, this parameter specifies the number of
	512-byte sectors that corresponds to one bitmap bit.

bitmap_flush_interval:number
	The bitmap flush interval in milliseconds. The metadata buffers
	are synchronized when this interval expires.


The journal mode (D/J), buffer_sectors, journal_watermark, commit_time can
be changed when reloading the target (load an inactive table and swap the
tables with suspend and resume). The other arguments should not be changed
when reloading the target because the layout of disk data depend on them
and the reloaded target would be non-functional.


The layout of the formatted block device:

* reserved sectors
    (they are not used by this target, they can be used for
    storing LUKS metadata or for other purpose), the size of the reserved
    area is specified in the target arguments

* superblock (4kiB)
	* magic string - identifies that the device was formatted
	* version
	* log2(interleave sectors)
	* integrity tag size
	* the number of journal sections
	* provided data sectors - the number of sectors that this target
	  provides (i.e. the size of the device minus the size of all
	  metadata and padding). The user of this target should not send
	  bios that access data beyond the "provided data sectors" limit.
	* flags
	    SB_FLAG_HAVE_JOURNAL_MAC
		- a flag is set if journal_mac is used
	    SB_FLAG_RECALCULATING
		- recalculating is in progress
	    SB_FLAG_DIRTY_BITMAP
		- journal area contains the bitmap of dirty
		  blocks
	* log2(sectors per block)
	* a position where recalculating finished
* journal
	The journal is divided into sections, each section contains:

	* metadata area (4kiB), it contains journal entries

	  - every journal entry contains:

		* logical sector (specifies where the data and tag should
		  be written)
		* last 8 bytes of data
		* integrity tag (the size is specified in the superblock)

	  - every metadata sector ends with

		* mac (8-bytes), all the macs in 8 metadata sectors form a
		  64-byte value. It is used to store hmac of sector
		  numbers in the journal section, to protect against a
		  possibility that the attacker tampers with sector
		  numbers in the journal.
		* commit id

	* data area (the size is variable; it depends on how many journal
	  entries fit into the metadata area)

	    - every sector in the data area contains:

		* data (504 bytes of data, the last 8 bytes are stored in
		  the journal entry)
		* commit id

	To test if the whole journal section was written correctly, every
	512-byte sector of the journal ends with 8-byte commit id. If the
	commit id matches on all sectors in a journal section, then it is
	assumed that the section was written correctly. If the commit id
	doesn't match, the section was written partially and it should not
	be replayed.

* one or more runs of interleaved tags and data.
    Each run contains:

	* tag area - it contains integrity tags. There is one tag for each
	  sector in the data area
	* data area - it contains data sectors. The number of data sectors
	  in one run must be a power of two. log2 of this value is stored
	  in the superblock.