summaryrefslogtreecommitdiffstats
path: root/Documentation/filesystems
diff options
context:
space:
mode:
Diffstat (limited to 'Documentation/filesystems')
-rw-r--r--Documentation/filesystems/autofs.rst (renamed from Documentation/filesystems/autofs.txt)263
-rw-r--r--Documentation/filesystems/debugfs.txt50
-rw-r--r--Documentation/filesystems/erofs.txt27
-rw-r--r--Documentation/filesystems/f2fs.txt5
-rw-r--r--Documentation/filesystems/fscrypt.rst72
-rw-r--r--Documentation/filesystems/fsverity.rst12
-rw-r--r--Documentation/filesystems/index.rst1
-rw-r--r--Documentation/filesystems/locking.rst2
-rw-r--r--Documentation/filesystems/overlayfs.rst (renamed from Documentation/filesystems/overlayfs.txt)10
9 files changed, 253 insertions, 189 deletions
diff --git a/Documentation/filesystems/autofs.txt b/Documentation/filesystems/autofs.rst
index 3af38c7fd26d..681c6a492bc0 100644
--- a/Documentation/filesystems/autofs.txt
+++ b/Documentation/filesystems/autofs.rst
@@ -1,12 +1,9 @@
-<head>
-<style> p { max-width:50em} ol, ul {max-width: 40em}</style>
-</head>
-
+=====================
autofs - how it works
=====================
Purpose
--------
+=======
The goal of autofs is to provide on-demand mounting and race free
automatic unmounting of various other filesystems. This provides two
@@ -28,7 +25,7 @@ key advantages:
first accessed a name.
Context
--------
+=======
The "autofs" filesystem module is only one part of an autofs system.
There also needs to be a user-space program which looks up names
@@ -43,7 +40,7 @@ filesystem type. Several "autofs" filesystems can be mounted and they
can each be managed separately, or all managed by the same daemon.
Content
--------
+=======
An autofs filesystem can contain 3 sorts of objects: directories,
symbolic links and mount traps. Mount traps are directories with
@@ -52,9 +49,10 @@ extra properties as described in the next section.
Objects can only be created by the automount daemon: symlinks are
created with a regular `symlink` system call, while directories and
mount traps are created with `mkdir`. The determination of whether a
-directory should be a mount trap or not is quite _ad hoc_, largely for
-historical reasons, and is determined in part by the
-*direct*/*indirect*/*offset* mount options, and the *maxproto* mount option.
+directory should be a mount trap is based on a master map. This master
+map is consulted by autofs to determine which directories are mount
+points. Mount points can be *direct*/*indirect*/*offset*.
+On most systems, the default master map is located at */etc/auto.master*.
If neither the *direct* or *offset* mount options are given (so the
mount is considered to be *indirect*), then the root directory is
@@ -80,7 +78,7 @@ where in the tree they are (root, top level, or lower), the *maxproto*,
and whether the mount was *indirect* or not.
Mount Traps
----------------
+===========
A core element of the implementation of autofs is the Mount Traps
which are provided by the Linux VFS. Any directory provided by a
@@ -201,7 +199,7 @@ initiated or is being considered, otherwise it returns 0.
Mountpoint expiry
------------------
+=================
The VFS has a mechanism for automatically expiring unused mounts,
much as it can expire any unused dentry information from the dcache.
@@ -301,7 +299,7 @@ completed (together with removing any directories that might have been
necessary), or has been aborted.
Communicating with autofs: detecting the daemon
------------------------------------------------
+===============================================
There are several forms of communication between the automount daemon
and the filesystem. As we have already seen, the daemon can create and
@@ -317,33 +315,39 @@ If the daemon ever has to be stopped and restarted a new pgid can be
provided through an ioctl as will be described below.
Communicating with autofs: the event pipe
------------------------------------------
+=========================================
When an autofs filesystem is mounted, the 'write' end of a pipe must
be passed using the 'fd=' mount option. autofs will write
notification messages to this pipe for the daemon to respond to.
-For version 5, the format of the message is:
-
- struct autofs_v5_packet {
- int proto_version; /* Protocol version */
- int type; /* Type of packet */
- autofs_wqt_t wait_queue_token;
- __u32 dev;
- __u64 ino;
- __u32 uid;
- __u32 gid;
- __u32 pid;
- __u32 tgid;
- __u32 len;
- char name[NAME_MAX+1];
+For version 5, the format of the message is::
+
+ struct autofs_v5_packet {
+ struct autofs_packet_hdr hdr;
+ autofs_wqt_t wait_queue_token;
+ __u32 dev;
+ __u64 ino;
+ __u32 uid;
+ __u32 gid;
+ __u32 pid;
+ __u32 tgid;
+ __u32 len;
+ char name[NAME_MAX+1];
};
-where the type is one of
+And the format of the header is::
+
+ struct autofs_packet_hdr {
+ int proto_version; /* Protocol version */
+ int type; /* Type of packet */
+ };
- autofs_ptype_missing_indirect
- autofs_ptype_expire_indirect
- autofs_ptype_missing_direct
- autofs_ptype_expire_direct
+where the type is one of ::
+
+ autofs_ptype_missing_indirect
+ autofs_ptype_expire_indirect
+ autofs_ptype_missing_direct
+ autofs_ptype_expire_direct
so messages can indicate that a name is missing (something tried to
access it but it isn't there) or that it has been selected for expiry.
@@ -360,7 +364,7 @@ acknowledged using one of the ioctls below with the relevant
`wait_queue_token`.
Communicating with autofs: root directory ioctls
-------------------------------------------------
+================================================
The root directory of an autofs filesystem will respond to a number of
ioctls. The process issuing the ioctl must have the CAP_SYS_ADMIN
@@ -368,58 +372,66 @@ capability, or must be the automount daemon.
The available ioctl commands are:
-- **AUTOFS_IOC_READY**: a notification has been handled. The argument
- to the ioctl command is the "wait_queue_token" number
- corresponding to the notification being acknowledged.
-- **AUTOFS_IOC_FAIL**: similar to above, but indicates failure with
- the error code `ENOENT`.
-- **AUTOFS_IOC_CATATONIC**: Causes the autofs to enter "catatonic"
- mode meaning that it stops sending notifications to the daemon.
- This mode is also entered if a write to the pipe fails.
-- **AUTOFS_IOC_PROTOVER**: This returns the protocol version in use.
-- **AUTOFS_IOC_PROTOSUBVER**: Returns the protocol sub-version which
- is really a version number for the implementation.
-- **AUTOFS_IOC_SETTIMEOUT**: This passes a pointer to an unsigned
- long. The value is used to set the timeout for expiry, and
- the current timeout value is stored back through the pointer.
-- **AUTOFS_IOC_ASKUMOUNT**: Returns, in the pointed-to `int`, 1 if
- the filesystem could be unmounted. This is only a hint as
- the situation could change at any instant. This call can be
- used to avoid a more expensive full unmount attempt.
-- **AUTOFS_IOC_EXPIRE**: as described above, this asks if there is
- anything suitable to expire. A pointer to a packet:
-
- struct autofs_packet_expire_multi {
- int proto_version; /* Protocol version */
- int type; /* Type of packet */
- autofs_wqt_t wait_queue_token;
- int len;
- char name[NAME_MAX+1];
- };
+- **AUTOFS_IOC_READY**:
+ a notification has been handled. The argument
+ to the ioctl command is the "wait_queue_token" number
+ corresponding to the notification being acknowledged.
+- **AUTOFS_IOC_FAIL**:
+ similar to above, but indicates failure with
+ the error code `ENOENT`.
+- **AUTOFS_IOC_CATATONIC**:
+ Causes the autofs to enter "catatonic"
+ mode meaning that it stops sending notifications to the daemon.
+ This mode is also entered if a write to the pipe fails.
+- **AUTOFS_IOC_PROTOVER**:
+ This returns the protocol version in use.
+- **AUTOFS_IOC_PROTOSUBVER**:
+ Returns the protocol sub-version which
+ is really a version number for the implementation.
+- **AUTOFS_IOC_SETTIMEOUT**:
+ This passes a pointer to an unsigned
+ long. The value is used to set the timeout for expiry, and
+ the current timeout value is stored back through the pointer.
+- **AUTOFS_IOC_ASKUMOUNT**:
+ Returns, in the pointed-to `int`, 1 if
+ the filesystem could be unmounted. This is only a hint as
+ the situation could change at any instant. This call can be
+ used to avoid a more expensive full unmount attempt.
+- **AUTOFS_IOC_EXPIRE**:
+ as described above, this asks if there is
+ anything suitable to expire. A pointer to a packet::
+
+ struct autofs_packet_expire_multi {
+ struct autofs_packet_hdr hdr;
+ autofs_wqt_t wait_queue_token;
+ int len;
+ char name[NAME_MAX+1];
+ };
- is required. This is filled in with the name of something
- that can be unmounted or removed. If nothing can be expired,
- `errno` is set to `EAGAIN`. Even though a `wait_queue_token`
- is present in the structure, no "wait queue" is established
- and no acknowledgment is needed.
-- **AUTOFS_IOC_EXPIRE_MULTI**: This is similar to
- **AUTOFS_IOC_EXPIRE** except that it causes notification to be
- sent to the daemon, and it blocks until the daemon acknowledges.
- The argument is an integer which can contain two different flags.
+ is required. This is filled in with the name of something
+ that can be unmounted or removed. If nothing can be expired,
+ `errno` is set to `EAGAIN`. Even though a `wait_queue_token`
+ is present in the structure, no "wait queue" is established
+ and no acknowledgment is needed.
+- **AUTOFS_IOC_EXPIRE_MULTI**:
+ This is similar to
+ **AUTOFS_IOC_EXPIRE** except that it causes notification to be
+ sent to the daemon, and it blocks until the daemon acknowledges.
+ The argument is an integer which can contain two different flags.
- **AUTOFS_EXP_IMMEDIATE** causes `last_used` time to be ignored
- and objects are expired if the are not in use.
+ **AUTOFS_EXP_IMMEDIATE** causes `last_used` time to be ignored
+ and objects are expired if the are not in use.
- **AUTOFS_EXP_FORCED** causes the in use status to be ignored
- and objects are expired ieven if they are in use. This assumes
- that the daemon has requested this because it is capable of
- performing the umount.
+ **AUTOFS_EXP_FORCED** causes the in use status to be ignored
+ and objects are expired ieven if they are in use. This assumes
+ that the daemon has requested this because it is capable of
+ performing the umount.
- **AUTOFS_EXP_LEAVES** will select a leaf rather than a top-level
- name to expire. This is only safe when *maxproto* is 4.
+ **AUTOFS_EXP_LEAVES** will select a leaf rather than a top-level
+ name to expire. This is only safe when *maxproto* is 4.
Communicating with autofs: char-device ioctls
----------------------------------------------
+=============================================
It is not always possible to open the root of an autofs filesystem,
particularly a *direct* mounted filesystem. If the automount daemon
@@ -429,9 +441,9 @@ need there is a "miscellaneous" character device (major 10, minor 235)
which can be used to communicate directly with the autofs filesystem.
It requires CAP_SYS_ADMIN for access.
-The `ioctl`s that can be used on this device are described in a separate
+The 'ioctl's that can be used on this device are described in a separate
document `autofs-mount-control.txt`, and are summarised briefly here.
-Each ioctl is passed a pointer to an `autofs_dev_ioctl` structure:
+Each ioctl is passed a pointer to an `autofs_dev_ioctl` structure::
struct autofs_dev_ioctl {
__u32 ver_major;
@@ -469,41 +481,50 @@ that the kernel module can support.
Commands are:
-- **AUTOFS_DEV_IOCTL_VERSION_CMD**: does nothing, except validate and
- set version numbers.
-- **AUTOFS_DEV_IOCTL_OPENMOUNT_CMD**: return an open file descriptor
- on the root of an autofs filesystem. The filesystem is identified
- by name and device number, which is stored in `openmount.devid`.
- Device numbers for existing filesystems can be found in
- `/proc/self/mountinfo`.
-- **AUTOFS_DEV_IOCTL_CLOSEMOUNT_CMD**: same as `close(ioctlfd)`.
-- **AUTOFS_DEV_IOCTL_SETPIPEFD_CMD**: if the filesystem is in
- catatonic mode, this can provide the write end of a new pipe
- in `setpipefd.pipefd` to re-establish communication with a daemon.
- The process group of the calling process is used to identify the
- daemon.
-- **AUTOFS_DEV_IOCTL_REQUESTER_CMD**: `path` should be a
- name within the filesystem that has been auto-mounted on.
- On successful return, `requester.uid` and `requester.gid` will be
- the UID and GID of the process which triggered that mount.
-- **AUTOFS_DEV_IOCTL_ISMOUNTPOINT_CMD**: Check if path is a
- mountpoint of a particular type - see separate documentation for
- details.
-- **AUTOFS_DEV_IOCTL_PROTOVER_CMD**:
-- **AUTOFS_DEV_IOCTL_PROTOSUBVER_CMD**:
-- **AUTOFS_DEV_IOCTL_READY_CMD**:
-- **AUTOFS_DEV_IOCTL_FAIL_CMD**:
-- **AUTOFS_DEV_IOCTL_CATATONIC_CMD**:
-- **AUTOFS_DEV_IOCTL_TIMEOUT_CMD**:
-- **AUTOFS_DEV_IOCTL_EXPIRE_CMD**:
-- **AUTOFS_DEV_IOCTL_ASKUMOUNT_CMD**: These all have the same
- function as the similarly named **AUTOFS_IOC** ioctls, except
- that **FAIL** can be given an explicit error number in `fail.status`
- instead of assuming `ENOENT`, and this **EXPIRE** command
- corresponds to **AUTOFS_IOC_EXPIRE_MULTI**.
+- **AUTOFS_DEV_IOCTL_VERSION_CMD**:
+ does nothing, except validate and
+ set version numbers.
+- **AUTOFS_DEV_IOCTL_OPENMOUNT_CMD**:
+ return an open file descriptor
+ on the root of an autofs filesystem. The filesystem is identified
+ by name and device number, which is stored in `openmount.devid`.
+ Device numbers for existing filesystems can be found in
+ `/proc/self/mountinfo`.
+- **AUTOFS_DEV_IOCTL_CLOSEMOUNT_CMD**:
+ same as `close(ioctlfd)`.
+- **AUTOFS_DEV_IOCTL_SETPIPEFD_CMD**:
+ if the filesystem is in
+ catatonic mode, this can provide the write end of a new pipe
+ in `setpipefd.pipefd` to re-establish communication with a daemon.
+ The process group of the calling process is used to identify the
+ daemon.
+- **AUTOFS_DEV_IOCTL_REQUESTER_CMD**:
+ `path` should be a
+ name within the filesystem that has been auto-mounted on.
+ On successful return, `requester.uid` and `requester.gid` will be
+ the UID and GID of the process which triggered that mount.
+- **AUTOFS_DEV_IOCTL_ISMOUNTPOINT_CMD**:
+ Check if path is a
+ mountpoint of a particular type - see separate documentation for
+ details.
+
+- **AUTOFS_DEV_IOCTL_PROTOVER_CMD**
+- **AUTOFS_DEV_IOCTL_PROTOSUBVER_CMD**
+- **AUTOFS_DEV_IOCTL_READY_CMD**
+- **AUTOFS_DEV_IOCTL_FAIL_CMD**
+- **AUTOFS_DEV_IOCTL_CATATONIC_CMD**
+- **AUTOFS_DEV_IOCTL_TIMEOUT_CMD**
+- **AUTOFS_DEV_IOCTL_EXPIRE_CMD**
+- **AUTOFS_DEV_IOCTL_ASKUMOUNT_CMD**
+
+These all have the same
+function as the similarly named **AUTOFS_IOC** ioctls, except
+that **FAIL** can be given an explicit error number in `fail.status`
+instead of assuming `ENOENT`, and this **EXPIRE** command
+corresponds to **AUTOFS_IOC_EXPIRE_MULTI**.
Catatonic mode
---------------
+==============
As mentioned, an autofs mount can enter "catatonic" mode. This
happens if a write to the notification pipe fails, or if it is
@@ -527,7 +548,7 @@ Catatonic mode can only be left via the
**AUTOFS_DEV_IOCTL_OPENMOUNT_CMD** ioctl on the `/dev/autofs`.
The "ignore" mount option
--------------------------
+=========================
The "ignore" mount option can be used to provide a generic indicator
to applications that the mount entry should be ignored when displaying
@@ -542,18 +563,18 @@ This is intended to be used by user space programs to exclude autofs
mounts from consideration when reading the mounts list.
autofs, name spaces, and shared mounts
---------------------------------------
+======================================
With bind mounts and name spaces it is possible for an autofs
filesystem to appear at multiple places in one or more filesystem
name spaces. For this to work sensibly, the autofs filesystem should
-always be mounted "shared". e.g.
+always be mounted "shared". e.g. ::
-> `mount --make-shared /autofs/mount/point`
+ mount --make-shared /autofs/mount/point
The automount daemon is only able to manage a single mount location for
an autofs filesystem and if mounts on that are not 'shared', other
locations will not behave as expected. In particular access to those
-other locations will likely result in the `ELOOP` error
+other locations will likely result in the `ELOOP` error ::
-> Too many levels of symbolic links
+ Too many levels of symbolic links
diff --git a/Documentation/filesystems/debugfs.txt b/Documentation/filesystems/debugfs.txt
index 9e27c843d00e..dc497b96fa4f 100644
--- a/Documentation/filesystems/debugfs.txt
+++ b/Documentation/filesystems/debugfs.txt
@@ -68,41 +68,49 @@ actually necessary; the debugfs code provides a number of helper functions
for simple situations. Files containing a single integer value can be
created with any of:
- struct dentry *debugfs_create_u8(const char *name, umode_t mode,
- struct dentry *parent, u8 *value);
- struct dentry *debugfs_create_u16(const char *name, umode_t mode,
- struct dentry *parent, u16 *value);
+ void debugfs_create_u8(const char *name, umode_t mode,
+ struct dentry *parent, u8 *value);
+ void debugfs_create_u16(const char *name, umode_t mode,
+ struct dentry *parent, u16 *value);
struct dentry *debugfs_create_u32(const char *name, umode_t mode,
struct dentry *parent, u32 *value);
- struct dentry *debugfs_create_u64(const char *name, umode_t mode,
- struct dentry *parent, u64 *value);
+ void debugfs_create_u64(const char *name, umode_t mode,
+ struct dentry *parent, u64 *value);
These files support both reading and writing the given value; if a specific
file should not be written to, simply set the mode bits accordingly. The
values in these files are in decimal; if hexadecimal is more appropriate,
the following functions can be used instead:
- struct dentry *debugfs_create_x8(const char *name, umode_t mode,
- struct dentry *parent, u8 *value);
- struct dentry *debugfs_create_x16(const char *name, umode_t mode,
- struct dentry *parent, u16 *value);
- struct dentry *debugfs_create_x32(const char *name, umode_t mode,
- struct dentry *parent, u32 *value);
- struct dentry *debugfs_create_x64(const char *name, umode_t mode,
- struct dentry *parent, u64 *value);
+ void debugfs_create_x8(const char *name, umode_t mode,
+ struct dentry *parent, u8 *value);
+ void debugfs_create_x16(const char *name, umode_t mode,
+ struct dentry *parent, u16 *value);
+ void debugfs_create_x32(const char *name, umode_t mode,
+ struct dentry *parent, u32 *value);
+ void debugfs_create_x64(const char *name, umode_t mode,
+ struct dentry *parent, u64 *value);
These functions are useful as long as the developer knows the size of the
value to be exported. Some types can have different widths on different
-architectures, though, complicating the situation somewhat. There is a
-function meant to help out in one special case:
+architectures, though, complicating the situation somewhat. There are
+functions meant to help out in such special cases:
- struct dentry *debugfs_create_size_t(const char *name, umode_t mode,
- struct dentry *parent,
- size_t *value);
+ void debugfs_create_size_t(const char *name, umode_t mode,
+ struct dentry *parent, size_t *value);
As might be expected, this function will create a debugfs file to represent
a variable of type size_t.
+Similarly, there are helpers for variables of type unsigned long, in decimal
+and hexadecimal:
+
+ struct dentry *debugfs_create_ulong(const char *name, umode_t mode,
+ struct dentry *parent,
+ unsigned long *value);
+ void debugfs_create_xul(const char *name, umode_t mode,
+ struct dentry *parent, unsigned long *value);
+
Boolean values can be placed in debugfs with:
struct dentry *debugfs_create_bool(const char *name, umode_t mode,
@@ -114,8 +122,8 @@ lower-case values, or 1 or 0. Any other input will be silently ignored.
Also, atomic_t values can be placed in debugfs with:
- struct dentry *debugfs_create_atomic_t(const char *name, umode_t mode,
- struct dentry *parent, atomic_t *value)
+ void debugfs_create_atomic_t(const char *name, umode_t mode,
+ struct dentry *parent, atomic_t *value)
A read of this file will get atomic_t values, and a write of this file
will set atomic_t values.
diff --git a/Documentation/filesystems/erofs.txt b/Documentation/filesystems/erofs.txt
index b0c085326e2e..db6d39c3ae71 100644
--- a/Documentation/filesystems/erofs.txt
+++ b/Documentation/filesystems/erofs.txt
@@ -24,11 +24,11 @@ Here is the main features of EROFS:
- Metadata & data could be mixed by design;
- 2 inode versions for different requirements:
- v1 v2
+ compact (v1) extended (v2)
Inode metadata size: 32 bytes 64 bytes
Max file size: 4 GB 16 EB (also limited by max. vol size)
Max uids/gids: 65536 4294967296
- File creation time: no yes (64 + 32-bit timestamp)
+ File change time: no yes (64 + 32-bit timestamp)
Max hardlinks: 65536 4294967296
Metadata reserved: 4 bytes 14 bytes
@@ -39,7 +39,7 @@ Here is the main features of EROFS:
- Support POSIX.1e ACLs by using xattrs;
- Support transparent file compression as an option:
- LZ4 algorithm with 4 KB fixed-output compression for high performance;
+ LZ4 algorithm with 4 KB fixed-sized output compression for high performance.
The following git tree provides the file system user-space tools under
development (ex, formatting tool mkfs.erofs):
@@ -85,7 +85,7 @@ All data areas should be aligned with the block size, but metadata areas
may not. All metadatas can be now observed in two different spaces (views):
1. Inode metadata space
Each valid inode should be aligned with an inode slot, which is a fixed
- value (32 bytes) and designed to be kept in line with v1 inode size.
+ value (32 bytes) and designed to be kept in line with compact inode size.
Each inode can be directly found with the following formula:
inode offset = meta_blkaddr * block_size + 32 * nid
@@ -117,10 +117,10 @@ may not. All metadatas can be now observed in two different spaces (views):
|-> aligned with 4B
Inode could be 32 or 64 bytes, which can be distinguished from a common
- field which all inode versions have -- i_advise:
+ field which all inode versions have -- i_format:
__________________ __________________
- | i_advise | | i_advise |
+ | i_format | | i_format |
|__________________| |__________________|
| ... | | ... |
| | | |
@@ -129,12 +129,13 @@ may not. All metadatas can be now observed in two different spaces (views):
|__________________| 64 bytes
Xattrs, extents, data inline are followed by the corresponding inode with
- proper alignes, and they could be optional for different data mappings,
- _currently_ there are totally 3 valid data mappings supported:
+ proper alignment, and they could be optional for different data mappings.
+ _currently_ total 4 valid data mappings are supported:
- 1) flat file data without data inline (no extent);
- 2) fixed-output size data compression (must have extents);
- 3) flat file data with tail-end data inline (no extent);
+ 0 flat file data without data inline (no extent);
+ 1 fixed-sized output data compression (with non-compacted indexes);
+ 2 flat file data with tail packing data inline (no extent);
+ 3 fixed-sized output data compression (with compacted indexes, v5.3+).
The size of the optional xattrs is indicated by i_xattr_count in inode
header. Large xattrs or xattrs shared by many different files can be
@@ -182,8 +183,8 @@ introduce another on-disk field at all.
Compression
-----------
-Currently, EROFS supports 4KB fixed-output clustersize transparent file
-compression, as illustrated below:
+Currently, EROFS supports 4KB fixed-sized output transparent file compression,
+as illustrated below:
|---- Variant-Length Extent ----|-------- VLE --------|----- VLE -----
clusterofs clusterofs clusterofs
diff --git a/Documentation/filesystems/f2fs.txt b/Documentation/filesystems/f2fs.txt
index 7e1991328473..3135b80df6da 100644
--- a/Documentation/filesystems/f2fs.txt
+++ b/Documentation/filesystems/f2fs.txt
@@ -297,6 +297,9 @@ Files in /sys/fs/f2fs/<devname>
reclaim the prefree segments to free segments.
By default, 5% over total # of segments.
+ main_blkaddr This value gives the first block address of
+ MAIN area in the partition.
+
max_small_discards This parameter controls the number of discard
commands that consist small blocks less than 2MB.
The candidates to be discarded are cached until
@@ -346,7 +349,7 @@ Files in /sys/fs/f2fs/<devname>
ram_thresh This parameter controls the memory footprint used
by free nids and cached nat entries. By default,
- 10 is set, which indicates 10 MB / 1 GB RAM.
+ 1 is set, which indicates 10 MB / 1 GB RAM.
ra_nid_pages When building free nids, F2FS reads NAT blocks
ahead for speed up. Default is 0.
diff --git a/Documentation/filesystems/fscrypt.rst b/Documentation/filesystems/fscrypt.rst
index 8a0700af9596..68c2bc8275cf 100644
--- a/Documentation/filesystems/fscrypt.rst
+++ b/Documentation/filesystems/fscrypt.rst
@@ -256,13 +256,8 @@ alternative master keys or to support rotating master keys. Instead,
the master keys may be wrapped in userspace, e.g. as is done by the
`fscrypt <https://github.com/google/fscrypt>`_ tool.
-Including the inode number in the IVs was considered. However, it was
-rejected as it would have prevented ext4 filesystems from being
-resized, and by itself still wouldn't have been sufficient to prevent
-the same key from being directly reused for both XTS and CTS-CBC.
-
-DIRECT_KEY and per-mode keys
-----------------------------
+DIRECT_KEY policies
+-------------------
The Adiantum encryption mode (see `Encryption modes and usage`_) is
suitable for both contents and filenames encryption, and it accepts
@@ -285,6 +280,21 @@ IV. Moreover:
key derived using the KDF. Users may use the same master key for
other v2 encryption policies.
+IV_INO_LBLK_64 policies
+-----------------------
+
+When FSCRYPT_POLICY_FLAG_IV_INO_LBLK_64 is set in the fscrypt policy,
+the encryption keys are derived from the master key, encryption mode
+number, and filesystem UUID. This normally results in all files
+protected by the same master key sharing a single contents encryption
+key and a single filenames encryption key. To still encrypt different
+files' data differently, inode numbers are included in the IVs.
+Consequently, shrinking the filesystem may not be allowed.
+
+This format is optimized for use with inline encryption hardware
+compliant with the UFS or eMMC standards, which support only 64 IV
+bits per I/O request and may have only a small number of keyslots.
+
Key identifiers
---------------
@@ -308,8 +318,9 @@ If unsure, you should use the (AES-256-XTS, AES-256-CTS-CBC) pair.
AES-128-CBC was added only for low-powered embedded devices with
crypto accelerators such as CAAM or CESA that do not support XTS. To
-use AES-128-CBC, CONFIG_CRYPTO_SHA256 (or another SHA-256
-implementation) must be enabled so that ESSIV can be used.
+use AES-128-CBC, CONFIG_CRYPTO_ESSIV and CONFIG_CRYPTO_SHA256 (or
+another SHA-256 implementation) must be enabled so that ESSIV can be
+used.
Adiantum is a (primarily) stream cipher-based mode that is fast even
on CPUs without dedicated crypto instructions. It's also a true
@@ -331,8 +342,8 @@ Contents encryption
-------------------
For file contents, each filesystem block is encrypted independently.
-Currently, only the case where the filesystem block size is equal to
-the system's page size (usually 4096 bytes) is supported.
+Starting from Linux kernel 5.5, encryption of filesystems with block
+size less than system's page size is supported.
Each block's IV is set to the logical block number within the file as
a little endian number, except that:
@@ -341,10 +352,16 @@ a little endian number, except that:
is encrypted with AES-256 where the AES-256 key is the SHA-256 hash
of the file's data encryption key.
-- In the "direct key" configuration (FSCRYPT_POLICY_FLAG_DIRECT_KEY
- set in the fscrypt_policy), the file's nonce is also appended to the
- IV. Currently this is only allowed with the Adiantum encryption
- mode.
+- With `DIRECT_KEY policies`_, the file's nonce is appended to the IV.
+ Currently this is only allowed with the Adiantum encryption mode.
+
+- With `IV_INO_LBLK_64 policies`_, the logical block number is limited
+ to 32 bits and is placed in bits 0-31 of the IV. The inode number
+ (which is also limited to 32 bits) is placed in bits 32-63.
+
+Note that because file logical block numbers are included in the IVs,
+filesystems must enforce that blocks are never shifted around within
+encrypted files, e.g. via "collapse range" or "insert range".
Filenames encryption
--------------------
@@ -354,10 +371,10 @@ the requirements to retain support for efficient directory lookups and
filenames of up to 255 bytes, the same IV is used for every filename
in a directory.
-However, each encrypted directory still uses a unique key; or
-alternatively (for the "direct key" configuration) has the file's
-nonce included in the IVs. Thus, IV reuse is limited to within a
-single directory.
+However, each encrypted directory still uses a unique key, or
+alternatively has the file's nonce (for `DIRECT_KEY policies`_) or
+inode number (for `IV_INO_LBLK_64 policies`_) included in the IVs.
+Thus, IV reuse is limited to within a single directory.
With CTS-CBC, the IV reuse means that when the plaintext filenames
share a common prefix at least as long as the cipher block size (16
@@ -431,12 +448,15 @@ This structure must be initialized as follows:
(1) for ``contents_encryption_mode`` and FSCRYPT_MODE_AES_256_CTS
(4) for ``filenames_encryption_mode``.
-- ``flags`` must contain a value from ``<linux/fscrypt.h>`` which
- identifies the amount of NUL-padding to use when encrypting
- filenames. If unsure, use FSCRYPT_POLICY_FLAGS_PAD_32 (0x3).
- Additionally, if the encryption modes are both
- FSCRYPT_MODE_ADIANTUM, this can contain
- FSCRYPT_POLICY_FLAG_DIRECT_KEY; see `DIRECT_KEY and per-mode keys`_.
+- ``flags`` contains optional flags from ``<linux/fscrypt.h>``:
+
+ - FSCRYPT_POLICY_FLAGS_PAD_*: The amount of NUL padding to use when
+ encrypting filenames. If unsure, use FSCRYPT_POLICY_FLAGS_PAD_32
+ (0x3).
+ - FSCRYPT_POLICY_FLAG_DIRECT_KEY: See `DIRECT_KEY policies`_.
+ - FSCRYPT_POLICY_FLAG_IV_INO_LBLK_64: See `IV_INO_LBLK_64
+ policies`_. This is mutually exclusive with DIRECT_KEY and is not
+ supported on v1 policies.
- For v2 encryption policies, ``__reserved`` must be zeroed.
@@ -1089,7 +1109,7 @@ policy structs (see `Setting an encryption policy`_), except that the
context structs also contain a nonce. The nonce is randomly generated
by the kernel and is used as KDF input or as a tweak to cause
different files to be encrypted differently; see `Per-file keys`_ and
-`DIRECT_KEY and per-mode keys`_.
+`DIRECT_KEY policies`_.
Data path changes
-----------------
diff --git a/Documentation/filesystems/fsverity.rst b/Documentation/filesystems/fsverity.rst
index 42a0b6dd9e0b..a95536b6443c 100644
--- a/Documentation/filesystems/fsverity.rst
+++ b/Documentation/filesystems/fsverity.rst
@@ -226,6 +226,14 @@ To do so, check for FS_VERITY_FL (0x00100000) in the returned flags.
The verity flag is not settable via FS_IOC_SETFLAGS. You must use
FS_IOC_ENABLE_VERITY instead, since parameters must be provided.
+statx
+-----
+
+Since Linux v5.5, the statx() system call sets STATX_ATTR_VERITY if
+the file has fs-verity enabled. This can perform better than
+FS_IOC_GETFLAGS and FS_IOC_MEASURE_VERITY because it doesn't require
+opening the file, and opening verity files can be expensive.
+
Accessing verity files
======================
@@ -398,7 +406,7 @@ pages have been read into the pagecache. (See `Verifying data`_.)
ext4
----
-ext4 supports fs-verity since Linux TODO and e2fsprogs v1.45.2.
+ext4 supports fs-verity since Linux v5.4 and e2fsprogs v1.45.2.
To create verity files on an ext4 filesystem, the filesystem must have
been formatted with ``-O verity`` or had ``tune2fs -O verity`` run on
@@ -434,7 +442,7 @@ also only supports extent-based files.
f2fs
----
-f2fs supports fs-verity since Linux TODO and f2fs-tools v1.11.0.
+f2fs supports fs-verity since Linux v5.4 and f2fs-tools v1.11.0.
To create verity files on an f2fs filesystem, the filesystem must have
been formatted with ``-O verity``.
diff --git a/Documentation/filesystems/index.rst b/Documentation/filesystems/index.rst
index 2c3a9f761205..ad6315a48d14 100644
--- a/Documentation/filesystems/index.rst
+++ b/Documentation/filesystems/index.rst
@@ -46,4 +46,5 @@ Documentation for filesystem implementations.
.. toctree::
:maxdepth: 2
+ autofs
virtiofs
diff --git a/Documentation/filesystems/locking.rst b/Documentation/filesystems/locking.rst
index fc3a0704553c..5057e4d9dcd1 100644
--- a/Documentation/filesystems/locking.rst
+++ b/Documentation/filesystems/locking.rst
@@ -105,7 +105,7 @@ getattr: no
listxattr: no
fiemap: no
update_time: no
-atomic_open: exclusive
+atomic_open: shared (exclusive if O_CREAT is set in open flags)
tmpfile: no
============ =============================================
diff --git a/Documentation/filesystems/overlayfs.txt b/Documentation/filesystems/overlayfs.rst
index 845d689e0fd7..e443be7928db 100644
--- a/Documentation/filesystems/overlayfs.txt
+++ b/Documentation/filesystems/overlayfs.rst
@@ -1,3 +1,5 @@
+.. SPDX-License-Identifier: GPL-2.0
+
Written by: Neil Brown
Please see MAINTAINERS file for where to send questions.
@@ -181,7 +183,7 @@ Kernel config options:
worried about backward compatibility with kernels that have the redirect_dir
feature and follow redirects even if turned off.
-Module options (can also be changed through /sys/module/overlay/parameters/*):
+Module options (can also be changed through /sys/module/overlay/parameters/):
- "redirect_dir=BOOL":
See OVERLAY_FS_REDIRECT_DIR kernel config option above.
@@ -263,7 +265,7 @@ top, lower2 the middle and lower3 the bottom layer.
Metadata only copy up
---------------------
+---------------------
When metadata only copy up feature is enabled, overlayfs will only copy
up metadata (as opposed to whole file), when a metadata specific operation
@@ -286,10 +288,10 @@ pointed by REDIRECT. This should not be possible on local system as setting
"trusted." xattrs will require CAP_SYS_ADMIN. But it should be possible
for untrusted layers like from a pen drive.
-Note: redirect_dir={off|nofollow|follow(*)} conflicts with metacopy=on, and
+Note: redirect_dir={off|nofollow|follow[*]} conflicts with metacopy=on, and
results in an error.
-(*) redirect_dir=follow only conflicts with metacopy=on if upperdir=... is
+[*] redirect_dir=follow only conflicts with metacopy=on if upperdir=... is
given.
Sharing and copying layers