Diffstat (limited to 'Documentation/filesystems')
-rw-r--r-- Documentation/filesystems/caching/backend-api.rst | 2
-rw-r--r-- Documentation/filesystems/ceph.rst | 1
-rw-r--r-- Documentation/filesystems/cifs/ksmbd.rst | 42
-rw-r--r-- Documentation/filesystems/configfs.rst | 48
-rw-r--r-- Documentation/filesystems/debugfs.rst | 8
-rw-r--r-- Documentation/filesystems/erofs.rst | 38
-rw-r--r-- Documentation/filesystems/ext4/super.rst | 6
-rw-r--r-- Documentation/filesystems/f2fs.rst | 18
-rw-r--r-- Documentation/filesystems/fscrypt.rst | 7
-rw-r--r-- Documentation/filesystems/idmappings.rst | 2
-rw-r--r-- Documentation/filesystems/locking.rst | 13
-rw-r--r-- Documentation/filesystems/mount_api.rst | 12
-rw-r--r-- Documentation/filesystems/porting.rst | 25
-rw-r--r-- Documentation/filesystems/proc.rst | 32
-rw-r--r-- Documentation/filesystems/qnx6.rst | 2
-rw-r--r-- Documentation/filesystems/spufs/spufs.rst | 2
-rw-r--r-- Documentation/filesystems/sysfs.rst | 43
-rw-r--r-- Documentation/filesystems/ubifs.rst | 2
-rw-r--r-- Documentation/filesystems/vfs.rst | 14
-rw-r--r-- Documentation/filesystems/xfs-delayed-logging-design.rst | 18
20 files changed, 192 insertions, 143 deletions
diff --git a/Documentation/filesystems/caching/backend-api.rst b/Documentation/filesystems/caching/backend-api.rst
index d7507becf674..3a199fc50828 100644
--- a/Documentation/filesystems/caching/backend-api.rst
+++ b/Documentation/filesystems/caching/backend-api.rst
@@ -122,7 +122,7 @@ volumes, calling::
to tell fscache that a volume has been withdrawn. This waits for all
outstanding accesses on the volume to complete before returning.
-When the the cache is completely withdrawn, fscache should be notified by
+When the cache is completely withdrawn, fscache should be notified by
calling::
void fscache_relinquish_cache(struct fscache_cache *cache);
diff --git a/Documentation/filesystems/ceph.rst b/Documentation/filesystems/ceph.rst
index 4942e018db85..76ce938e7024 100644
--- a/Documentation/filesystems/ceph.rst
+++ b/Documentation/filesystems/ceph.rst
@@ -203,7 +203,6 @@ For more information on Ceph, see the home page at
The Linux kernel client source tree is available at
- https://github.com/ceph/ceph-client.git
- - git://git.kernel.org/pub/scm/linux/kernel/git/sage/ceph-client.git
and the source for the full system is at
https://github.com/ceph/ceph.git
diff --git a/Documentation/filesystems/cifs/ksmbd.rst b/Documentation/filesystems/cifs/ksmbd.rst
index 1af600db2e70..7bed96d794fc 100644
--- a/Documentation/filesystems/cifs/ksmbd.rst
+++ b/Documentation/filesystems/cifs/ksmbd.rst
@@ -118,26 +118,44 @@ ksmbd/nfsd interoperability Planned for future. The features that ksmbd
How to run
==========
-1. Download ksmbd-tools and compile them.
- - https://github.com/cifsd-team/ksmbd-tools
+1. Download ksmbd-tools (https://github.com/cifsd-team/ksmbd-tools/releases) and
+ compile them.
-2. Create user/password for SMB share.
+ - Refer to the README (https://github.com/cifsd-team/ksmbd-tools/blob/master/README.md)
+ to learn how to use the ksmbd.mountd/adduser/addshare/control utils
- # mkdir /etc/ksmbd/
- # ksmbd.adduser -a <Enter USERNAME for SMB share access>
+ $ ./autogen.sh
+ $ ./configure --with-rundir=/run
+ $ make && sudo make install
-3. Create /etc/ksmbd/smb.conf file, add SMB share in smb.conf file
- - Refer smb.conf.example and
- https://github.com/cifsd-team/ksmbd-tools/blob/master/Documentation/configuration.txt
+2. Create /usr/local/etc/ksmbd/ksmbd.conf file, add SMB share in ksmbd.conf file.
-4. Insert ksmbd.ko module
+ - Refer to ksmbd.conf.example in ksmbd-utils and see the ksmbd.conf manpage
+ for details on configuring shares.
- # insmod ksmbd.ko
+ $ man ksmbd.conf
+
+3. Create user/password for SMB share.
+
+ - See ksmbd.adduser manpage.
+
+ $ man ksmbd.adduser
+ $ sudo ksmbd.adduser -a <Enter USERNAME for SMB share access>
+
+4. Insert the ksmbd.ko module after building your kernel. There is no need to load the module
+ if ksmbd is built into the kernel.
+
+ - Set ksmbd in menuconfig (e.g. $ make menuconfig)
+ [*] Network File Systems --->
+ <M> SMB3 server support (EXPERIMENTAL)
+
+ $ sudo modprobe ksmbd.ko
5. Start ksmbd user space daemon
- # ksmbd.mountd
-6. Access share from Windows or Linux using CIFS
+ $ sudo ksmbd.mountd
+
+6. Access share from Windows or Linux using SMB3 client (cifs.ko or smbclient of samba)
Shutdown KSMBD
==============
diff --git a/Documentation/filesystems/configfs.rst b/Documentation/filesystems/configfs.rst
index 1d3d6f4a82a9..8c9342ed6d25 100644
--- a/Documentation/filesystems/configfs.rst
+++ b/Documentation/filesystems/configfs.rst
@@ -289,7 +289,6 @@ config_item_type::
const char *name);
struct config_group *(*make_group)(struct config_group *group,
const char *name);
- int (*commit_item)(struct config_item *item);
void (*disconnect_notify)(struct config_group *group,
struct config_item *item);
void (*drop_item)(struct config_group *group,
@@ -486,50 +485,3 @@ up. Here, the heartbeat code calls configfs_depend_item(). If it
succeeds, then heartbeat knows the region is safe to give to ocfs2.
If it fails, it was being torn down anyway, and heartbeat can gracefully
pass up an error.
-
-Committable Items
-=================
-
-Note:
- Committable items are currently unimplemented.
-
-Some config_items cannot have a valid initial state. That is, no
-default values can be specified for the item's attributes such that the
-item can do its work. Userspace must configure one or more attributes,
-after which the subsystem can start whatever entity this item
-represents.
-
-Consider the FakeNBD device from above. Without a target address *and*
-a target device, the subsystem has no idea what block device to import.
-The simple example assumes that the subsystem merely waits until all the
-appropriate attributes are configured, and then connects. This will,
-indeed, work, but now every attribute store must check if the attributes
-are initialized. Every attribute store must fire off the connection if
-that condition is met.
-
-Far better would be an explicit action notifying the subsystem that the
-config_item is ready to go. More importantly, an explicit action allows
-the subsystem to provide feedback as to whether the attributes are
-initialized in a way that makes sense. configfs provides this as
-committable items.
-
-configfs still uses only normal filesystem operations. An item is
-committed via rename(2). The item is moved from a directory where it
-can be modified to a directory where it cannot.
-
-Any group that provides the ct_group_ops->commit_item() method has
-committable items. When this group appears in configfs, mkdir(2) will
-not work directly in the group. Instead, the group will have two
-subdirectories: "live" and "pending". The "live" directory does not
-support mkdir(2) or rmdir(2) either. It only allows rename(2). The
-"pending" directory does allow mkdir(2) and rmdir(2). An item is
-created in the "pending" directory. Its attributes can be modified at
-will. Userspace commits the item by renaming it into the "live"
-directory. At this point, the subsystem receives the ->commit_item()
-callback. If all required attributes are filled to satisfaction, the
-method returns zero and the item is moved to the "live" directory.
-
-As rmdir(2) does not work in the "live" directory, an item must be
-shutdown, or "uncommitted". Again, this is done via rename(2), this
-time from the "live" directory back to the "pending" one. The subsystem
-is notified by the ct_group_ops->uncommit_object() method.
diff --git a/Documentation/filesystems/debugfs.rst b/Documentation/filesystems/debugfs.rst
index 71b1fee56d2a..dc35da8b8792 100644
--- a/Documentation/filesystems/debugfs.rst
+++ b/Documentation/filesystems/debugfs.rst
@@ -155,8 +155,8 @@ any code which does so in the mainline. Note that all files created with
debugfs_create_blob() are read-only.
If you want to dump a block of registers (something that happens quite
-often during development, even if little such code reaches mainline.
-Debugfs offers two functions: one to make a registers-only file, and
+often during development, even if little such code reaches mainline),
+debugfs offers two functions: one to make a registers-only file, and
another to insert a register block in the middle of another sequential
file::
@@ -183,7 +183,7 @@ The "base" argument may be 0, but you may want to build the reg32 array
using __stringify, and a number of register names (macros) are actually
byte offsets over a base for the register block.
-If you want to dump an u32 array in debugfs, you can create file with::
+If you want to dump a u32 array in debugfs, you can create a file with::
struct debugfs_u32_array {
u32 *array;
@@ -197,7 +197,7 @@ If you want to dump an u32 array in debugfs, you can create file with::
The "array" argument wraps a pointer to the array's data and the number
of its elements. Note: Once array is created its size can not be changed.
-There is a helper function to create device related seq_file::
+There is a helper function to create a device-related seq_file::
void debugfs_create_devm_seqfile(struct device *dev,
const char *name,
diff --git a/Documentation/filesystems/erofs.rst b/Documentation/filesystems/erofs.rst
index 05e03d54af1a..067fd1670b1f 100644
--- a/Documentation/filesystems/erofs.rst
+++ b/Documentation/filesystems/erofs.rst
@@ -30,12 +30,18 @@ It is implemented to be a better choice for the following scenarios:
especially for those embedded devices with limited memory and high-density
hosts with numerous containers.
-Here is the main features of EROFS:
+Here are the main features of EROFS:
- Little endian on-disk design;
- - 4KiB block size and 32-bit block addresses, therefore 16TiB address space
- at most for now;
+ - Block-based distribution and file-based distribution over fscache are
+ supported;
+
+ - Support multiple devices to refer to external blobs, which can be used
+ for container images;
+
+ - 4KiB block size and 32-bit block addresses for each device, therefore
+ 16TiB address space at most for now;
- Two inode layouts for different requirements:
@@ -50,28 +56,31 @@ Here is the main features of EROFS:
Metadata reserved 8 bytes 18 bytes
===================== ============ ======================================
- - Metadata and data could be mixed as an option;
-
- - Support extended attributes (xattrs) as an option;
+ - Support extended attributes as an option;
- - Support tailpacking data and xattr inline compared to byte-addressed
- unaligned metadata or smaller block size alternatives;
-
- - Support POSIX.1e ACLs by using xattrs;
+ - Support POSIX.1e ACLs by using extended attributes;
- Support transparent data compression as an option:
LZ4 and MicroLZMA algorithms can be used on a per-file basis; In addition,
inplace decompression is also supported to avoid bounce compressed buffers
and page cache thrashing.
+ - Support chunk-based data deduplication and rolling-hash compressed data
+ deduplication;
+
+ - Support tailpacking inline compared to byte-addressed unaligned metadata
+ or smaller block size alternatives;
+
+ - Support merging tail-end data into a special inode as fragments.
+
+ - Support large folios for uncompressed files.
+
- Support direct I/O on uncompressed files to avoid double caching for loop
devices;
- Support FSDAX on uncompressed images for secure containers and ramdisks in
order to get rid of unnecessary page cache.
- - Support multiple devices for multi blob container images;
-
- Support file-based on-demand loading with the Fscache infrastructure.
The following git tree provides the file system user-space tools under
@@ -259,7 +268,7 @@ By the way, chunk-based files are all uncompressed for now.
Data compression
----------------
-EROFS implements LZ4 fixed-sized output compression which generates fixed-sized
+EROFS implements fixed-sized output compression which generates fixed-sized
compressed data blocks from variable-sized input in contrast to other existing
fixed-sized input solutions. Relatively higher compression ratios can be gotten
by using fixed-sized output compression since nowadays popular data compression
@@ -314,3 +323,6 @@ to understand its delta0 is constantly 1, as illustrated below::
If another HEAD follows a HEAD lcluster, there is no room to record CBLKCNT,
but it's easy to know the size of such pcluster is 1 lcluster as well.
+
+Since Linux v6.1, each pcluster can be used for multiple variable-sized extents,
+therefore it can be used for compressed data deduplication.
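The 16TiB per-device limit quoted in the EROFS feature list above follows from the 4KiB block size and the 32-bit block addresses. The arithmetic can be checked with a short sketch; the names below are illustrative and are not taken from the EROFS sources.

```python
# Illustrative arithmetic only; these names do not come from the EROFS sources.
BLOCK_SIZE = 4 * 1024      # 4KiB block size, as stated in the feature list
BLOCK_ADDR_BITS = 32       # 32-bit block addresses, as stated in the feature list
TIB = 1 << 40


def max_address_space(block_size: int, addr_bits: int) -> int:
    """Return the largest addressable size in bytes for a single device."""
    return block_size * (1 << addr_bits)


print(max_address_space(BLOCK_SIZE, BLOCK_ADDR_BITS) // TIB, "TiB")  # prints "16 TiB"
```

With multiple devices, the limit applies to each device separately, which is what the reworded bullet ("for each device") expresses.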
diff --git a/Documentation/filesystems/ext4/super.rst b/Documentation/filesystems/ext4/super.rst
index 268888522e35..0152888cac29 100644
--- a/Documentation/filesystems/ext4/super.rst
+++ b/Documentation/filesystems/ext4/super.rst
@@ -456,15 +456,15 @@ The ext4 superblock is laid out as follows in
* - 0x277
- __u8
- s_lastcheck_hi
- - Upper 8 bits of the s_lastcheck_hi field.
+ - Upper 8 bits of the s_lastcheck field.
* - 0x278
- __u8
- s_first_error_time_hi
- - Upper 8 bits of the s_first_error_time_hi field.
+ - Upper 8 bits of the s_first_error_time field.
* - 0x279
- __u8
- s_last_error_time_hi
- - Upper 8 bits of the s_last_error_time_hi field.
+ - Upper 8 bits of the s_last_error_time field.
* - 0x27A
- __u8
- s_pad[2]
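The corrected descriptions above say that each `_hi` byte holds the upper 8 bits of the corresponding base field. A minimal sketch of how a reader of the superblock might join the two parts, assuming the base field is a 32-bit value and the `_hi` byte supplies bits 32 to 39 (the helper name and the sample values are hypothetical):

```python
def combine_time(lo32: int, hi8: int) -> int:
    """Join a 32-bit low word and an 8-bit high byte into one 40-bit value."""
    if not 0 <= lo32 <= 0xFFFFFFFF:
        raise ValueError("low part must fit in 32 bits")
    if not 0 <= hi8 <= 0xFF:
        raise ValueError("high part must fit in 8 bits")
    return (hi8 << 32) | lo32


# Hypothetical values standing in for s_lastcheck and s_lastcheck_hi.
print(hex(combine_time(0x80000000, 0x01)))  # prints "0x180000000"
```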
diff --git a/Documentation/filesystems/f2fs.rst b/Documentation/filesystems/f2fs.rst
index d0c09663dae8..220f3e0d3f55 100644
--- a/Documentation/filesystems/f2fs.rst
+++ b/Documentation/filesystems/f2fs.rst
@@ -25,10 +25,14 @@ a consistency checking tool (fsck.f2fs), and a debugging tool (dump.f2fs).
- git://git.kernel.org/pub/scm/linux/kernel/git/jaegeuk/f2fs-tools.git
-For reporting bugs and sending patches, please use the following mailing list:
+For sending patches, please use the following mailing list:
- linux-f2fs-devel@lists.sourceforge.net
+For reporting bugs, please use the following f2fs bug tracker link:
+
+- https://bugzilla.kernel.org/enter_bug.cgi?product=File%20System&component=f2fs
+
Background and Design issues
============================
@@ -154,6 +158,8 @@ nobarrier This option can be used if underlying storage guarantees
If this option is set, no cache_flush commands are issued
but f2fs still guarantees the write ordering of all the
data writes.
+barrier If this option is set, cache_flush commands are allowed to be
+ issued.
fastboot This option is used when a system wants to reduce mount
time as much as possible, even though normal performance
can be sacrificed.
@@ -199,6 +205,7 @@ fault_type=%d Support configuring fault injection type, should be
FAULT_SLAB_ALLOC 0x000008000
FAULT_DQUOT_INIT 0x000010000
FAULT_LOCK_OP 0x000020000
+ FAULT_BLKADDR 0x000040000
=================== ===========
mode=%s Control block allocation mode which supports "adaptive"
and "lfs". In "lfs" mode, there should be no random
@@ -286,9 +293,8 @@ compress_algorithm=%s:%d Control compress algorithm and its compress level, now,
algorithm level range
lz4 3 - 16
zstd 1 - 22
-compress_log_size=%u Support configuring compress cluster size, the size will
- be 4KB * (1 << %u), 16KB is minimum size, also it's
- default size.
+compress_log_size=%u Support configuring compress cluster size. The size will
+ be 4KB * (1 << %u). The default and minimum sizes are 16KB.
compress_extension=%s Support adding specified extension, so that f2fs can enable
compression on those corresponding files, e.g. if all files
with '.ext' has high compression rate, we can set the '.ext'
@@ -341,6 +347,10 @@ memory=%s Control memory mode. This supports "normal" and "low" modes.
Because of the nature of low memory devices, in this mode, f2fs
will try to save memory sometimes by sacrificing performance.
"normal" mode is the default mode and same as before.
+age_extent_cache Enable an age extent cache based on rb-tree. It records
+ data block update frequency of the extent per inode, in
+ order to provide better temperature hints for data block
+ allocation.
======================== ============================================================
Debugfs Entries
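The reworded `compress_log_size` entry above gives the cluster size as 4KB * (1 << %u), with 16KB as both the default and the minimum. A small sketch of that formula follows; the function and constant names are illustrative, and no upper bound is checked because the text shown here does not state one.

```python
PAGE_KB = 4           # the "4KB" unit in the documented formula
MIN_CLUSTER_KB = 16   # documented minimum (and default) cluster size


def cluster_size_kb(log_size: int) -> int:
    """Return the cluster size in KB for compress_log_size=<log_size>."""
    size = PAGE_KB * (1 << log_size)
    if size < MIN_CLUSTER_KB:
        raise ValueError(f"{size}KB is below the documented 16KB minimum")
    return size


for n in (2, 3, 4):
    print(f"compress_log_size={n} -> {cluster_size_kb(n)}KB")
```

Under this formula the 16KB minimum corresponds to a value of 2.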
diff --git a/Documentation/filesystems/fscrypt.rst b/Documentation/filesystems/fscrypt.rst
index 5ba5817c17c2..ef183387da20 100644
--- a/Documentation/filesystems/fscrypt.rst
+++ b/Documentation/filesystems/fscrypt.rst
@@ -338,6 +338,7 @@ Currently, the following pairs of encryption modes are supported:
- AES-128-CBC for contents and AES-128-CTS-CBC for filenames
- Adiantum for both contents and filenames
- AES-256-XTS for contents and AES-256-HCTR2 for filenames (v2 policies only)
+- SM4-XTS for contents and SM4-CTS-CBC for filenames (v2 policies only)
If unsure, you should use the (AES-256-XTS, AES-256-CTS-CBC) pair.
@@ -369,6 +370,12 @@ CONFIG_CRYPTO_HCTR2 must be enabled. Also, fast implementations of XCTR and
POLYVAL should be enabled, e.g. CRYPTO_POLYVAL_ARM64_CE and
CRYPTO_AES_ARM64_CE_BLK for ARM64.
+SM4 is a Chinese block cipher that is an alternative to AES. It has
+not seen as much security review as AES, and it only has a 128-bit key
+size. It may be useful in cases where its use is mandated.
+Otherwise, it should not be used. For SM4 support to be available, it
+also needs to be enabled in the kernel crypto API.
+
New encryption modes can be added relatively easily, without changes
to individual filesystems. However, authenticated encryption (AE)
modes are not currently supported because of the difficulty of dealing
diff --git a/Documentation/filesystems/idmappings.rst b/Documentation/filesystems/idmappings.rst
index c1db8748389c..b9b31066aef2 100644
--- a/Documentation/filesystems/idmappings.rst
+++ b/Documentation/filesystems/idmappings.rst
@@ -661,7 +661,7 @@ idmappings::
mount idmapping: u0:k10000:r10000
Assume a file owned by ``u1000`` is read from disk. The filesystem maps this id
-to ``k21000`` according to it's idmapping. This is what is stored in the
+to ``k21000`` according to its idmapping. This is what is stored in the
inode's ``i_uid`` and ``i_gid`` fields.
When the caller queries the ownership of this file via ``stat()`` the kernel
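The `u<first>:k<first>:r<range>` notation in the hunk above can be read as a simple range translation. The sketch below is illustrative only and is not kernel code; the filesystem idmapping `u0:k20000:r10000` is an assumption inferred from the `u1000` to `k21000` result, since that line is not visible in this excerpt.

```python
from typing import NamedTuple, Optional


class IdMapping(NamedTuple):
    first_u: int  # first userspace id of the range ("u")
    first_k: int  # first kernel id of the range ("k")
    count: int    # number of ids in the range ("r")


def parse(spec: str) -> IdMapping:
    """Parse the 'u<first>:k<first>:r<range>' notation used in the text."""
    u, k, r = spec.split(":")
    return IdMapping(int(u[1:]), int(k[1:]), int(r[1:]))


def map_down(mapping: IdMapping, uid: int) -> Optional[int]:
    """Translate a u-id into a k-id, or return None if it is not mapped."""
    if mapping.first_u <= uid < mapping.first_u + mapping.count:
        return mapping.first_k + (uid - mapping.first_u)
    return None


# Assumed filesystem idmapping (not shown in this excerpt).
print(map_down(parse("u0:k20000:r10000"), 1000))  # prints "21000"
```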
diff --git a/Documentation/filesystems/locking.rst b/Documentation/filesystems/locking.rst
index 4bb2627026ec..36fa2a83d714 100644
--- a/Documentation/filesystems/locking.rst
+++ b/Documentation/filesystems/locking.rst
@@ -70,7 +70,7 @@ prototypes::
const char *(*get_link) (struct dentry *, struct inode *, struct delayed_call *);
void (*truncate) (struct inode *);
int (*permission) (struct inode *, int, unsigned int);
- struct posix_acl * (*get_acl)(struct inode *, int, bool);
+ struct posix_acl * (*get_inode_acl)(struct inode *, int, bool);
int (*setattr) (struct dentry *, struct iattr *);
int (*getattr) (const struct path *, struct kstat *, u32, unsigned int);
ssize_t (*listxattr) (struct dentry *, char *, size_t);
@@ -79,17 +79,19 @@ prototypes::
int (*atomic_open)(struct inode *, struct dentry *,
struct file *, unsigned open_flag,
umode_t create_mode);
- int (*tmpfile) (struct inode *, struct dentry *, umode_t);
+ int (*tmpfile) (struct user_namespace *, struct inode *,
+ struct file *, umode_t);
int (*fileattr_set)(struct user_namespace *mnt_userns,
struct dentry *dentry, struct fileattr *fa);
int (*fileattr_get)(struct dentry *dentry, struct fileattr *fa);
+ struct posix_acl * (*get_acl)(struct user_namespace *, struct dentry *, int);
locking rules:
all may block
-============= =============================================
+============== =============================================
ops i_rwsem(inode)
-============= =============================================
+============== =============================================
lookup: shared
create: exclusive
link: exclusive (both)
@@ -103,6 +105,7 @@ readlink: no
get_link: no
setattr: exclusive
permission: no (may not block if called in rcu-walk mode)
+get_inode_acl: no
get_acl: no
getattr: no
listxattr: no
@@ -112,7 +115,7 @@ atomic_open: shared (exclusive if O_CREAT is set in open flags)
tmpfile: no
fileattr_get: no or exclusive
fileattr_set: exclusive
-============= =============================================
+============== =============================================
Additionally, ->rmdir(), ->unlink() and ->rename() have ->i_rwsem
diff --git a/Documentation/filesystems/mount_api.rst b/Documentation/filesystems/mount_api.rst
index eb358a00be27..63204d2094fd 100644
--- a/Documentation/filesystems/mount_api.rst
+++ b/Documentation/filesystems/mount_api.rst
@@ -562,17 +562,6 @@ or looking up of superblocks.
The following helpers all wrap sget_fc():
- * ::
-
- int vfs_get_super(struct fs_context *fc,
- enum vfs_get_super_keying keying,
- int (*fill_super)(struct super_block *sb,
- struct fs_context *fc))
-
- This creates/looks up a deviceless superblock. The keying indicates how
- many superblocks of this type may exist and in what manner they may be
- shared:
-
(1) vfs_get_single_super
Only one such superblock may exist in the system. Any further
@@ -814,6 +803,7 @@ process the parameters it is given.
int fs_lookup_param(struct fs_context *fc,
struct fs_parameter *value,
bool want_bdev,
+ unsigned int flags,
struct path *_path);
This takes a parameter that carries a string or filename type and attempts
diff --git a/Documentation/filesystems/porting.rst b/Documentation/filesystems/porting.rst
index aee9aaf9f3df..d2d684ae7798 100644
--- a/Documentation/filesystems/porting.rst
+++ b/Documentation/filesystems/porting.rst
@@ -462,8 +462,8 @@ ERR_PTR(...).
argument; instead of passing IPERM_FLAG_RCU we add MAY_NOT_BLOCK into mask.
generic_permission() has also lost the check_acl argument; ACL checking
-has been taken to VFS and filesystems need to provide a non-NULL ->i_op->get_acl
-to read an ACL from disk.
+has been taken to VFS and filesystems need to provide a non-NULL
+->i_op->get_inode_acl to read an ACL from disk.
---
@@ -922,3 +922,24 @@ is provided - file_open_root_mnt(). In-tree users adjusted.
no_llseek is gone; don't set .llseek to that - just leave it NULL instead.
Checks for "does that file have llseek(2), or should it fail with ESPIPE"
should be done by looking at FMODE_LSEEK in file->f_mode.
+
+---
+
+**mandatory**
+
+filldir_t (readdir callbacks) calling conventions have changed. Instead of
+returning 0 or -E... it returns bool now. false means "no more" (as -E... used
+to) and true means "keep going" (as 0 did in the old calling conventions). Rationale:
+callers never looked at specific -E... values anyway. ->iterate() and
+->iterate_shared() instances require no changes at all; all filldir_t ones in
+the tree have been converted.
+
+---
+
+**mandatory**
+
+Calling conventions for ->tmpfile() have changed. It now takes a struct
+file pointer instead of struct dentry pointer. d_tmpfile() is similarly
+changed to simplify callers. The passed file is in a non-open state and on
+success must be opened before returning (e.g. by calling
+finish_open_simple()).
diff --git a/Documentation/filesystems/proc.rst b/Documentation/filesystems/proc.rst
index e7aafc82be99..e224b6d5b642 100644
--- a/Documentation/filesystems/proc.rst
+++ b/Documentation/filesystems/proc.rst
@@ -47,6 +47,7 @@ fixes/update part 1.1 Stefani Seibold <stefani@seibold.net> June 9 2009
3.10 /proc/<pid>/timerslack_ns - Task timerslack value
3.11 /proc/<pid>/patch_state - Livepatch patch operation state
3.12 /proc/<pid>/arch_status - Task architecture specific information
+ 3.13 /proc/<pid>/fd - List of symlinks to open files
4 Configuring procfs
4.1 Mount options
@@ -245,7 +246,8 @@ It's slow but very precise.
Ngid NUMA group ID (0 if none)
Pid process id
PPid process id of the parent process
- TracerPid PID of process tracing this process (0 if not)
+ TracerPid PID of process tracing this process (0 if not, or
+ the tracer is outside of the current pid namespace)
Uid Real, effective, saved set, and file system UIDs
Gid Real, effective, saved set, and file system GIDs
FDSize number of file descriptor slots currently allocated
@@ -426,14 +428,16 @@ with the memory region, as the case would be with BSS (uninitialized data).
The "pathname" shows the name associated file for this mapping. If the mapping
is not associated with a file:
- ============= ====================================
+ =================== ===========================================
[heap] the heap of the program
[stack] the stack of the main process
[vdso] the "virtual dynamic shared object",
the kernel system call handler
- [anon:<name>] an anonymous mapping that has been
+ [anon:<name>] a private anonymous mapping that has been
named by userspace
- ============= ====================================
+ [anon_shmem:<name>] an anonymous shared memory mapping that has
+ been named by userspace
+ =================== ===========================================
or if empty, the mapping is anonymous.
@@ -982,6 +986,7 @@ Example output. You may not have all of these fields.
SUnreclaim: 142336 kB
KernelStack: 11168 kB
PageTables: 20540 kB
+ SecPageTables: 0 kB
NFS_Unstable: 0 kB
Bounce: 0 kB
WritebackTmp: 0 kB
@@ -1090,6 +1095,9 @@ KernelStack
Memory consumed by the kernel stacks of all tasks
PageTables
Memory consumed by userspace page tables
+SecPageTables
+ Memory consumed by secondary page tables; this currently
+ includes KVM mmu allocations on x86 and arm64.
NFS_Unstable
Always zero. Previous counted pages which had been written to
the server, but has not been committed to stable storage.
@@ -2145,6 +2153,22 @@ AVX512_elapsed_ms
the task is unlikely an AVX512 user, but depends on the workload and the
scheduling scenario, it also could be a false negative mentioned above.
+3.13 /proc/<pid>/fd - List of symlinks to open files
+-------------------------------------------------------
+This directory contains symbolic links which represent open files
+the process is maintaining. Example output::
+
+ lr-x------ 1 root root 64 Sep 20 17:53 0 -> /dev/null
+ l-wx------ 1 root root 64 Sep 20 17:53 1 -> /dev/null
+ lrwx------ 1 root root 64 Sep 20 17:53 10 -> 'socket:[12539]'
+ lrwx------ 1 root root 64 Sep 20 17:53 11 -> 'socket:[12540]'
+ lrwx------ 1 root root 64 Sep 20 17:53 12 -> 'socket:[12542]'
+
+The number of open files for the process is stored in the 'size' member
+of stat() output for /proc/<pid>/fd for fast access.
+-------------------------------------------------------
+
+
Chapter 4: Configuring procfs
=============================
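The new section 3.13 above says the open-file count is available from the 'size' member of stat() on /proc/<pid>/fd. A hedged sketch comparing that value with a plain directory listing follows; the helper names are illustrative, and the stat-based count is only meaningful on kernels that include this change.

```python
import os


def fd_count_by_listing(pid: str = "self") -> int:
    """Count open files by listing the symlinks in /proc/<pid>/fd."""
    return len(os.listdir(f"/proc/{pid}/fd"))


def fd_count_by_stat(pid: str = "self") -> int:
    """Read the count from st_size, as described in section 3.13."""
    return os.stat(f"/proc/{pid}/fd").st_size


# Only run where a Linux-style procfs is mounted.
if os.path.isdir("/proc/self/fd"):
    print("listing:", fd_count_by_listing())
    print("stat size:", fd_count_by_stat())
```

When a process inspects itself, the two numbers can differ by one, because listing the directory temporarily opens a descriptor of its own.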
diff --git a/Documentation/filesystems/qnx6.rst b/Documentation/filesystems/qnx6.rst
index fd13433d362c..523b798f04e7 100644
--- a/Documentation/filesystems/qnx6.rst
+++ b/Documentation/filesystems/qnx6.rst
@@ -176,7 +176,7 @@ Then userspace.
The requirement for a static, fixed preallocated system area comes from how
qnx6fs deals with writes.
-Each superblock got it's own half of the system area. So superblock #1
+Each superblock got its own half of the system area. So superblock #1
always uses blocks from the lower half while superblock #2 just writes to
blocks represented by the upper half bitmap system area bits.
diff --git a/Documentation/filesystems/spufs/spufs.rst b/Documentation/filesystems/spufs/spufs.rst
index 8a42859bb100..ca0441cbe37e 100644
--- a/Documentation/filesystems/spufs/spufs.rst
+++ b/Documentation/filesystems/spufs/spufs.rst
@@ -227,7 +227,7 @@ Files
from the data buffer, updating the value of the specified signal
notification register. The signal notification register will
either be replaced with the input data or will be updated to the
- bitwise OR or the old value and the input data, depending on the
+ bitwise OR of the old value and the input data, depending on the
contents of the signal1_type, or signal2_type respectively,
file.
diff --git a/Documentation/filesystems/sysfs.rst b/Documentation/filesystems/sysfs.rst
index 004d490179f3..f8187d466b97 100644
--- a/Documentation/filesystems/sysfs.rst
+++ b/Documentation/filesystems/sysfs.rst
@@ -12,10 +12,10 @@ Mike Murphy <mamurph@cs.clemson.edu>
:Original: 10 January 2003
-What it is:
-~~~~~~~~~~~
+What it is
+~~~~~~~~~~
-sysfs is a ram-based filesystem initially based on ramfs. It provides
+sysfs is a RAM-based filesystem initially based on ramfs. It provides
a means to export kernel data structures, their attributes, and the
linkages between them to userspace.
@@ -43,7 +43,7 @@ userspace. Top-level directories in sysfs represent the common
ancestors of object hierarchies; i.e. the subsystems the objects
belong to.
-Sysfs internally stores a pointer to the kobject that implements a
+sysfs internally stores a pointer to the kobject that implements a
directory in the kernfs_node object associated with the directory. In
the past this kobject pointer has been used by sysfs to do reference
counting directly on the kobject whenever the file is opened or closed.
@@ -55,7 +55,7 @@ Attributes
~~~~~~~~~~
Attributes can be exported for kobjects in the form of regular files in
-the filesystem. Sysfs forwards file I/O operations to methods defined
+the filesystem. sysfs forwards file I/O operations to methods defined
for the attributes, providing a means to read and write kernel
attributes.
@@ -72,8 +72,8 @@ you publicly humiliated and your code rewritten without notice.
An attribute definition is simply::
struct attribute {
- char * name;
- struct module *owner;
+ char *name;
+ struct module *owner;
umode_t mode;
};
@@ -138,7 +138,7 @@ __ATTR_WO(name):
assumes a name_store only and is restricted to mode
0200 that is root write access only.
__ATTR_RO_MODE(name, mode):
- fore more restrictive RO access currently
+ for more restrictive RO access; currently
only use case is the EFI System Resource Table
(see drivers/firmware/efi/esrt.c)
__ATTR_RW(name):
@@ -207,7 +207,7 @@ IOW, they should take only an object, an attribute, and a buffer as parameters.
sysfs allocates a buffer of size (PAGE_SIZE) and passes it to the
-method. Sysfs will call the method exactly once for each read or
+method. sysfs will call the method exactly once for each read or
write. This forces the following behavior on the method
implementations:
@@ -221,7 +221,7 @@ implementations:
be called again, rearmed, to fill the buffer.
- On write(2), sysfs expects the entire buffer to be passed during the
- first write. Sysfs then passes the entire buffer to the store() method.
+ first write. sysfs then passes the entire buffer to the store() method.
A terminating null is added after the data on stores. This makes
functions like sysfs_streq() safe to use.
@@ -237,7 +237,7 @@ Other notes:
- Writing causes the show() method to be rearmed regardless of current
file position.
-- The buffer will always be PAGE_SIZE bytes in length. On i386, this
+- The buffer will always be PAGE_SIZE bytes in length. On x86, this
is 4096.
- show() methods should return the number of bytes printed into the
@@ -253,7 +253,7 @@ Other notes:
through, be sure to return an error.
- The object passed to the methods will be pinned in memory via sysfs
- referencing counting its embedded object. However, the physical
+ reference counting its embedded object. However, the physical
entity (e.g. device) the object represents may not be present. Be
sure to have a way to check this, if necessary.
@@ -263,7 +263,7 @@ A very simple (and naive) implementation of a device attribute is::
static ssize_t show_name(struct device *dev, struct device_attribute *attr,
char *buf)
{
- return scnprintf(buf, PAGE_SIZE, "%s\n", dev->name);
+ return sysfs_emit(buf, "%s\n", dev->name);
}
static ssize_t store_name(struct device *dev, struct device_attribute *attr,
@@ -295,8 +295,12 @@ The top level sysfs directory looks like::
dev/
devices/
firmware/
- net/
fs/
+ hypervisor/
+ kernel/
+ module/
+ net/
+ power/
devices/ contains a filesystem representation of the device tree. It maps
directly to the internal kernel device tree, which is a hierarchy of
@@ -317,15 +321,18 @@ span multiple bus types).
fs/ contains a directory for some filesystems. Currently each
filesystem wanting to export attributes must create its own hierarchy
-below fs/ (see ./fuse.txt for an example).
+below fs/ (see ./fuse.rst for an example).
+
+module/ contains parameter values and state information for all
+loaded system modules, for both builtin and loadable modules.
-dev/ contains two directories char/ and block/. Inside these two
+dev/ contains two directories: char/ and block/. Inside these two
directories there are symlinks named <major>:<minor>. These symlinks
point to the sysfs directory for the given device. /sys/dev provides a
quick way to lookup the sysfs interface for a device from the result of
a stat(2) operation.
-More information can driver-model specific features can be found in
+More information on driver-model specific features can be found in
Documentation/driver-api/driver-model/.
@@ -335,7 +342,7 @@ TODO: Finish this section.
Current Interfaces
~~~~~~~~~~~~~~~~~~
-The following interface layers currently exist in sysfs:
+The following interface layers currently exist in sysfs.
devices (include/linux/device.h)
diff --git a/Documentation/filesystems/ubifs.rst b/Documentation/filesystems/ubifs.rst
index e6ee99762534..ced2f7679ddb 100644
--- a/Documentation/filesystems/ubifs.rst
+++ b/Documentation/filesystems/ubifs.rst
@@ -59,7 +59,7 @@ differences.
* JFFS2 is a write-through file-system, while UBIFS supports write-back,
which makes UBIFS much faster on writes.
-Similarly to JFFS2, UBIFS supports on-the-flight compression which makes
+Similarly to JFFS2, UBIFS supports on-the-fly compression which makes
it possible to fit quite a lot of data to the flash.
Similarly to JFFS2, UBIFS is tolerant of unclean reboots and power-cuts.
diff --git a/Documentation/filesystems/vfs.rst b/Documentation/filesystems/vfs.rst
index 6cd6953e175b..2c15e7053113 100644
--- a/Documentation/filesystems/vfs.rst
+++ b/Documentation/filesystems/vfs.rst
@@ -274,6 +274,9 @@ or bottom half).
This is specifically for the inode itself being marked dirty,
not its data. If the update needs to be persisted by fdatasync(),
then I_DIRTY_DATASYNC will be set in the flags argument.
+ I_DIRTY_TIME will be set in the flags in case lazytime is enabled
+ and struct inode has times updated since the last ->dirty_inode
+ call.
``write_inode``
this method is called when the VFS needs to write an inode to
@@ -432,15 +435,16 @@ As of kernel 2.6.22, the following members are defined:
const char *(*get_link) (struct dentry *, struct inode *,
struct delayed_call *);
int (*permission) (struct user_namespace *, struct inode *, int);
- struct posix_acl * (*get_acl)(struct inode *, int, bool);
+ struct posix_acl * (*get_inode_acl)(struct inode *, int, bool);
int (*setattr) (struct user_namespace *, struct dentry *, struct iattr *);
int (*getattr) (struct user_namespace *, const struct path *, struct kstat *, u32, unsigned int);
ssize_t (*listxattr) (struct dentry *, char *, size_t);
void (*update_time)(struct inode *, struct timespec *, int);
int (*atomic_open)(struct inode *, struct dentry *, struct file *,
unsigned open_flag, umode_t create_mode);
- int (*tmpfile) (struct user_namespace *, struct inode *, struct dentry *, umode_t);
- int (*set_acl)(struct user_namespace *, struct inode *, struct posix_acl *, int);
+ int (*tmpfile) (struct user_namespace *, struct inode *, struct file *, umode_t);
+ struct posix_acl * (*get_acl)(struct user_namespace *, struct dentry *, int);
+ int (*set_acl)(struct user_namespace *, struct dentry *, struct posix_acl *, int);
int (*fileattr_set)(struct user_namespace *mnt_userns,
struct dentry *dentry, struct fileattr *fa);
int (*fileattr_get)(struct dentry *dentry, struct fileattr *fa);
@@ -589,7 +593,9 @@ otherwise noted.
``tmpfile``
called in the end of O_TMPFILE open(). Optional, equivalent to
atomically creating, opening and unlinking a file in given
- directory.
+ directory. On success needs to return with the file already
+ open; this can be done by calling finish_open_simple() right at
+ the end.
``fileattr_get``
called on ioctl(FS_IOC_GETFLAGS) and ioctl(FS_IOC_FSGETXATTR) to
diff --git a/Documentation/filesystems/xfs-delayed-logging-design.rst b/Documentation/filesystems/xfs-delayed-logging-design.rst
index 4ef419f54663..6402ab8e370c 100644
--- a/Documentation/filesystems/xfs-delayed-logging-design.rst
+++ b/Documentation/filesystems/xfs-delayed-logging-design.rst
@@ -100,7 +100,7 @@ transactions together::
ntp = xfs_trans_dup(tp);
xfs_trans_commit(tp);
- xfs_log_reserve(ntp);
+ xfs_trans_reserve(ntp);
This results in a series of "rolling transactions" where the inode is locked
across the entire chain of transactions. Hence while this series of rolling
@@ -191,7 +191,7 @@ transaction rolling mechanism to re-reserve space on every transaction roll. We
know from the implementation of the permanent transactions how many transaction
rolls are likely for the common modifications that need to be made.
-For example, and inode allocation is typically two transactions - one to
+For example, an inode allocation is typically two transactions - one to
physically allocate a free inode chunk on disk, and another to allocate an inode
from an inode chunk that has free inodes in it. Hence for an inode allocation
transaction, we might set the reservation log count to a value of 2 to indicate
@@ -200,7 +200,7 @@ chain. Each time a permanent transaction rolls, it consumes an entire unit
reservation.
Hence when the permanent transaction is first allocated, the log space
-reservation is increases from a single unit reservation to multiple unit
+reservation is increased from a single unit reservation to multiple unit
reservations. That multiple is defined by the reservation log count, and this
means we can roll the transaction multiple times before we have to re-reserve
log space when we roll the transaction. This ensures that the common
@@ -259,7 +259,7 @@ the next transaction in the sequeunce, but we have none remaining. We cannot
sleep during the transaction commit process waiting for new log space to become
available, as we may end up on the end of the FIFO queue and the items we have
locked while we sleep could end up pinning the tail of the log before there is
-enough free space in the log to fulfil all of the pending reservations and
+enough free space in the log to fulfill all of the pending reservations and
then wake up transaction commit in progress.
To take a new reservation without sleeping requires us to be able to take a
@@ -551,14 +551,14 @@ Essentially, this shows that an item that is in the AIL can still be modified
and relogged, so any tracking must be separate to the AIL infrastructure. As
such, we cannot reuse the AIL list pointers for tracking committed items, nor
can we store state in any field that is protected by the AIL lock. Hence the
-committed item tracking needs it's own locks, lists and state fields in the log
+committed item tracking needs its own locks, lists and state fields in the log
item.
Similar to the AIL, tracking of committed items is done through a new list
called the Committed Item List (CIL). The list tracks log items that have been
committed and have formatted memory buffers attached to them. It tracks objects
in transaction commit order, so when an object is relogged it is removed from
-it's place in the list and re-inserted at the tail. This is entirely arbitrary
+its place in the list and re-inserted at the tail. This is entirely arbitrary
and done to make it easy for debugging - the last items in the list are the
ones that are most recently modified. Ordering of the CIL is not necessary for
transactional integrity (as discussed in the next section) so the ordering is
@@ -615,7 +615,7 @@ those changes into the current checkpoint context. We then initialise a new
context and attach that to the CIL for aggregation of new transactions.
This allows us to unlock the CIL immediately after transfer of all the
-committed items and effectively allow new transactions to be issued while we
+committed items and effectively allows new transactions to be issued while we
are formatting the checkpoint into the log. It also allows concurrent
checkpoints to be written into the log buffers in the case of log force heavy
workloads, just like the existing transaction commit code does. This, however,
@@ -884,9 +884,9 @@ pin the object the first time it is inserted into the CIL - if it is already in
the CIL during a transaction commit, then we do not pin it again. Because there
can be multiple outstanding checkpoint contexts, we can still see elevated pin
counts, but as each checkpoint completes the pin count will retain the correct
-value according to it's context.
+value according to its context.
-Just to make matters more slightly more complex, this checkpoint level context
+Just to make matters slightly more complex, this checkpoint level context
for the pin count means that the pinning of an item must take place under the
CIL commit/flush lock. If we pin the object outside this lock, we cannot
guarantee which context the pin count is associated with. This is because of