diff options
Diffstat (limited to 'Documentation')
-rw-r--r-- | Documentation/cgroup-v1/rdma.txt | 109 | ||||
-rw-r--r-- | Documentation/cgroup-v2.txt | 103 |
2 files changed, 205 insertions, 7 deletions
diff --git a/Documentation/cgroup-v1/rdma.txt b/Documentation/cgroup-v1/rdma.txt new file mode 100644 index 000000000000..af618171e0eb --- /dev/null +++ b/Documentation/cgroup-v1/rdma.txt @@ -0,0 +1,109 @@ + RDMA Controller + ---------------- + +Contents +-------- + +1. Overview + 1-1. What is RDMA controller? + 1-2. Why RDMA controller needed? + 1-3. How is RDMA controller implemented? +2. Usage Examples + +1. Overview + +1-1. What is RDMA controller? +----------------------------- + +RDMA controller allows user to limit RDMA/IB specific resources that a given +set of processes can use. These processes are grouped using RDMA controller. + +RDMA controller defines two resources which can be limited for processes of a +cgroup. + +1-2. Why RDMA controller needed? +-------------------------------- + +Currently user space applications can easily take away all the rdma verb +specific resources such as AH, CQ, QP, MR etc. Due to which other applications +in other cgroup or kernel space ULPs may not even get chance to allocate any +rdma resources. This can leads to service unavailability. + +Therefore RDMA controller is needed through which resource consumption +of processes can be limited. Through this controller different rdma +resources can be accounted. + +1-3. How is RDMA controller implemented? +---------------------------------------- + +RDMA cgroup allows limit configuration of resources. Rdma cgroup maintains +resource accounting per cgroup, per device using resource pool structure. +Each such resource pool is limited up to 64 resources in given resource pool +by rdma cgroup, which can be extended later if required. + +This resource pool object is linked to the cgroup css. Typically there +are 0 to 4 resource pool instances per cgroup, per device in most use cases. +But nothing limits to have it more. At present hundreds of RDMA devices per +single cgroup may not be handled optimally, however there is no +known use case or requirement for such configuration either. + +Since RDMA resources can be allocated from any process and can be freed by any +of the child processes which shares the address space, rdma resources are +always owned by the creator cgroup css. This allows process migration from one +to other cgroup without major complexity of transferring resource ownership; +because such ownership is not really present due to shared nature of +rdma resources. Linking resources around css also ensures that cgroups can be +deleted after processes migrated. This allow progress migration as well with +active resources, even though that is not a primary use case. + +Whenever RDMA resource charging occurs, owner rdma cgroup is returned to +the caller. Same rdma cgroup should be passed while uncharging the resource. +This also allows process migrated with active RDMA resource to charge +to new owner cgroup for new resource. It also allows to uncharge resource of +a process from previously charged cgroup which is migrated to new cgroup, +even though that is not a primary use case. + +Resource pool object is created in following situations. +(a) User sets the limit and no previous resource pool exist for the device +of interest for the cgroup. +(b) No resource limits were configured, but IB/RDMA stack tries to +charge the resource. So that it correctly uncharge them when applications are +running without limits and later on when limits are enforced during uncharging, +otherwise usage count will drop to negative. + +Resource pool is destroyed if all the resource limits are set to max and +it is the last resource getting deallocated. + +User should set all the limit to max value if it intents to remove/unconfigure +the resource pool for a particular device. + +IB stack honors limits enforced by the rdma controller. When application +query about maximum resource limits of IB device, it returns minimum of +what is configured by user for a given cgroup and what is supported by +IB device. + +Following resources can be accounted by rdma controller. + hca_handle Maximum number of HCA Handles + hca_object Maximum number of HCA Objects + +2. Usage Examples +----------------- + +(a) Configure resource limit: +echo mlx4_0 hca_handle=2 hca_object=2000 > /sys/fs/cgroup/rdma/1/rdma.max +echo ocrdma1 hca_handle=3 > /sys/fs/cgroup/rdma/2/rdma.max + +(b) Query resource limit: +cat /sys/fs/cgroup/rdma/2/rdma.max +#Output: +mlx4_0 hca_handle=2 hca_object=2000 +ocrdma1 hca_handle=3 hca_object=max + +(c) Query current usage: +cat /sys/fs/cgroup/rdma/2/rdma.current +#Output: +mlx4_0 hca_handle=1 hca_object=20 +ocrdma1 hca_handle=1 hca_object=23 + +(d) Delete resource limit: +echo echo mlx4_0 hca_handle=max hca_object=max > /sys/fs/cgroup/rdma/1/rdma.max diff --git a/Documentation/cgroup-v2.txt b/Documentation/cgroup-v2.txt index 4cc07ce3b8dd..3b8449f8ac7e 100644 --- a/Documentation/cgroup-v2.txt +++ b/Documentation/cgroup-v2.txt @@ -47,6 +47,12 @@ CONTENTS 5-3. IO 5-3-1. IO Interface Files 5-3-2. Writeback + 5-4. PID + 5-4-1. PID Interface Files + 5-5. RDMA + 5-5-1. RDMA Interface Files + 5-6. Misc + 5-6-1. perf_event 6. Namespace 6-1. Basics 6-2. The Root and Views @@ -328,14 +334,12 @@ a process with a non-root euid to migrate a target process into a cgroup by writing its PID to the "cgroup.procs" file, the following conditions must be met. -- The writer's euid must match either uid or suid of the target process. - - The writer must have write access to the "cgroup.procs" file. - The writer must have write access to the "cgroup.procs" file of the common ancestor of the source and destination cgroups. -The above three constraints ensure that while a delegatee may migrate +The above two constraints ensure that while a delegatee may migrate processes around freely in the delegated sub-hierarchy it can't pull in from or push out to outside the sub-hierarchy. @@ -350,10 +354,10 @@ all processes under C0 and C1 belong to U0. Let's also say U0 wants to write the PID of a process which is currently in C10 into "C00/cgroup.procs". U0 has write access to the -file and uid match on the process; however, the common ancestor of the -source cgroup C10 and the destination cgroup C00 is above the points -of delegation and U0 would not have write access to its "cgroup.procs" -files and thus the write will be denied with -EACCES. +file; however, the common ancestor of the source cgroup C10 and the +destination cgroup C00 is above the points of delegation and U0 would +not have write access to its "cgroup.procs" files and thus the write +will be denied with -EACCES. 2-6. Guidelines @@ -1119,6 +1123,91 @@ writeback as follows. vm.dirty[_background]_ratio. +5-4. PID + +The process number controller is used to allow a cgroup to stop any +new tasks from being fork()'d or clone()'d after a specified limit is +reached. + +The number of tasks in a cgroup can be exhausted in ways which other +controllers cannot prevent, thus warranting its own controller. For +example, a fork bomb is likely to exhaust the number of tasks before +hitting memory restrictions. + +Note that PIDs used in this controller refer to TIDs, process IDs as +used by the kernel. + + +5-4-1. PID Interface Files + + pids.max + + A read-write single value file which exists on non-root cgroups. The + default is "max". + + Hard limit of number of processes. + + pids.current + + A read-only single value file which exists on all cgroups. + + The number of processes currently in the cgroup and its descendants. + +Organisational operations are not blocked by cgroup policies, so it is +possible to have pids.current > pids.max. This can be done by either +setting the limit to be smaller than pids.current, or attaching enough +processes to the cgroup such that pids.current is larger than +pids.max. However, it is not possible to violate a cgroup PID policy +through fork() or clone(). These will return -EAGAIN if the creation +of a new process would cause a cgroup policy to be violated. + + +5-5. RDMA + +The "rdma" controller regulates the distribution and accounting of +of RDMA resources. + +5-5-1. RDMA Interface Files + + rdma.max + A readwrite nested-keyed file that exists for all the cgroups + except root that describes current configured resource limit + for a RDMA/IB device. + + Lines are keyed by device name and are not ordered. + Each line contains space separated resource name and its configured + limit that can be distributed. + + The following nested keys are defined. + + hca_handle Maximum number of HCA Handles + hca_object Maximum number of HCA Objects + + An example for mlx4 and ocrdma device follows. + + mlx4_0 hca_handle=2 hca_object=2000 + ocrdma1 hca_handle=3 hca_object=max + + rdma.current + A read-only file that describes current resource usage. + It exists for all the cgroup except root. + + An example for mlx4 and ocrdma device follows. + + mlx4_0 hca_handle=1 hca_object=20 + ocrdma1 hca_handle=1 hca_object=23 + + +5-6. Misc + +5-6-1. perf_event + +perf_event controller, if not mounted on a legacy hierarchy, is +automatically enabled on the v2 hierarchy so that perf events can +always be filtered by cgroup v2 path. The controller can still be +moved to a legacy hierarchy after v2 hierarchy is populated. + + 6. Namespace 6-1. Basics |