#####################
Self-realtimeness
#####################

.. contents::
   :local:
   :depth: 1

Introduction
============

The self-realtimeness module continuously monitors the performance of real-time services and invokes the self-orchestrator if a service's real-time performance drops below a configurable threshold.

The real-time performance is captured by the value of a service's time utility function (TUF). The TUF degrades with the tardiness, i.e., the amount of time by which a job of the service completes past its deadline. For example, a TUF that returns 1.0 for a job completing before its deadline and 0.0 for a job completing past its deadline models a hard real-time requirement. aerOS does not serve software components with hard real-time requirements, so the TUF degrades either linearly or exponentially with a service's tardiness: it returns 1.0 if a job completes before its deadline and a value between 1.0 and 0.0 if it completes past its deadline.

The self-realtimeness module consists of a custom Linux kernel with patches for hierarchical constant bandwidth scheduling (HCBS), a kernel module that monitors the tardiness of real-time services, and a containerized node-level controller (NLC) that calculates the TUF of an infrastructure element's (IE's) real-time services and invokes its self-orchestrator if necessary.

Features
========

* Monitor the time utility of real-time services (periodically invoked services with soft deadlines).
* Invoke the self-orchestrator if the time utility of a service drops below a configurable threshold.

Place in architecture
=====================

The following figure shows the self-orchestrator module inside the IE and its relationships with the other self-* modules.

.. image:: ./self_capabilities_relationships.png
   :alt: self-orchestrator module inside the IE and the relationships with another self-* modules
   :align: center

User guide
==========

Using the self-realtimeness module requires installing and configuring a custom kernel built from source, and the procedure might vary from platform to platform. We therefore recommend manual configuration and installation on every target platform that uses self-realtimeness.

Prerequisites
=============

On Ubuntu or Debian, install the dependencies required to build the Linux kernel:

.. code-block:: bash

   sudo apt install build-essential bc

We assume Docker is installed.

Installation
============

You must build the custom Linux kernel for the target platform and the Docker image of the node-level controller. Building the image also compiles the kernel module.

Linux Kernel Build (target: x86-64)
-----------------------------------

Configure and build the HCBS Linux kernel for an x86-64 target as follows.

1. Clone the self-realtimeness repository:

   .. code-block:: bash

      git clone https://gitlab.aeros-project.eu/wp3/t3.5/self-realtimeness.git
      cd self-realtimeness/

2. Assuming you are building the kernel on the target platform, configure and build the kernel:

   .. code-block:: bash

      cd hcbs-kernel/
      make defconfig
      make -j

3. Install your HCBS kernel and kernel modules:

   .. code-block:: bash

      sudo make modules_install
      sudo make install

Linux Kernel Build (target: NVIDIA Jetson AGX Xavier Arm64)
-----------------------------------------------------------

Configure and build the HCBS Linux kernel for the NVIDIA Jetson AGX Xavier as follows.

1. Clone the self-realtimeness repository and check out the corresponding branch:

   .. code-block:: bash

      git clone -b nvidia-jetson-agx-xavier https://gitlab.aeros-project.eu/wp3/t3.5/self-realtimeness.git
      cd self-realtimeness/

2. Assuming you are building the kernel on the target platform, configure and build the kernel:

   .. code-block:: bash

      cd hcbs-kernel/kernel/kernel-5.10/
      make tegra_defconfig
      make -j ARCH=arm64

3. Install your HCBS kernel and kernel modules:

   .. code-block:: bash

      sudo make modules_install
      sudo cp arch/arm64/boot/Image /boot/Image
      sudo cp drivers/gpu/nvgpu/nvgpu.ko /usr/lib/modules/5.10.120-hcbs+/kernel/drivers/gpu/nvgpu/nvgpu.ko

4. Replace the contents of the boot configuration in ``/boot/extlinux/extlinux.conf`` with the following so that the original kernel can be used as a backup:

   .. code-block:: bash

      TIMEOUT 30
      DEFAULT primary

      MENU TITLE L4T boot options

      LABEL primary
            MENU LABEL primary kernel
            LINUX /boot/Image
            FDT /boot/dtb/kernel_tegra194-p2888-0001-p2822-0000.dtb
            INITRD /boot/initrd
            APPEND ${cbootargs} root=/dev/mmcblk0p1 rw rootwait rootfstype=ext4 mminit_loglevel=4 console=ttyTCU0,115200n8 console=tty0 fbcon=map:0 net.ifnames=0 rootfstype=ext4 video=efifb:off nv-auto-config

      # When testing a custom kernel, it is recommended that you create a backup of
      # the original kernel and add a new entry to this file so that the device can
      # fallback to the original kernel. To do this:
      #
      # 1, Make a backup of the original kernel
      #      sudo cp /boot/Image /boot/Image.backup
      #
      # 2, Copy your custom kernel into /boot/Image
      #
      # 3, Uncomment below menu setting lines for the original kernel
      #
      # 4, Reboot

      LABEL backup
            MENU LABEL backup kernel
            LINUX /boot/Image.backup
            FDT /boot/dtb/kernel_tegra194-p2888-0001-p2822-0000.dtb
            INITRD /boot/initrd.backup
            APPEND ${cbootargs} root=/dev/mmcblk0p1 rw rootwait rootfstype=ext4 mminit_loglevel=4 console=ttyTCU0,115200n8 console=tty0 fbcon=map:0 net.ifnames=0 rootfstype=ext4 video=efifb:off nv-auto-config

   The ``backup`` entry boots ``/boot/Image.backup`` with ``/boot/initrd.backup``, so copies of the original kernel and initrd must exist at these paths. Create them before overwriting ``/boot/Image`` in step 3.

Build Node-level Controller Docker Image
----------------------------------------

To use the NLC, build the Docker image with the following command:

.. code-block:: bash

   docker build . -t nlc:latest

Copy the ``config.json`` file to ``/etc/aeros/self-realtimeness/`` and run Docker Compose to start the node-level controller:

.. code-block:: bash

   sudo mkdir -p /etc/aeros/self-realtimeness
   sudo cp config.json /etc/aeros/self-realtimeness/
   docker compose up

Configuration options
=====================

The ``config.json`` file offers the following parameters:

* ``kernel_module_name: str``: The path to the kernel module object file (``.ko``).
* ``interval_tu: str``: The mode for the frequency of the TU calculations. One of ``"min"``, ``"medium"``, ``"max"``.
* ``interval_relocate: str``: The mode for the frequency of the relocation operations. One of ``"fixed"``, ``"min"``, ``"max"``.
* ``interval_container_monitor: float``: The interval at which the NLC checks for new containers.
* ``gain_reduce: int``: The gain with which the quota of a container is reduced when its tasks meet all deadlines. The formula can be found in [this paper]().
* ``gain_increase: int``: The gain with which the quota of a container is increased when its tasks miss deadlines. The formula can be found in [this paper]().
* ``tuf_active: int``: ``0`` to deactivate the use of TUFs, ``1`` to activate them.
* ``id_string``: The string that must be part of the name of every RT Docker container. The NLC uses this string to identify RT Docker containers.
* ``proc_update_tasks``: The path to the ``update_tasks`` ``/proc`` entry.
* ``proc_send_tu``: The path to the ``send_tu`` ``/proc`` entry.
* ``proc_get_temporal_errors``: The path to the ``get_tmp_err`` ``/proc`` entry.
* ``relocate_error_code``: The error code that is sent to the self-orchestration tool when a container should be relocated.
* ``self_orchestration_url``: The URL to which a request for relocation is sent.

Developer guide
===============

TUF Kernel Module Communication
-------------------------------

The kernel module offers three functions via the ``/proc`` filesystem. The NLC calls them to read data from the kernel module and to write data back. The following sections explain the protocol used by each of these three functions in a pseudo-regex notation.
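
As an illustration of the Update Tasks and Send TU record formats described in this section, the following minimal Python sketch serializes them. The helper names ``build_update_tasks`` and ``send_tu_records`` and all sample values are hypothetical and are not taken from the NLC sources. Because the Send TU format does not show a delimiter between records, the sketch returns the records as a list and leaves joining them to the caller.

.. code-block:: python

   from typing import Dict, List, Tuple

   # Hypothetical helper names and types; not taken from the NLC sources.
   Task = Tuple[str, int, int]  # (task_process_name, period, deadline)


   def build_update_tasks(container_process_name: str, min_quota: int,
                          tasks: List[Task]) -> str:
       """Serialize one container in the Update Tasks format."""
       # Header line: n_tasks_in_container,container_process_name,min_quota\n
       lines = [f"{len(tasks)},{container_process_name},{min_quota}\n"]
       # One line per task: task_process_name,period,deadline\n
       for name, period, deadline in tasks:
           lines.append(f"{name},{period},{deadline}\n")
       return "".join(lines)


   def send_tu_records(utilities: Dict[str, int]) -> List[str]:
       """Return 'container_process_name,utility' records, lowest utility first."""
       # Ascending order, so containers with a low utility come first.
       ordered = sorted(utilities.items(), key=lambda item: item[1])
       return [f"{name},{utility}" for name, utility in ordered]

Under these assumptions, the string returned by ``build_update_tasks`` could be written to the file named by the ``proc_update_tasks`` configuration option, for example with ``open(path, "w")``.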

- Update Tasks: This function is called when the NLC detects a new container. The message contains the information about the tasks inside the container that the kernel module requires.

  ::

     n_tasks_in_container,container_process_name,min_quota\n
     (task_process_name,period,deadline\n){n_tasks_in_container}

  **Variables**

  * ``n_tasks_in_container``: The number of processes inside the container.
  * ``container_process_name``: The name of the entrypoint process. Usually a script. This is **not** the container name known by Docker.
  * ``min_quota``: The minimum quota the container must have.
  * ``task_process_name``: The name of a process inside the container that is called by the entrypoint script and shall be adapted by the kernel module.
  * ``period``: The period of the process.
  * ``deadline``: The deadline of the process.

- Get Temporal Error: This function is used by the NLC to retrieve the average temporal error of every task. The message sent by the kernel module has the following format.

  ::

     (container_process_name:container_marked\n
     (task_process_name,avg_tmp_err){n_tasks_in_container}){n_containers}

  **Variables**

  * ``n_tasks_in_container``: The number of processes inside the container.
  * ``n_containers``: The number of RT Docker containers on the node.
  * ``container_process_name``: The name of the entrypoint process. Usually a script. This is **not** the container name known by Docker.
  * ``task_process_name``: The name of a process inside the container that is called by the entrypoint script and shall be adapted by the kernel module.
  * ``container_marked``: ``0`` if everything is fine and ``1`` if the container should be relocated.
  * ``avg_tmp_err``: The average temporal error of the task since the last call of this function.

- Send TU: This function is used by the NLC to send the calculated utility of each container back to the kernel module. The containers are sorted by ascending utility so that containers with a low utility have a higher chance of a quota increase.

  ::

     (container_process_name,utility){n_containers}

  **Variables**

  * ``n_containers``: The number of RT Docker containers on the node.
  * ``container_process_name``: The name of the entrypoint process. Usually a script. This is **not** the container name known by Docker.
  * ``utility``: The utility value of the container, in the range [0, 100].

Authors
=======

Stefan Walser & Jan Ruh

License
=======

| Copyright (C) TTControl (2024)
| All rights reserved.

This document contains proprietary information belonging to TTControl. Passing on and copying of this document, and communication of its contents is not permitted without prior written authorization.

| VERSION : 1
| DATE : 15.12.2023
| AUTHOR : Stefan Walser

Notice (dependencies)
=====================