Self-realtimeness
Introduction
The self-realtimeness module continuously monitors the performance of real-time services and invokes the self-orchestrator if a service’s real-time performance drops below a configurable threshold. Real-time performance is captured by the value of a service’s time utility function (TUF), which degrades with tardiness, i.e., the amount of time by which a service’s job completes past its deadline. For example, a TUF that returns 1.0 for a job completing before its deadline and 0.0 for a job completing past its deadline models a hard real-time requirement. aerOS does not serve software components with hard real-time requirements; therefore, the TUF degrades either linearly or exponentially with a service’s tardiness, i.e., it returns 1.0 if a job completes before its deadline and a value between 0.0 and 1.0 if it completes past its deadline. The self-realtimeness module consists of a custom Linux kernel with patches for hierarchical constant bandwidth scheduling (HCBS), a kernel module that monitors the tardiness of real-time services, and a containerized node-level controller that calculates the TUF of an infrastructure element’s real-time services and invokes its self-orchestrator if necessary.
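The TUF shapes described above can be sketched in Python as follows. This is an illustration, not the module's implementation: the text does not give the TUF parameters, so max_tardiness (the tardiness at which the linear TUF reaches 0.0) and decay_rate (for the exponential TUF) are hypothetical. The step function models the hard real-time case for comparison only.

```python
import math


def hard_tuf(tardiness: float) -> float:
    """Step TUF modelling a hard deadline; shown only for comparison."""
    return 1.0 if tardiness <= 0.0 else 0.0


def linear_tuf(tardiness: float, max_tardiness: float) -> float:
    """Soft TUF falling linearly from 1.0 to 0.0 at max_tardiness (hypothetical parameter)."""
    if tardiness <= 0.0:
        return 1.0  # job finished on or before its deadline
    return max(0.0, 1.0 - tardiness / max_tardiness)


def exponential_tuf(tardiness: float, decay_rate: float) -> float:
    """Soft TUF decaying exponentially with tardiness (decay_rate is hypothetical)."""
    if tardiness <= 0.0:
        return 1.0
    return math.exp(-decay_rate * tardiness)


if __name__ == "__main__":
    # Tardiness = completion time - deadline; negative values mean the job was early.
    for tardiness in (-2.0, 0.0, 5.0, 10.0, 20.0):
        print(tardiness, hard_tuf(tardiness), linear_tuf(tardiness, 10.0),
              round(exponential_tuf(tardiness, 0.1), 3))
```

Both soft variants return 1.0 for on-time jobs and a value in [0.0, 1.0) for late ones; the exponential variant approaches but never reaches 0.0.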
Features
Monitor the time utility of real-time services (periodically invoked services with soft deadlines).
Invoke the self-orchestrator if the time utility of a service drops below a configurable threshold.
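The threshold check behind the second feature can be sketched as below. It is a hypothetical illustration: the mapping from container names to utilities in [0.0, 1.0] and the threshold value are assumptions, and printing stands in for the actual call to the self-orchestrator.

```python
from typing import Dict, List


def containers_to_relocate(utilities: Dict[str, float], threshold: float) -> List[str]:
    """Return (sorted) the containers whose time utility dropped below the threshold.

    `utilities` maps a container name to its current time utility in [0.0, 1.0];
    both the mapping and the threshold are illustrative assumptions.
    """
    return sorted(name for name, utility in utilities.items() if utility < threshold)


if __name__ == "__main__":
    observed = {"rt-camera": 0.95, "rt-control": 0.42, "rt-logger": 1.0}
    for name in containers_to_relocate(observed, threshold=0.5):
        # Placeholder for invoking the self-orchestrator.
        print(f"invoke self-orchestrator for {name}")
```

A utility exactly equal to the threshold is treated as acceptable here; whether the real module uses a strict or non-strict comparison is not stated.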
Place in architecture
The following figure depicts the self-realtimeness module inside the infrastructure element (IE) and its relationships with the other self-* modules.
User guide
Using the self-realtimeness module requires installing and configuring a custom kernel built from source, and the build steps can vary from platform to platform. We therefore recommend configuring and installing it manually on every target platform that uses self-realtimeness.
Prerequisites
On Ubuntu or Debian Linux, install the dependencies required to build the Linux kernel:
sudo apt install build-essential bc
We assume that Docker, including the Docker Compose plugin used to start the node-level controller, is installed.
Installation
You must build the custom Linux kernel for the target platform as well as the node-level controller Docker image, which also compiles the kernel module.
Linux Kernel Build (target: x86-64)
Configure and build the HCBS Linux kernel for an x86-64 target as follows.
Clone the self-realtimeness repository:
git clone https://gitlab.aeros-project.eu/wp3/t3.5/self-realtimeness.git
cd self-realtimeness/
Assuming you are building the kernel on the target platform, configure and build the kernel:
cd hcbs-kernel/
make defconfig
make -j
Install your HCBS kernel and kernel modules:
sudo make modules_install
sudo make install
Linux Kernel Build (target: NVIDIA Jetson Xavier AGX Arm64)
Configure and build the HCBS Linux kernel for the NVIDIA Jetson Xavier AGX as follows.
Clone the self-realtimeness repository and check out the corresponding branch:
git clone -b nvidia-jetson-agx-xavier https://gitlab.aeros-project.eu/wp3/t3.5/self-realtimeness.git
cd self-realtimeness/
Assuming you are building the kernel on the target platform, configure and build the kernel:
cd hcbs-kernel/kernel/kernel-5.10/
make tegra_defconfig
make -j ARCH=arm64
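Back up the original kernel and initrd (the backup boot entry in the configuration below expects /boot/Image.backup and /boot/initrd.backup), then install your HCBS kernel and kernel modules:
sudo cp /boot/Image /boot/Image.backup
sudo cp /boot/initrd /boot/initrd.backup
sudo make modules_install
sudo cp arch/arm64/boot/Image /boot/Image
sudo cp drivers/gpu/nvgpu/nvgpu.ko /usr/lib/modules/5.10.120-hcbs+/kernel/drivers/gpu/nvgpu/nvgpu.ko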
Replace the contents of /boot/extlinux/extlinux.conf with the following so that the original kernel remains available as a backup boot option:
TIMEOUT 30
DEFAULT primary

MENU TITLE L4T boot options

LABEL primary
      MENU LABEL primary kernel
      LINUX /boot/Image
      FDT /boot/dtb/kernel_tegra194-p2888-0001-p2822-0000.dtb
      INITRD /boot/initrd
      APPEND ${cbootargs} root=/dev/mmcblk0p1 rw rootwait rootfstype=ext4 mminit_loglevel=4 console=ttyTCU0,115200n8 console=tty0 fbcon=map:0 net.ifnames=0 rootfstype=ext4 video=efifb:off nv-auto-config

# When testing a custom kernel, it is recommended that you create a backup of
# the original kernel and add a new entry to this file so that the device can
# fallback to the original kernel. To do this:
#
# 1, Make a backup of the original kernel
#      sudo cp /boot/Image /boot/Image.backup
#
# 2, Copy your custom kernel into /boot/Image
#
# 3, Uncomment below menu setting lines for the original kernel
#
# 4, Reboot

LABEL backup
      MENU LABEL backup kernel
      LINUX /boot/Image.backup
      FDT /boot/dtb/kernel_tegra194-p2888-0001-p2822-0000.dtb
      INITRD /boot/initrd.backup
      APPEND ${cbootargs} root=/dev/mmcblk0p1 rw rootwait rootfstype=ext4 mminit_loglevel=4 console=ttyTCU0,115200n8 console=tty0 fbcon=map:0 net.ifnames=0 rootfstype=ext4 video=efifb:off nv-auto-config
Build Node-level Controller Docker Image
To use the node-level controller (NLC), build its Docker image with the following command:
docker build . -t nlc:latest
Copy the config.json file to /etc/aeros/self-realtimeness/ and run docker compose to start the node-level controller:
sudo mkdir -p /etc/aeros/self-realtimeness
sudo cp config.json /etc/aeros/self-realtimeness/
docker compose up
Configuration options
config.json accepts the following parameters:
kernel_module_name: str: The path to the kernel module object file (.ko).
interval_tu: str: The mode that determines how often the time utility (TU) is calculated. One of “min”, “medium”, or “max”.
interval_relocate: str: The mode that determines how often relocation operations are performed. One of “fixed”, “min”, or “max”.
interval_container_monitor: float: The interval at which the NLC checks for new containers.
gain_reduce: int: The gain with which the quota of a container is reduced when its tasks meet all deadlines. The formula can be found in [this paper]().
gain_increase: int: The gain with which the quota of a container is increased when its tasks miss deadlines. The formula can be found in [this paper]().
tuf_active: int: 0 to deactivate the use of TUFs, 1 to activate them.
id_string: A substring that must appear in the name of every real-time (RT) Docker container. The NLC uses this string to identify RT Docker containers.
proc_update_tasks: The path to the update_tasks /proc entry.
proc_send_tu: The path to the send_tu /proc entry.
proc_get_temporal_errors: The path to the get_tmp_err /proc entry.
relocate_error_code: The error code sent to the self-orchestration tool when a container should be relocated.
self_orchestration_url: The URL to which a request for relocation is sent.
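The parameters above lend themselves to a validation step at NLC start-up. The following Python sketch loads config.json from the path used in the installation steps and checks the documented keys and allowed mode values. Treating every key as mandatory, and the error messages, are assumptions; this is not the NLC's actual loader.

```python
import json
from typing import Any, Dict

# Allowed values documented for the two mode parameters.
ALLOWED_MODES = {
    "interval_tu": {"min", "medium", "max"},
    "interval_relocate": {"fixed", "min", "max"},
}

# Assumption: every documented key must be present.
REQUIRED_KEYS = (
    "kernel_module_name", "interval_tu", "interval_relocate",
    "interval_container_monitor", "gain_reduce", "gain_increase",
    "tuf_active", "id_string", "proc_update_tasks", "proc_send_tu",
    "proc_get_temporal_errors", "relocate_error_code", "self_orchestration_url",
)


def validate_config(config: Dict[str, Any]) -> Dict[str, Any]:
    """Raise ValueError if a key is missing or a documented value is not allowed."""
    missing = [key for key in REQUIRED_KEYS if key not in config]
    if missing:
        raise ValueError(f"missing config keys: {missing}")
    for key, allowed in ALLOWED_MODES.items():
        if config[key] not in allowed:
            raise ValueError(f"{key} must be one of {sorted(allowed)}, got {config[key]!r}")
    if config["tuf_active"] not in (0, 1):
        raise ValueError("tuf_active must be 0 or 1")
    return config


def load_config(path: str = "/etc/aeros/self-realtimeness/config.json") -> Dict[str, Any]:
    """Read and validate the NLC configuration file."""
    with open(path, encoding="utf-8") as f:
        return validate_config(json.load(f))
```

Failing fast on an unknown mode string such as "fast" avoids the NLC silently falling back to some default interval.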
Developer guide
TUF Kernel Module Communication
The kernel module offers three functions via the /proc filesystem, which the NLC calls to read data from the kernel module and write data back to it. The following sections describe the message format of each function in a pseudo-regex notation: (...){n} denotes n repetitions of the parenthesized group, and \n denotes a newline.
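Reading and writing these /proc entries from the NLC can be sketched with plain file I/O, as below. Assumptions: each message is small enough to be delivered in one write() call, the entries return UTF-8 text, and the demo uses a temporary regular file because the real entry paths come from the proc_* keys in config.json.

```python
import os
import tempfile


def write_proc(path: str, message: str) -> None:
    """Send one message to a /proc entry using a single unbuffered write."""
    data = message.encode("utf-8")
    # buffering=0 returns a raw FileIO, so write() issues exactly one
    # write() system call instead of flushing buffered chunks later.
    with open(path, "wb", buffering=0) as entry:
        written = entry.write(data)
    if written != len(data):
        raise OSError(f"short write to {path}: {written} of {len(data)} bytes")


def read_proc(path: str) -> str:
    """Read the complete text a /proc entry returns."""
    with open(path, "rb") as entry:
        return entry.read().decode("utf-8")


if __name__ == "__main__":
    # Demonstrate against a regular temporary file instead of a real /proc entry.
    with tempfile.TemporaryDirectory() as tmp:
        fake_entry = os.path.join(tmp, "update_tasks")
        write_proc(fake_entry, "1,entry.sh,20\nsensor,10,10\n")
        print(read_proc(fake_entry))
```

Using one unbuffered write matters for procfs because the kernel-side handler typically parses whatever arrives in a single call.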
Update Tasks: The NLC calls this function when it detects a new container. The message contains the information about the tasks inside the container that the kernel module requires.
n_tasks_in_container,container_process_name,min_quota\n(task_process_name,period,deadline\n){n_tasks_in_container}
Variables
n_tasks_in_container: The number of processes inside the container.
container_process_name: The name of the entrypoint process, usually a script. This is not the container name known to Docker.
min_quota: The minimum quota the container must be granted.
task_process_name: The name of a process inside the container that is called by the entrypoint script and shall be adapted by the kernel module.
period: The period of the process.
deadline: The deadline of the process.
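A minimal Python sketch of how the NLC could assemble an Update Tasks message in the format above. The Task class and build_update_tasks function are hypothetical helpers, and integer period, deadline, and min_quota values are an assumption, since this guide does not state their types or units.

```python
from dataclasses import dataclass
from typing import Sequence


@dataclass
class Task:
    """Hypothetical description of one process inside an RT container."""
    process_name: str
    period: int    # unit is not specified in this guide
    deadline: int  # unit is not specified in this guide


def build_update_tasks(container_process_name: str, min_quota: int,
                       tasks: Sequence[Task]) -> str:
    """Serialize one container for the update_tasks /proc entry."""
    # Header line: n_tasks_in_container,container_process_name,min_quota
    lines = [f"{len(tasks)},{container_process_name},{min_quota}"]
    # One line per task: task_process_name,period,deadline
    lines += [f"{t.process_name},{t.period},{t.deadline}" for t in tasks]
    return "".join(line + "\n" for line in lines)


if __name__ == "__main__":
    print(build_update_tasks("entry.sh", 20,
                             [Task("sensor", 10, 10), Task("filter", 20, 15)]), end="")
```

Deriving n_tasks_in_container from the task list, rather than passing it separately, keeps the header count consistent with the lines that follow.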
Get Temporal Error: This function is used by the NLC to retrieve the average temporal errors for every task. The message that is sent by the kernel module has the following format.
(container_process_name:container_marked\n(task_process_name,avg_tmp_err){n_tasks_in_container}){n_containers}
Variables
n_tasks_in_container: The number of processes inside the container.
n_containers: The number of RT Docker containers on the node.
container_process_name: The name of the entrypoint process, usually a script. This is not the container name known to Docker.
task_process_name: The name of a process inside the container that is called by the entrypoint script and shall be adapted by the kernel module.
container_marked: 0 if everything is fine and 1 if the container should be relocated.
avg_tmp_err: The average temporal error of the task since the last call of this function.
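The Get Temporal Error message could be parsed on the NLC side roughly as follows. This is a sketch under stated assumptions: the format above shows no separator after avg_tmp_err, so each task entry is assumed to end with a newline; process names are assumed to contain no ':' (which marks container lines); and avg_tmp_err is assumed to parse as a float. ContainerErrors and parse_temporal_errors are hypothetical names.

```python
from dataclasses import dataclass, field
from typing import Dict, Optional


@dataclass
class ContainerErrors:
    """Hypothetical per-container result: relocation flag and per-task errors."""
    marked: bool
    task_errors: Dict[str, float] = field(default_factory=dict)


def parse_temporal_errors(message: str) -> Dict[str, ContainerErrors]:
    """Parse the get_tmp_err output into a mapping keyed by container_process_name."""
    containers: Dict[str, ContainerErrors] = {}
    current: Optional[ContainerErrors] = None
    for line in message.splitlines():
        if not line:
            continue
        if ":" in line:
            # Container line: container_process_name:container_marked
            name, marked = line.rsplit(":", 1)
            current = ContainerErrors(marked=(marked.strip() == "1"))
            containers[name] = current
        else:
            # Task line: task_process_name,avg_tmp_err
            if current is None:
                raise ValueError(f"task line before any container line: {line!r}")
            task, err = line.rsplit(",", 1)
            current.task_errors[task] = float(err)
    return containers


if __name__ == "__main__":
    sample = "entry.sh:0\nsensor,1.5\nfilter,0.25\nrun.sh:1\nctrl,3\n"
    for name, info in parse_temporal_errors(sample).items():
        print(name, info.marked, info.task_errors)
```

Telling container and task lines apart by their delimiter avoids needing n_tasks_in_container, which the message itself does not carry.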
Send TU: The NLC uses this function to send the calculated utility of each container back to the kernel module. The containers are sorted by ascending utility so that containers with a low utility have a higher chance of a quota increase.
(container_process_name,utility){n_containers}
Variables
n_containers: The number of RT Docker containers on the node.
container_process_name: The name of the entrypoint process, usually a script. This is not the container name known to Docker.
utility: The utility value of the container, in the range [0, 100].
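A Python sketch of building the Send TU message with the ascending sort described above. Two details are assumptions not stated in the format: entries are separated by newlines, and utilities are sent as integers (rounded). build_send_tu is a hypothetical helper name.

```python
from typing import Dict


def build_send_tu(utilities: Dict[str, float]) -> str:
    """Serialize container utilities for send_tu, lowest utility first.

    Assumptions: one entry per line and utilities rounded to integers.
    """
    lines = []
    # Ascending utility (ties broken by name) so struggling containers come first.
    for name, utility in sorted(utilities.items(), key=lambda item: (item[1], item[0])):
        if not 0 <= utility <= 100:
            raise ValueError(f"utility of {name} outside [0, 100]: {utility}")
        lines.append(f"{name},{int(round(utility))}\n")
    return "".join(lines)


if __name__ == "__main__":
    print(build_send_tu({"a.sh": 80, "b.sh": 35, "c.sh": 100}), end="")
```

Breaking ties by name makes the ordering deterministic, so two containers with equal utility are always reported in the same order.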
License
Copyright (C) TTControl (2024)
All rights reserved.
This document contains proprietary information belonging to TTControl. Passing on and copying of this document, and communication of its contents is not permitted without prior written authorization.
VERSION : 1
DATE : 15.12.2023
AUTHOR : Stefan Walser