Self-realtimeness

Introduction

The self-realtimeness module continuously monitors the performance of real-time services and invokes the self-orchestrator if a service’s real-time performance drops below a configurable threshold. The real-time performance is captured by the value of a service’s time utility function (TUF). The TUF degrades with the tardiness, i.e., the time by which a service’s job completes past its deadline. For example, a TUF that returns 1.0 for a job completing before its deadline and 0.0 for a job completing past its deadline models a hard real-time requirement. aerOS does not serve software components with hard real-time requirements, so the TUF degrades either linearly or exponentially with a service’s tardiness: it returns 1.0 if a job completes before the deadline and a value between 1.0 and 0.0 if it completes past its deadline. The self-realtimeness module consists of a custom Linux kernel with patches for hierarchical constant bandwidth scheduling (HCBS), a kernel module that monitors the tardiness of real-time services, and a containerized node-level controller that calculates the TUF of an infrastructure element’s real-time services and invokes the self-orchestrator if necessary.
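
As a minimal sketch of the TUF shapes described above, the following Python snippet models a hard, a linearly degrading, and an exponentially degrading TUF, plus the threshold check that triggers the self-orchestrator. All function names and the parameters `max_tardiness`, `decay`, and `threshold` are assumptions made for illustration; the parameters the module actually uses are not specified in this document.

```python
import math


def tardiness(completion_time: float, deadline: float) -> float:
    """Time by which a job finished past its deadline (0.0 if on time)."""
    return max(0.0, completion_time - deadline)


def hard_tuf(tard: float) -> float:
    """Hard real-time: full utility on time, none once the deadline is missed."""
    return 1.0 if tard <= 0.0 else 0.0


def linear_tuf(tard: float, max_tardiness: float) -> float:
    """Soft real-time: utility falls linearly and reaches 0.0 at max_tardiness.

    max_tardiness is a hypothetical tuning parameter.
    """
    return max(0.0, 1.0 - tard / max_tardiness)


def exponential_tuf(tard: float, decay: float) -> float:
    """Soft real-time: utility decays exponentially with tardiness.

    decay is a hypothetical tuning parameter.
    """
    return math.exp(-decay * tard)


def needs_orchestrator(utility: float, threshold: float) -> bool:
    """True if the utility has dropped below the configurable threshold."""
    return utility < threshold
```

A job that finishes on time has a tardiness of 0.0, so both soft variants return 1.0 for it, as described above.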

Features

  • Monitor the time utility of real-time services (periodically invoked services with soft deadlines).

  • Invoke the self-orchestrator if the time utility of a service drops below a configurable threshold.

Place in architecture

The following figure shows the self-orchestrator module inside the infrastructure element (IE) and its relationships with the other self-* modules.

Figure: self-orchestrator module inside the IE and its relationships with the other self-* modules

User guide

Using the self-realtimeness module requires installing and configuring a custom kernel built from source, and the procedure may vary from platform to platform. We therefore recommend manual configuration and installation on every target platform that uses self-realtimeness.

Prerequisites

On Ubuntu or Debian Linux, install the dependencies needed to build the Linux kernel:

sudo apt install build-essential bc

We assume Docker is installed.

Installation

You must build the custom Linux kernel for the target platform and the node-level controller Docker image, whose build also compiles the kernel module.

Linux Kernel Build (target: x86-64)

Configure and build the HCBS Linux kernel for an x86-64 target as follows.

  1. Clone the self-realtimeness repository:

git clone https://gitlab.aeros-project.eu/wp3/t3.5/self-realtimeness.git
cd self-realtimeness/
  2. Assuming you are building the kernel on the target platform, configure and build the kernel:

cd hcbs-kernel/
make defconfig
make -j"$(nproc)"
  3. Install your HCBS kernel and kernel modules:

sudo make modules_install
sudo make install

Linux Kernel Build (target: NVIDIA Jetson Xavier AGX Arm64)

Configure and build the HCBS Linux kernel for the NVIDIA Jetson Xavier AGX as follows.

  1. Clone the self-realtimeness repository and checkout the corresponding branch:

git clone -b nvidia-jetson-agx-xavier https://gitlab.aeros-project.eu/wp3/t3.5/self-realtimeness.git
cd self-realtimeness/
  2. Assuming you are building the kernel on the target platform, configure and build the kernel:

cd hcbs-kernel/kernel/kernel-5.10/
make tegra_defconfig
make -j"$(nproc)" ARCH=arm64
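  3. Back up the original kernel image and initial ramdisk, which the backup boot entry in the next step refers to, then install your HCBS kernel and kernel modules:

sudo make modules_install
sudo cp /boot/Image /boot/Image.backup
sudo cp /boot/initrd /boot/initrd.backup
sudo cp arch/arm64/boot/Image /boot/Image
sudo cp drivers/gpu/nvgpu/nvgpu.ko /usr/lib/modules/5.10.120-hcbs+/kernel/drivers/gpu/nvgpu/nvgpu.ko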
  4. Replace the contents of the boot configuration in /boot/extlinux/extlinux.conf with the following so that the original kernel can be used as a backup:

TIMEOUT 30
DEFAULT primary

MENU TITLE L4T boot options

LABEL primary
      MENU LABEL primary kernel
      LINUX /boot/Image
      FDT /boot/dtb/kernel_tegra194-p2888-0001-p2822-0000.dtb
      INITRD /boot/initrd
      APPEND ${cbootargs} root=/dev/mmcblk0p1 rw rootwait rootfstype=ext4 mminit_loglevel=4 console=ttyTCU0,115200n8 console=tty0 fbcon=map:0 net.ifnames=0 rootfstype=ext4 video=efifb:off nv-auto-config

# When testing a custom kernel, it is recommended that you create a backup of
# the original kernel and add a new entry to this file so that the device can
# fallback to the original kernel. To do this:
#
# 1, Make a backup of the original kernel
#      sudo cp /boot/Image /boot/Image.backup
#
# 2, Copy your custom kernel into /boot/Image
#
# 3, Uncomment below menu setting lines for the original kernel
#
# 4, Reboot

LABEL backup
   MENU LABEL backup kernel
   LINUX /boot/Image.backup
   FDT /boot/dtb/kernel_tegra194-p2888-0001-p2822-0000.dtb
   INITRD /boot/initrd.backup
   APPEND ${cbootargs} root=/dev/mmcblk0p1 rw rootwait rootfstype=ext4 mminit_loglevel=4 console=ttyTCU0,115200n8 console=tty0 fbcon=map:0 net.ifnames=0 rootfstype=ext4 video=efifb:off nv-auto-config

Build Node-level Controller Docker Image

To use the node-level controller (NLC), build its Docker image with the following command:

docker build . -t nlc:latest

Copy the config.json file to /etc/aeros/self-realtimeness/ and run docker compose to start the node-level controller:

sudo mkdir -p /etc/aeros/self-realtimeness
sudo cp config.json /etc/aeros/self-realtimeness/
docker compose up

Configuration options

The config.json file offers the following parameters:

  • kernel_module_name: str: The path to the kernel module object file (.ko).

  • interval_tu: str: The mode for the frequency of the time-utility (TU) calculations. Can be one of the following: “min”, “medium”, “max”.

  • interval_relocate: str: The mode for the frequency of the relocation operations. Can be one of the following: “fixed”, “min”, “max”.

  • interval_container_monitor: float: The interval at which the NLC checks for new containers.

  • gain_reduce: int: The gain with which the quota of a container is reduced when its tasks meet all deadlines. The formula can be found in [this paper]().

  • gain_increase: int: The gain with which the quota of a container is increased when its tasks miss deadlines. The formula can be found in [this paper]().

  • tuf_active: int: 0 to deactivate the use of TUFs, 1 to activate them.

  • id_string: The string that must be part of the name of every real-time (RT) Docker container. The NLC uses this string to identify RT Docker containers.

  • proc_update_tasks: The path to the update_tasks /proc entry.

  • proc_send_tu: The path to the send_tu /proc entry.

  • proc_get_temporal_errors: The path to the get_tmp_err /proc entry.

  • relocate_error_code: The error code that is sent to the self-orchestrator when a container should be relocated.

  • self_orchestration_url: The URL to which a request for relocation is sent.
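
The enumerated options above lend themselves to a quick sanity check before starting the NLC. The following Python sketch validates a parsed config.json against the value sets listed above; it is not part of the NLC. The function name `validate_config`, the example values, and the requirement that `interval_container_monitor` be positive are assumptions for illustration.

```python
import json

# Allowed values taken from the parameter descriptions above.
ALLOWED_MODES = {
    "interval_tu": {"min", "medium", "max"},
    "interval_relocate": {"fixed", "min", "max"},
}


def validate_config(config: dict) -> list:
    """Return a list of problems found in a parsed config.json (empty if none)."""
    problems = []
    for key, allowed in ALLOWED_MODES.items():
        if config.get(key) not in allowed:
            problems.append(f"{key} must be one of {sorted(allowed)}")
    if config.get("tuf_active") not in (0, 1):
        problems.append("tuf_active must be 0 or 1")
    interval = config.get("interval_container_monitor")
    is_number = isinstance(interval, (int, float)) and not isinstance(interval, bool)
    # Assumption: a monitoring interval only makes sense if it is positive.
    if not is_number or interval <= 0:
        problems.append("interval_container_monitor must be a positive number")
    return problems


# Hypothetical example values, not defaults shipped with the module.
example = json.loads("""
{
  "interval_tu": "medium",
  "interval_relocate": "fixed",
  "interval_container_monitor": 1.0,
  "tuf_active": 1
}
""")
print(validate_config(example))
```

The sketch only covers the parameters whose valid values are stated above; the remaining keys are left unchecked.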

Developer guide

TUF Kernel Module Communication

There are three functions the kernel module offers via the /proc filesystem. The NLC calls them to read data from the kernel module and write data back. The following sections explain the protocol used in these three functions in a pseudo-regex fashion.

  • Update Tasks: This function is called when the NLC detects a new container. The message contains information about the tasks inside the container that is required by the kernel module.

    n_tasks_in_container,container_process_name,min_quota\n(task_process_name,period,deadline\n){n_tasks_in_container}

    Variables

    • n_tasks_in_container: The number of processes inside the container.

    • container_process_name: The name of the entrypoint process. Usually a script. This is not the container name known by Docker.

    • min_quota: The quota the container must at least have.

    • task_process_name: The name of a process inside the container that is called by the entrypoint script and shall be adapted by the kernel module.

    • period: The period of the process.

    • deadline: The deadline of the process.

  • Get Temporal Error: This function is used by the NLC to retrieve the average temporal errors for every task. The message that is sent by the kernel module has the following format.

    (container_process_name:container_marked\n(task_process_name,avg_tmp_err){n_tasks_in_container}){n_containers}

    Variables

    • n_tasks_in_container: The number of processes inside the container.

    • n_containers: The number of RT Docker containers on the node.

    • container_process_name: The name of the entrypoint process. Usually a script. This is not the container name known by Docker.

    • task_process_name: The name of a process inside the container that is called by the entrypoint script and shall be adapted by the kernel module.

    • container_marked: 0 if everything is fine and 1 if the container should be relocated.

    • avg_tmp_err: The average temporal error of the task since the last call of this function.

  • Send TU: This function is used by the NLC to send the calculated utility of each container back to the kernel module. The containers are sorted by ascending utility so that containers with a low utility have a higher chance of a quota increase.

    (container_process_name,utility){n_containers}

    Variables

    • n_containers: The number of RT Docker containers on the node.

    • container_process_name: The name of the entrypoint process. Usually a script. This is not the container name known by Docker.

    • utility: The utility value of the container, in the range [0, 100].
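
The three message formats can be sketched in Python as follows. This is an illustrative reading of the formats above, not the NLC's actual code. It assumes that a newline character follows `min_quota`, `deadline`, and `container_marked`, that the parentheses only denote repetition groups and are not literal characters, and that the repeated entries of Get Temporal Error and Send TU each sit on their own line. The formats above do not show a separator for those entries, so the last point in particular is an assumption. The names `Task`, `build_update_tasks`, `parse_temporal_errors`, and `build_send_tu` are hypothetical, as is treating `avg_tmp_err` as a decimal number.

```python
from dataclasses import dataclass


@dataclass
class Task:
    name: str      # task_process_name
    period: int    # unit not stated in this document
    deadline: int  # unit not stated in this document


def build_update_tasks(container: str, min_quota: int, tasks: list) -> str:
    """Update Tasks: one header line followed by one line per task."""
    lines = [f"{len(tasks)},{container},{min_quota}"]
    lines += [f"{t.name},{t.period},{t.deadline}" for t in tasks]
    return "\n".join(lines) + "\n"


def parse_temporal_errors(message: str) -> dict:
    """Get Temporal Error: map each container to its mark and per-task errors."""
    result = {}
    current = None
    for line in message.splitlines():
        if not line:
            continue
        if ":" in line:
            # Container header: container_process_name:container_marked
            name, marked = line.rsplit(":", 1)
            current = {"marked": marked == "1", "errors": {}}
            result[name] = current
        else:
            # Task entry: task_process_name,avg_tmp_err
            if current is None:
                raise ValueError("task entry before any container header")
            task, err = line.rsplit(",", 1)
            current["errors"][task] = float(err)
    return result


def build_send_tu(utilities: dict) -> str:
    """Send TU: one entry per container, sorted by ascending utility."""
    for name, utility in utilities.items():
        if not 0 <= utility <= 100:
            raise ValueError(f"utility of {name} is outside [0, 100]")
    ordered = sorted(utilities.items(), key=lambda item: item[1])
    return "".join(f"{name},{utility}\n" for name, utility in ordered)
```

Under these assumptions, the strings returned by `build_update_tasks` and `build_send_tu` are what would be written to the /proc entries configured as `proc_update_tasks` and `proc_send_tu`, and `parse_temporal_errors` would be applied to the text read from `proc_get_temporal_errors`.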

Authors

Stefan Walser & Jan Ruh

License

Copyright (C) TTControl (2024)

All rights reserved.

This document contains proprietary information belonging to TTControl. Passing on and copying of this document, and communication of its contents is not permitted without prior written authorization.

VERSION : 1

DATE : 15.12.2023

AUTHOR : Stefan Walser

Notice (dependencies)