Decentralized AI
aerOS auxiliary AI services cover the different functionalities required to execute AI tasks on the aerOS infrastructure. AI tasks cover two main scenarios: federated learning and distributed inference. An AI task can be decomposed into sub-tasks, each of which may have dedicated requirements, and its execution can span several IEs (i.e., its functionality is divided between services deployed on different IEs). In the simplest case, an AI task has only one sub-task, i.e., a workflow consisting of a single step, e.g., the deployment of a service offering predictions made by an ML model. Note that an AI task is a specialization of a general task that can be executed using the aerOS infrastructure.
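The decomposition of an AI task into sub-tasks spread over IEs can be pictured as a small data structure. The sketch below is illustrative only: the `AITask` and `SubTask` classes, their fields and the IE names are hypothetical and do not reflect an actual aerOS schema.

```python
from dataclasses import dataclass, field
from typing import Dict, List


@dataclass
class SubTask:
    """One step of an AI task workflow (hypothetical schema)."""
    name: str
    target_ie: str  # IE on which this sub-task is deployed
    requirements: Dict[str, object] = field(default_factory=dict)


@dataclass
class AITask:
    """An AI task decomposed into sub-tasks (hypothetical schema)."""
    task_id: str
    kind: str  # e.g. "federated_learning" or "distributed_inference"
    subtasks: List[SubTask] = field(default_factory=list)

    def ies(self) -> List[str]:
        """Distinct IEs the task spans, in first-seen order."""
        seen: List[str] = []
        for st in self.subtasks:
            if st.target_ie not in seen:
                seen.append(st.target_ie)
        return seen

    def is_single_step(self) -> bool:
        return len(self.subtasks) == 1


# Simplest case: a single sub-task that deploys a prediction service.
inference_task = AITask(
    "task-1", "distributed_inference",
    [SubTask("serve-model", "ie-edge-01", {"cpu": 1})],
)

# A federated learning task whose local training sub-tasks span three IEs.
fl_task = AITask(
    "task-2", "federated_learning",
    [SubTask(f"train-{i}", f"ie-{i:02d}", {"gpu": True}) for i in range(3)],
)
```

Keeping per-sub-task requirements next to the target IE is one way to let an orchestrator check placement constraints for each step independently.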
Within aerOS, dedicated services are provided to monitor and orchestrate the execution of a specific task (AI Task [n] Controller) and to execute an AI sub-task (AI Local Executor). They are deployed as auxiliary services on the aerOS infrastructure using the aerOS service deployment and orchestration mechanisms. AI Task [n] Controller is responsible for the n-th task and collaborates with the AI Local Executor services that execute parts of this task on different IEs. In federated learning, at the end of the process, AI Task [n] Controller holds a new shared model trained in a federated way by the AI Local Executor services.
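To illustrate how an AI Task [n] Controller could obtain a shared model from several AI Local Executors, the following minimal sketch simulates federated training of a one-parameter linear model y = w * x. The class and method names, the learning rate, the number of rounds and the sample-weighted averaging step (FedAvg-style) are assumptions chosen for illustration; the aggregation algorithm actually used in aerOS is configurable and is not specified here.

```python
import random


class AILocalExecutor:
    """Runs a local training round on data that stays on its IE (sketch)."""

    def __init__(self, ie_name, data):
        self.ie_name = ie_name
        self.data = data  # list of (x, y) pairs held locally

    def train_round(self, w, lr=0.1, epochs=5):
        # Plain gradient descent on mean squared error for y = w * x.
        for _ in range(epochs):
            grad = sum(2 * (w * x - y) * x for x, y in self.data) / len(self.data)
            w -= lr * grad
        # Only the updated parameter and the sample count are returned.
        return w, len(self.data)


class AITaskController:
    """Coordinates one federated learning task across executors (sketch)."""

    def __init__(self, executors, w_init=0.0):
        self.executors = executors
        self.w = w_init  # current shared model

    def run(self, rounds=20):
        for _ in range(rounds):
            # Broadcast the shared model and collect the local results.
            results = [ex.train_round(self.w) for ex in self.executors]
            # Sample-weighted average of the local models (FedAvg-style).
            total = sum(n for _, n in results)
            self.w = sum(w * n for w, n in results) / total
        return self.w


def make_data(n, true_w=2.0, seed=0):
    """Synthetic local dataset y = true_w * x + Gaussian noise."""
    rng = random.Random(seed)
    data = []
    for _ in range(n):
        x = rng.uniform(-1, 1)
        data.append((x, true_w * x + rng.gauss(0, 0.1)))
    return data


executors = [AILocalExecutor(f"ie-{i}", make_data(50, seed=i)) for i in range(3)]
controller = AITaskController(executors)
shared_w = controller.run()
```

In this sketch only the model parameter and the local sample count leave each executor; the (x, y) pairs stay inside the executor object, mirroring the idea that raw data remains on its IE.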
AI Task Controller is deployed on the aerOS infrastructure to control the execution of an AI task: accepting its configuration, synchronizing partial results and managing the training process. It is composed of the following elements:
FL Controller - responsible for accepting a task description, initializing the execution of the workload using the deployed services, and managing and monitoring the task lifecycle.
FL Training Collector - The FL training process involves several independent parties that collaborate to provide an enhanced ML model. In this process, the local model updates proposed by the different parties must be aggregated. This is handled by the FL Training Collector, which is also in charge of sending the training results, along with the updated model weights, to the FL Repository for storage.
FL Repository - A key aspect of applying FL in practice is making it persistent and configurable. The FL Repository stores (and delivers upon request) the ML aggregation algorithms, the ML models and the results of ML training.
AI Local Executor is a service deployed on the aerOS infrastructure to execute the workload of an AI sub-task, e.g., a local training round. It is composed of the following elements:
Model Inferencer - provides predictions using a selected model.
Model Trainer - offers the functionalities of an FL client, including the local training and evaluation of models.
Data Transformer - supports data preprocessing, data loading and training methods using serializable modules.
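The division of responsibilities between the FL Training Collector, the FL Repository and the Model Inferencer can be sketched in memory as follows. All class and method names are hypothetical, as are the two aggregation functions (sample-weighted FedAvg and coordinate-wise median), which merely stand in for the aggregation algorithms the FL Repository stores; a real deployment would persist models and results rather than keep them in Python lists.

```python
from statistics import median


class FLRepository:
    """In-memory stand-in for the FL Repository (hypothetical interface)."""

    def __init__(self):
        self.algorithms = {}  # name -> aggregation function
        self.models = []      # weight vectors; list index = model version
        self.results = []     # per-round training results

    def register_algorithm(self, name, fn):
        self.algorithms[name] = fn

    def get_algorithm(self, name):
        return self.algorithms[name]

    def store(self, weights, result):
        self.models.append(list(weights))
        self.results.append(result)
        return len(self.models) - 1  # version of the stored model

    def latest_model(self):
        return self.models[-1]


def fedavg(updates):
    """Sample-weighted mean of weight vectors; updates = [(weights, n_samples)]."""
    total = sum(n for _, n in updates)
    dim = len(updates[0][0])
    return [sum(w[i] * n for w, n in updates) / total for i in range(dim)]


def coordinate_median(updates):
    """Coordinate-wise median, a robust alternative to averaging."""
    dim = len(updates[0][0])
    return [median(w[i] for w, _ in updates) for i in range(dim)]


class FLTrainingCollector:
    """Aggregates local updates and sends the result to the repository (sketch)."""

    def __init__(self, repository, algorithm="fedavg"):
        self.repo = repository
        self.algorithm = algorithm

    def collect(self, round_no, updates):
        aggregate = self.repo.get_algorithm(self.algorithm)
        weights = aggregate(updates)
        result = {"round": round_no, "clients": len(updates),
                  "samples": sum(n for _, n in updates)}
        return self.repo.store(weights, result)


class ModelInferencer:
    """Serves predictions from the latest stored linear model (sketch)."""

    def __init__(self, repository):
        self.repo = repository

    def predict(self, features):
        w = self.repo.latest_model()
        return sum(wi * xi for wi, xi in zip(w, features))


repo = FLRepository()
repo.register_algorithm("fedavg", fedavg)
repo.register_algorithm("median", coordinate_median)

collector = FLTrainingCollector(repo)
updates = [([1.0, 2.0], 100), ([3.0, 4.0], 300)]
version = collector.collect(round_no=1, updates=updates)
prediction = ModelInferencer(repo).predict([1.0, 1.0])
```

Looking the aggregation algorithm up in the repository by name is one way to let the collector switch algorithms through configuration alone, in the spirit of the persistent and configurable role attributed to the FL Repository.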