#####################
FL Repository
#####################

.. contents::
   :local:
   :depth: 1

Introduction
============

FL Repository stores reusable components for future AI training processes, such as ML models, FL aggregation strategies and data transformations. It also enables the FL Training Collector to safely store the results of FL training (metadata plus pickled model weights).

Place in architecture
=====================

FL Repository is a component of the AI Task Controller and one of the auxiliary AI services that can be deployed using the general aerOS mechanisms.

User guide
==========

To make testing easier, consider initializing the database with a dump file (see *Serializing and deserializing database*). Interaction with FL Repository takes place through the REST API described in the *Configuration options* section.

Installation
============

FL Repository should be deployed as part of the AI Task Controller, alongside AI Local Execution, to support the execution of AI tasks. The service exposes a REST API for communication with other AI services and with external parties. FL Repository can be run with docker-compose or deployed on a Kubernetes cluster with a dedicated Helm chart.

Running using Docker (locally)
------------------------------

The following command builds a new Docker image and starts the containers in the background:

.. code-block:: bash

   docker-compose -f docker-compose.yml up --force-recreate --build -d

Once the containers are built and running, check them with ``docker ps``. For a single FL Repository instance, the output should look like this:

+--------------+---------------------------------+-----+--------------+--------------------------------------+----------------+
| CONTAINER ID | IMAGE                           | ... | STATUS       | PORTS                                | NAMES          |
+==============+=================================+=====+==============+======================================+================+
| 8c4744c648c0 | aeros/fl\_repository:latest     | ... | Up 5 minutes | 0.0.0.0:9013->9012/tcp               | flrepository   |
+--------------+---------------------------------+-----+--------------+--------------------------------------+----------------+
| 24964eafadc9 | aeros/fl\_repository\_db:latest | ... | Up 5 minutes | 0.0.0.0:27017-27019->27017-27019/tcp | flrepositorydb |
+--------------+---------------------------------+-----+--------------+--------------------------------------+----------------+

The Swagger documentation of the REST API should be available at ``http://127.0.0.1:9013/docs`` (if the default port configuration is preserved).

**Note**: When running with Docker, make sure that all containers used to run the FL task (FL Local Execution, FL Repository, FL Training Collector) are in the same network. The following commands can be used:

- ``docker inspect -f '{{range $key, $value := .NetworkSettings.Networks}}{{$key}} {{end}}' [CONTAINER_ID]`` - check the networks of a given container
- ``docker network inspect -f '{{range .Containers}}{{.Name}} {{end}}' [NETWORK e.g. appv0_default]`` - list all containers in a given network
- ``docker network connect [NETWORK] [CONTAINER_ID]`` - add a given container to a given network

Deployment on Kubernetes
------------------------

The FL Repository has been developed with the assumption that it will be deployed on Kubernetes with a dedicated Helm chart. To deploy it, run ``helm install flrepositorydb``. Before installing, make sure the enabler is configured properly by checking that the values in the ``repository-configmap`` are set correctly. By default, the chart exposes the service on host port ``30001`` as a NodePort. The Swagger documentation will be available at ``http://127.0.0.1:XXXXX/docs``, where XXXXX stands for the NodePort of the flrepository service.
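After deploying with either method, a simple readiness check is to poll the Swagger page until it responds. The Python sketch below does this with the standard library only. The function name ``wait_for_swagger`` and its default timeouts are illustrative assumptions, not part of FL Repository; the base URL has to be set to the Docker host port (``9013``) or the Kubernetes NodePort.

.. code-block:: python

   import http.client
   import time
   import urllib.request


   def wait_for_swagger(base_url: str, timeout: float = 60.0, interval: float = 2.0) -> bool:
       """Poll {base_url}/docs until it answers with HTTP 200 or `timeout` expires.

       Hypothetical helper: returns True once the Swagger page is reachable,
       False if the deadline passes first.
       """
       url = f"{base_url.rstrip('/')}/docs"
       deadline = time.monotonic() + timeout
       while True:
           try:
               with urllib.request.urlopen(url, timeout=interval) as resp:
                   if resp.status == 200:
                       return True
           except (OSError, http.client.HTTPException):
               # Not ready yet: connection refused, socket timeout, HTTP error
               # status (urllib.error.URLError/HTTPError subclass OSError), etc.
               pass
           if time.monotonic() >= deadline:
               return False
           time.sleep(interval)

For example, assuming the default NodePort, ``wait_for_swagger("http://127.0.0.1:30001")`` returns ``True`` once the service answers.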
Serializing and deserializing database
---------------------------------------

If you want the MongoDB database in your custom repositorydb image to be initialized with preexisting objects already stored in its collections, you can extract a database dump from a pod in Kubernetes:

1. Use the API to add and remove objects in the database until it has the desired content.
2. Use the ``kubectl exec -i -t -- /bin/bash`` command to reach the command line of the repositorydb pod.
3. Use the ``mongodump --archive=db.dump`` tool with the appropriate options to create the backup file.
4. Use ``kubectl cp :/db.dump .db.dump`` to copy the archive file from the pod to the repository.
5. Move the ``db.dump`` file to the ``mongo_db`` directory.
6. Run the ``docker-compose -f docker-compose.yml up --force-recreate --build -d`` command to build the image with the dump included.

Alternatively, the database dump can be extracted from a container in Docker:

1. Use the API to add and remove objects in the database until it has the desired content.
2. Use the ``docker exec -i -t -- /bin/bash`` command to reach the command line of the repositorydb container.
3. Use the ``mongodump --archive=db.dump`` tool with the appropriate options to create the backup file.
4. Use ``docker cp :/db.dump .db.dump`` to copy the archive file from the container to the repository.
5. Move the ``db.dump`` file to the ``mongo_db`` directory.
6. Run the ``docker-compose -f docker-compose.yml up --force-recreate --build -d`` command to build the image with the dump included.

**Note**: Before running the tests, make sure that all required model metadata and transformations are stored in the FL Repository database.

Configuration options
=====================

Configuration is done through the REST API. The collections stored in FL Repository and the operations that can be used to manage them are described below.
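As a minimal sketch of talking to this REST API from Python, the helpers below build endpoint URLs from path segments and fetch JSON responses using only the standard library. The base URL assumes the default Docker port mapping (``9013``), and the helper names ``endpoint`` and ``get_json`` are hypothetical, not part of FL Repository.

.. code-block:: python

   import json
   import urllib.parse
   import urllib.request

   # Assumption: default Docker port mapping described in this guide.
   BASE_URL = "http://127.0.0.1:9013"


   def endpoint(base_url: str, *segments: str) -> str:
       """Join path segments into an endpoint URL (hypothetical helper).

       Each segment is percent-encoded, so reserved characters such as
       spaces or '/' are escaped.
       """
       path = "/".join(urllib.parse.quote(str(segment), safe="") for segment in segments)
       return f"{base_url.rstrip('/')}/{path}"


   def get_json(url: str, timeout: float = 10.0):
       """Send a GET request and decode the JSON response body."""
       with urllib.request.urlopen(url, timeout=timeout) as resp:
           return json.loads(resp.read().decode("utf-8"))

For example, ``get_json(endpoint(BASE_URL, "model"))`` would list the metadata of all available models (``GET /model``), and ``endpoint(BASE_URL, "training-results", "md_keras", "v1")`` builds the URL for ``GET /training-results/{name}/{version}``.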
Collection: ML Models
---------------------

The ML Models collection contains information about the models that are ready and available for FL training, namely their metadata: the model name and version, which together distinguish between models; the library, which indicates whether the model is intended for training with the PyTorch or the Keras local client; and a description, which briefly describes the model architecture. For example:

.. code-block:: JSON

   {
       "meta": {
           "library": "keras",
           "description": "Test model - A CNN (Convolutional Neural Network) designed to solve the CIFAR-10 image classification task."
       },
       "model_name": "base",
       "model_version": "base2",
       "model_id": "94586930dn3k56mqa90c2342"
   }

The ML Models collection can be managed using the following endpoints:

- **POST /model**: Add the metadata of a new initial model to the FL Repository.
- **PUT /model/{name}/{version}**: Create or update the object file of the model with a given name and version, depending on whether it already exists in the FL Repository.
- **PUT /model/meta/{name}/{version}**: Update the metadata of the model with a given name and version.
- **GET /model**: Return the list containing the metadata of all available models.
- **GET /model/meta**: Return the metadata of the model with a given name and version.
- **GET /model/{name}/{version}**: Return the binary file containing the final model weights and structure.
- **DELETE /model/{name}/{version}**: Delete the metadata and the binary file of the model with a given name and version.
- **GET /models/available**: Return the list containing the metadata of all available models, sorted by their upload date.
- **GET /models/download/shell/{filename}**: Download the binary files containing the model weights and structure for all available models, sorted by their upload date.
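The sketch below shows how a client could register model metadata (``POST /model``) and download a stored model (``GET /model/{name}/{version}``) with the Python standard library. It assumes that the ``POST /model`` body mirrors the example document above without the server-assigned ``model_id``, and that the download endpoint returns the raw file bytes; the function names are hypothetical, so the exact schemas should be checked in the Swagger documentation.

.. code-block:: python

   import json
   import urllib.parse
   import urllib.request


   def build_model_meta_request(base_url: str, model_name: str, model_version: str,
                                library: str, description: str) -> urllib.request.Request:
       """Build (but do not send) a POST /model request carrying model metadata.

       Assumption: the body mirrors the example document above, minus the
       server-assigned "model_id".
       """
       body = {
           "meta": {"library": library, "description": description},
           "model_name": model_name,
           "model_version": model_version,
       }
       return urllib.request.Request(
           url=f"{base_url.rstrip('/')}/model",
           data=json.dumps(body).encode("utf-8"),
           headers={"Content-Type": "application/json"},
           method="POST",
       )


   def download_model(base_url: str, model_name: str, model_version: str,
                      dest_path: str, timeout: float = 30.0) -> str:
       """Save the file returned by GET /model/{name}/{version} to dest_path.

       Assumption: the endpoint responds with the raw binary file in the body.
       """
       name = urllib.parse.quote(model_name, safe="")
       version = urllib.parse.quote(model_version, safe="")
       url = f"{base_url.rstrip('/')}/model/{name}/{version}"
       with urllib.request.urlopen(url, timeout=timeout) as resp, open(dest_path, "wb") as out:
           out.write(resp.read())
       return dest_path

Sending the registration request is then a matter of ``urllib.request.urlopen(req)``. Building the request separately from sending it keeps the payload easy to inspect before it reaches the repository.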
Collection: ML Training Results
-------------------------------

The ML Training Results collection contains information about the FL training processes that have been conducted, along with their results in the form of pickled final model weights. For example:

.. code-block:: JSON

   {
       "model_name": "md_keras",
       "model_version": "v1",
       "training_id": "1",
       "results": {
           "rounds": "3",
           "final_loss": "81.94759609735472",
           "accuracy": "0.24294586930293454",
           "min_fit_clients": "1",
           "min_evaluate_clients": "1",
           "min_available_clients": "1"
       },
       "weights_id": "943i4n4k505j43b4m5qa2345",
       "configuration_id": "1"
   }

The ML Training Results collection can be managed using the following endpoints:

- **POST /training-results**: Upload new training results (the final model weights plus metadata containing aggregated metrics and configuration) for a given model_name, model_version, training_id and configuration_id.
- **GET /training-results**: Get the list with the metadata of all training results available in this FL Repository instance.
- **GET /training-results/{name}/{version}**: Get the list with the metadata of all training results available in this FL Repository instance for a given model name and version.
- **PUT /training-results/{name}/{version}/{training-id}/{configuration-id}**: Update the final training weights of a given training results instance.
- **GET /training-results/weights/{name}/{version}/{training-id}**: Return the final weights obtained as a result of a training.
- **DELETE /training-results/{name}/{version}/{training-id}**: Delete the selected training results (the weights along with the metadata).

Collection: FL Strategies
-------------------------

The FL Strategies collection contains information about the available FL aggregation strategies, along with their serialized objects. For example:

.. code-block:: JSON

   {
       "meta": {},
       "strategy_name": "fault-tolerant-fedavg",
       "strategy_description": "A fault tolerant version of FedAvg",
       "strategy_id": "45753r89gg093jr523455234"
   }

The FL Strategies collection can be managed using the following endpoints:

- **POST /strategy**: Create a new aggregation strategy with the specified metadata.
- **PUT /strategy/{name}**: Update the object file of the aggregation strategy with a given name.
- **PUT /strategy/meta/{name}**: Update the metadata of the aggregation strategy with a given name.
- **GET /strategy**: Get the metadata of all available aggregation strategies as a list.
- **GET /strategy/{name}**: Get the object file of the aggregation strategy with a given name.
- **DELETE /strategy/{name}**: Delete the aggregation strategy with a given name.

Collection: ML Data Transformations
-----------------------------------

The ML Data Transformations collection contains information about the available FL data transformations, along with their serialized objects. For example:

.. code-block:: JSON

   {
       "id": "application.tests.categorical_transformation",
       "description": "The class transforming y data into categorical data",
       "parameter_types": {
           "categories": "int"
       },
       "default_values": {
           "categories": 10
       },
       "outputs": [
           "np.ndarray",
           "np.ndarray"
       ],
       "needs": {
           "storage": 0,
           "RAM": 0,
           "GPU": false,
           "preinstalled_libraries": {},
           "available_models": {}
       }
   }

The ML Data Transformations collection can be managed using the following endpoints:

- **POST /transformation**: Create a new data transformation with the specified metadata.
- **PUT /transformation/{id}**: Update the object file of a given data transformation.
- **PUT /transformation/meta/{id}**: Update the metadata of a given data transformation.
- **GET /transformation**: Get the list with the metadata of all data transformations available in this FL Repository instance.
- **GET /transformation/{id}**: Get the object file of the data transformation with a given id.
- **DELETE /transformation/{id}**: Delete the metadata and the object file of the data transformation with a given id.

Developer guide
===============

TBD

Authors
=======

The FL Repository service is a continuation of research conducted within the Horizon 2020 ASSIST-IoT project.

Systems Research Institute, Polish Academy of Sciences, Warsaw

License
=======

The FL Repository is released under the Apache 2.0 license (available at http://www.apache.org/licenses/LICENSE-2.0), as we have internally concluded that we are not “offering the functionality of MongoDB, or modified versions of MongoDB, to third parties as a service”. However, potential future commercial adopters should be aware that our project uses MongoDB, so that they can accurately determine the license most applicable to their projects.

Notice (dependencies)
=====================