2.2.2. FL Training Collector enabler

2.2.2.1. Introduction

The FL training process involves several independent parties that collaborate to produce an enhanced ML model. In this process, the local updates suggested by each party must be aggregated. Within ASSIST-IoT, this task is handled by the FL Training Collector, which is also in charge of delivering the updated model back to the parties.

2.2.2.2. Features

  • Aggregating the local updates of the ML model prepared by independent parties as part of a model enhancement process. Responsible components: FLTC Combiner, FLTC I/O.

  • Delivering the updated model back to the parties. Responsible component: FLTC I/O.

2.2.2.3. Place in architecture

The FL Training Collector enabler is one of the Federated Learning enablers that together make it possible to deploy a federated learning environment. Functionally, it operates on the scalability and manageability verticals of the ASSIST-IoT architecture.

More specifically, the following figure provides the semantic diagram of the enabler:

Semantic diagram of the FL Training Collector enabler

2.2.2.4. User guide

Interactions with this enabler take place through a REST API. In the FL environment, the FL Orchestrator enabler sends the appropriate configuration to the FL Training Collector.

The enabler exposes a REST API (see the endpoints below) to communicate with external enablers and applications, and uses gRPC to communicate with FL Local Operations during model training.

Method    Endpoint            Description

POST      /job/config/{id}    Receive the configuration of the FL Training Collector components for the job with identifier id

GET       /job/status/{id}    Retrieve the status of the training process with identifier id
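As a minimal sketch of how a client might address these two endpoints, the following Python snippet builds the corresponding HTTP requests with the standard library. The base URL, the JSON request body and the helper names are assumptions made for illustration; only the two endpoint paths and their methods come from the table above.

```python
import json
from urllib.request import Request

# Assumption: hypothetical host and port; replace with the address of your deployment.
BASE_URL = "http://localhost:8000"


def build_config_request(job_id: str, config: dict) -> Request:
    """Build (without sending) the POST request that delivers a job configuration.

    Assumption: the configuration is sent as a JSON body.
    """
    body = json.dumps(config).encode("utf-8")
    return Request(
        url=f"{BASE_URL}/job/config/{job_id}",
        data=body,
        headers={"Content-Type": "application/json"},
        method="POST",
    )


def build_status_request(job_id: str) -> Request:
    """Build (without sending) the GET request that retrieves the status of a job."""
    return Request(url=f"{BASE_URL}/job/status/{job_id}", method="GET")
```

Passing either request object to urllib.request.urlopen would send it to a running enabler instance.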

2.2.2.5. Prerequisites

The main prerequisites are Docker and docker-compose, which are needed when running the enabler as a Docker container. It is also possible to run the component independently. In that case, Python must be installed on the machine where the enabler will be executed; version 3.8 or later is recommended (3.8 is the version of the Python image being used). The additional packages listed in the requirements.txt file (inside the application folder) must also be installed.

2.2.2.6. Installation

The installation procedure for this enabler is under development and will be provided once the release of the enabler is completed.

2.2.2.7. Configuration options

The training process of the FL Training Collector enabler is configured through a request to the REST API, in which the following parameters of the training job can be set:

  • strategy - name of the strategy to be used in this training job, e.g. “avg”

  • model_id - model identifier

  • num_rounds - number of rounds, e.g. “3”

  • min_fit_clients - minimum number of fitting clients, e.g. “1”

  • min_available_clients - minimum number of available clients, e.g. “1”

  • adapt_config - e.g. “custom”

  • config_id - identifier of the configuration to be used

  • batch_size - size of a batch, e.g. “64”

  • steps_per_epoch - number of steps in each epoch, e.g. “32”

  • epochs - number of epochs, e.g. “5”

  • learning_rate - tuning parameter in an optimization algorithm, e.g. “0.001”
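The parameters above could be gathered into a single configuration object before being sent to the enabler. The sketch below is illustrative only: the field names come from the list, but the Python types, the flat layout, the validation rules and the placeholder identifiers are assumptions, not the enabler's actual schema.

```python
from dataclasses import asdict, dataclass


@dataclass
class TrainingJobConfig:
    # Field names follow the parameter list; types and flat layout are assumptions.
    strategy: str
    model_id: str
    num_rounds: int
    min_fit_clients: int
    min_available_clients: int
    adapt_config: str
    config_id: str
    batch_size: int
    steps_per_epoch: int
    epochs: int
    learning_rate: float

    def validate(self) -> None:
        """Illustrative sanity checks; not taken from the enabler itself."""
        positive_ints = (
            "num_rounds",
            "min_fit_clients",
            "min_available_clients",
            "batch_size",
            "steps_per_epoch",
            "epochs",
        )
        for name in positive_ints:
            if getattr(self, name) < 1:
                raise ValueError(f"{name} must be at least 1")
        if self.learning_rate <= 0:
            raise ValueError("learning_rate must be positive")


# Example values from the parameter list; the two identifiers are hypothetical placeholders.
example_config = TrainingJobConfig(
    strategy="avg",
    model_id="example_model",
    num_rounds=3,
    min_fit_clients=1,
    min_available_clients=1,
    adapt_config="custom",
    config_id="example_config",
    batch_size=64,
    steps_per_epoch=32,
    epochs=5,
    learning_rate=0.001,
)
example_config.validate()
payload = asdict(example_config)  # plain dict, ready to be serialised as JSON
```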

2.2.2.8. Developer guide

2.2.2.8.1. Components

2.2.2.8.1.1. FLTC I/O

Provides a REST API for input and output communication to and from the FL Training Collector enabler. On the one hand, it is responsible for receiving the FL local updates that are sent to the FLTC Combiner component. On the other hand, it is responsible for communicating the new FL model obtained in the FLTC Combiner to the FL Repository. The communication capabilities of this component are designed so that it can conceptually deal with situations in which more complex topologies are used.

2.2.2.8.1.2. FLTC Combiner

This component receives “suggestions” from a certain number of local nodes (possibly all of them) and combines them to generate an updated FL model. It can include both homogeneous and heterogeneous FedAvg solutions for, e.g., logistic regression models, decision-tree models, or even neural network models.
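To illustrate the combining step, the following sketch implements plain federated averaging (FedAvg) over flattened weight vectors, where each local update is weighted by the number of examples it was trained on. It is a simplified illustration under stated assumptions (homogeneous models, weights given as flat lists of floats, hypothetical function and type names) and does not reproduce the enabler's actual implementation.

```python
from typing import List, Sequence, Tuple

# Assumption: one update = (flattened model weights, number of local training examples).
ClientUpdate = Tuple[Sequence[float], int]


def federated_average(updates: List[ClientUpdate]) -> List[float]:
    """Return the example-weighted average of the clients' weight vectors."""
    if not updates:
        raise ValueError("at least one client update is required")

    total_examples = sum(num_examples for _, num_examples in updates)
    if total_examples <= 0:
        raise ValueError("the total number of examples must be positive")

    size = len(updates[0][0])
    if any(len(weights) != size for weights, _ in updates):
        raise ValueError("all weight vectors must have the same length")

    aggregated = [0.0] * size
    for weights, num_examples in updates:
        share = num_examples / total_examples  # this client's contribution
        for i, value in enumerate(weights):
            aggregated[i] += value * share
    return aggregated
```

For example, averaging the vector [1.0, 2.0] trained on 1 example with [3.0, 4.0] trained on 3 examples gives shares of 0.25 and 0.75, so the second client dominates the result.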

2.2.2.8.2. Technologies

2.2.2.8.2.1. FedML

Research library and benchmark for Federated ML containing federated algorithms and optimizers.

2.2.2.8.2.2. Python

Python is an interpreted, high-level, general-purpose programming language with an extensive set of libraries. It is very popular for data analysis and ML applications.

2.2.2.8.2.3. FastAPI

A popular web microframework written in Python, FastAPI is known for being both robust and high performing. It is based on OpenAPI (previously Swagger) standards.

2.2.2.8.2.4. Flower

A federated learning framework designed to work with a large number of clients. It is compatible with a variety of ML frameworks and supports a wide range of devices.

2.2.2.9. Version control and release

Version control and release details will be provided in the next release of the documentation.

2.2.2.10. Licence

The FL Training Collector is licensed under the Apache License, Version 2.0 (the “License”).

You may obtain a copy of the License at: http://www.apache.org/licenses/LICENSE-2.0

2.2.2.11. Notice (dependencies)

Dependency list and licensing information will be provided before the first major release.