2.2.4. FL Local Operations enabler


2.2.4.1. Introduction 

One of the key goals of FL is to protect the privacy of data owned by individual stakeholders. Data is therefore expected to be used only locally, to train a local version of the shared model, and only the proposed parameter updates of the ML algorithm are shared with others. An inference engine can also provide inference based on the final model. Both operations (model training and model inference) involve access to private data, so it is crucial to “encapsulate” the local processes within a single “node” that is controlled by the data owner. Note, however, that the data used in both processes has to be in the same format, which is imposed by the ML model being employed. The FL Local Operations enabler is proposed to carry out all of these local operations. It consists of several modules: a data transformation module (which negotiates a suitable transformation pipeline for a given data format so that a specific model can be trained or used for inference, and which will be extended in future work), a component encapsulated in a web application and responsible for model training, a component equipped with gRPC services and responsible for model inference, and a privacy module providing two selected privacy mechanisms for the FL training process (adaptive differential privacy and homomorphic encryption).

It was developed as a part of an FL system along with the FL Orchestrator, FL Training Collector and FL Repository, and should ideally be deployed with those enablers in order to use its full functionality (although it is possible to conduct an FL training process without the FL Orchestrator, which serves as a GUI, by configuring the enabler through its dedicated REST API). It encapsulates the functionalities of a federated learning (FL) client by maintaining a connection with the FL Orchestrator (GUI and monitoring), connecting to the training initiated by the FL server (FL Training Collector), periodically providing it with local weights and obtaining new global weights, as well as downloading any necessary components from the FL Repository (database).

Beyond the classic functionality of an FL client, however, FL Local Operations also enables the local inference deployment of a selected model (that can function as a standalone container), the use of flexible configurations, basic format verification and pluggable components, as well as selected privacy mechanisms. FL Local Operations is compatible with Prometheus metric monitoring.

This enabler has reached a TRL of 6 during the execution of the ASSIST-IoT project.

2.2.4.2. Features 

Enabler embedded in each FL involved party performing local training.
The possibility of conducting such training using PyTorch and Keras libraries, additionally allowing for the detailed configuration of appropriate optimizers, schedulers and callbacks.
Verification of local data formats compatibility with data formats required by FL through the use of format, model and data transformations configurations.
Transformation of local data formats to the format required by the ML system by specifying a chain of atomic transformations along with the appropriate parameters.
Custom clients, data loaders and services can also be constructed.
Local storage of models for training process optimization purposes is allowed.
The local results are sent to the FL Training Collector in order to carry out the appropriate aggregation methodology over the common shared model.
There are multiple aggregation algorithms provided, with an added possibility to implement and include an additional one.
The inference module allows models in the TFLite format to be served for inference. Combined with the use of gRPC for inference module communication, it provides a particularly lightweight inference solution.
The inference module can be deployed as a standalone component.
Communication of model updates via encryption mechanisms. Homomorphic encryption prevents outsiders from seeing the output model of each device/party (protecting against MITM attacks), whereas differentially private noise ensures that a malicious aggregator cannot infer which records are actual models and which are not.
Unit tests are included and can be run using pytest.

2.2.4.3. Place in architecture 

The FL Local Operations enabler is one of the Federated Learning enablers that together make it possible to deploy a Federated Learning environment. Functionally, it operates on the scalability and manageability verticals of the ASSIST-IoT architecture.

2.2.4.4. User guide 

2.2.4.4.1. FL Local Operations: Training Module

Interactions with the training module of this enabler are done through a REST API. In the FL environment, the FL Orchestrator enabler sends the appropriate configuration to FL Local Operations, which later communicates with the FL Training Collector and the FL Repository if required.

The enabler exposes a REST API (see the endpoints below) to communicate with external enablers/applications, and also uses gRPC to communicate with the FL Training Collector during model training.

Method	Endpoint	Description
POST	/job/config/{id}	Receive configuration for a training job
POST	/model/	Receive the configuration of a new model for local storage
PUT	/model/{model_name}/{model_version}	Receive the zipped data of a given model
GET	/status	Get current status of the enabler
GET	/capabilities	Get information about the capabilities (available dependencies, GPU etc) of the local machine
GET	/format	Get information about the format of the data available on the local machine
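
As a minimal sketch of how the read-only endpoints in the table above might be queried, the helper below builds the endpoint URLs and fetches their responses using only the Python standard library. The base address (host and port) is a hypothetical placeholder, and the assumption that each endpoint answers with a JSON body is not confirmed by this document.

```python
import json
from urllib.parse import urljoin
from urllib.request import urlopen

# Hypothetical base address; replace host and port with those of your deployment.
BASE_URL = "http://127.0.0.1:8000/"

# The GET endpoints listed in the table above.
READ_ONLY_ENDPOINTS = ("status", "capabilities", "format")


def endpoint_url(base_url: str, endpoint: str) -> str:
    """Join the base address and an endpoint name into a full URL."""
    normalized_base = base_url if base_url.endswith("/") else base_url + "/"
    return urljoin(normalized_base, endpoint.lstrip("/"))


def fetch_json(base_url: str, endpoint: str, timeout: float = 5.0):
    """Send a GET request and decode the body as JSON (assumed response format)."""
    with urlopen(endpoint_url(base_url, endpoint), timeout=timeout) as response:
        return json.loads(response.read().decode("utf-8"))
```

With a running instance, `fetch_json(BASE_URL, "status")` would return the decoded status document, and the same call works for the other names in `READ_ONLY_ENDPOINTS`.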

Additionally, the configuration of the final Kubernetes deployment of the FL LO Training Module can be modified by editing the files located in the pvc-data-lo Persistent Volume. Those files specify the format of the local data (format.json), as well as which transformations should be applied to this data and in what order (transformation_pipeline.json). Specific environment variables that are important for the deployment, such as the address of the FL Repository enabler or the address of the FL Orchestrator enabler, may also be adjusted in the suitable ConfigMaps.

2.2.4.4.2. FL Local Operations: Inference Module

In order to create a component capable of truly fast and lightweight inference, gRPC was chosen over REST for communication with the inference model. The resulting specification of input and output messages looks as follows:

syntax = "proto3";

package basic_inference;

message Tensor32 {
  repeated float array = 1;
  repeated int32 shape = 2;
}

message BasicInferenceRequest {
  int32 id = 1;
  Tensor32 tensor = 2;
}

message BasicInferenceResponse {
  int32 id = 1;
  Tensor32 tensor = 2;
}

service BasicInferenceService {
  rpc predict(stream BasicInferenceRequest) returns (stream BasicInferenceResponse) {}
}

Here, the BasicInferenceRequest encapsulates the unique id of the request (which later makes it easy to match the predictions with the corresponding input data), as well as the data necessary for inference. In order to make the gRPC service as application independent as possible (and therefore as suitable for later reuse as possible), the input data is specified just as a series of floats with a dynamic shape. The process uses bidirectional streaming for maximal speed and flexibility.
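
To illustrate the "series of floats with a dynamic shape" idea, the sketch below flattens a nested list into the two fields of Tensor32 (array and shape) and rebuilds it on the other side. The `Tensor32` dataclass here is a plain-Python stand-in written for this example, not the class generated from the proto file, and the helper names are hypothetical.

```python
from dataclasses import dataclass
from math import prod
from typing import List


@dataclass
class Tensor32:
    """Plain-Python stand-in mirroring the fields of the Tensor32 message."""
    array: List[float]
    shape: List[int]


def infer_shape(nested) -> List[int]:
    """Return the shape of a regular (non-ragged) nested list."""
    shape = []
    while isinstance(nested, list):
        shape.append(len(nested))
        nested = nested[0] if nested else None
    return shape


def flatten(nested) -> List[float]:
    """Flatten a nested list into a single list of floats (row-major order)."""
    if not isinstance(nested, list):
        return [float(nested)]
    flat: List[float] = []
    for item in nested:
        flat.extend(flatten(item))
    return flat


def to_tensor32(nested) -> Tensor32:
    """Pack nested data into the flat array plus shape representation."""
    shape = infer_shape(nested)
    flat = flatten(nested)
    if len(flat) != prod(shape):
        raise ValueError("ragged input cannot be described by a single shape")
    return Tensor32(array=flat, shape=shape)


def from_tensor32(tensor: Tensor32):
    """Rebuild the nested list from the flat array and its shape."""
    def build(values, shape):
        if not shape:
            return values[0]
        step = prod(shape[1:])
        return [build(values[i * step:(i + 1) * step], shape[1:])
                for i in range(shape[0])]
    return build(tensor.array, tensor.shape)
```

Because the shape travels with every message, the same service definition can carry inputs of different dimensionality without changes to the proto file.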

2.2.4.5. Prerequisites 

There are three possible ways to run the FL Local Operations. The first, no longer supported mode of deployment necessitates a local installation of Python 3.8+, along with all the packages located in requirements.txt files already preinstalled. A second, much more strongly encouraged mode of deployment uses Docker and docker-compose to locally create the appropriate containers. The third and final mode of deployment relies on the inclusion of the appropriate Helm charts. In order to use this mode of deployment, the local machine needs a preinstalled version of Kubernetes.

2.2.4.5.1. Helm chart

The FL Local Operations enabler has been developed with the assumption that it will be deployed on a Kubernetes cluster with a dedicated Helm chart. To do so, just run helm install <deployment name> helm-chart. If you want to deploy multiple FL Local Operations in one Kubernetes cluster, just choose different names for all of the deployments. If you want to deploy only the inference component, run helm install <deployment name> helm-chart --set inferenceapp.fullDeployment.enabled=false.

Before doing so, make sure that the enabler has been configured properly by checking the 3 ConfigMaps that are deployed alongside it. Their names change depending on the name of the deployment (to allow multiple Local Operations instances with slightly different configurations to coexist in a Kubernetes cluster).

The first, whose name starts with flinference-config-map, serves to flexibly set and change the configuration of the inference component, including the data format received by the gRPC service (as format.json), the name, version and input format of the model (as model.json), the configuration of the data transformation pipeline (as transformation_pipeline.json) and the data about both the serialized gRPC service and the specific inferencer to be used (as setup.json).

The second ConfigMap, whose name begins with fllocalops-config-map, contains the environment variables necessary to deploy the FL Local Operations instance. Check especially the fields REPOSITORY_ADDRESS (the address of the nearest FL Repository instance), ORCHESTRATOR_SVR_ADDRESS (the address of the FL Orchestrator’s main service), ORCHESTRATOR_WS_ADDRESS (the address that the websocket should use to connect to the FL Orchestrator) and SERVER_ADDRESS (the address of the FL Training Collector). If you change something in the ConfigMap when the enabler is already deployed, delete the inferenceapp and trainingapp pods so that they are recreated with the updated configuration.

Finally, the ConfigMap whose name begins with fltraining-config-map describes the configuration necessary to run the trainingapp component with pluggable transformations. This includes the data format that the data loader has access to (as format.json), the input format of the model (as model.json), the configuration of the train data transformation pipeline (as transformation_pipeline_train.json) as well as of the test data transformation pipeline (as transformation_pipeline_test.json), and the data about both the specific data loader and the training client that will need to be used (as setup.json).
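
As a small illustration of how the address variables of the fllocalops-config-map described above could be checked from inside a container, the sketch below reads them from the environment and reports any that are missing. The function name and the decision to treat all four variables as mandatory are assumptions made for this example; the enabler's own startup code may behave differently.

```python
import os
from typing import Dict, Mapping

# Variable names taken from the fllocalops-config-map description above.
REQUIRED_VARIABLES = (
    "REPOSITORY_ADDRESS",
    "ORCHESTRATOR_SVR_ADDRESS",
    "ORCHESTRATOR_WS_ADDRESS",
    "SERVER_ADDRESS",
)


def read_addresses(environment: Mapping[str, str] = os.environ) -> Dict[str, str]:
    """Return the configured addresses, failing early if any is unset or empty."""
    missing = [name for name in REQUIRED_VARIABLES if not environment.get(name)]
    if missing:
        raise KeyError("missing environment variables: " + ", ".join(missing))
    return {name: environment[name] for name in REQUIRED_VARIABLES}
```

Passing the mapping as an argument keeps the check easy to exercise with a plain dictionary instead of the real process environment.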

If you’d like to see and experiment with the API, the recommended approach is to go to the http://127.0.0.1:XXXXX/docs URL (if the NodePort for the first FL Local Operations endpoint has been changed, it should also be updated in the URL) and use the Swagger docs generated by the FastAPI framework.

2.2.4.5.2. Docker image

You can run USER_INDEX=1 FL_LOCAL_OP_DATA_FOLDER="./data" docker compose up --force-recreate --build -d in your terminal to build a new Docker image or use the start-local.sh script to do it automatically (for instance, by running the command ./start-local.sh 1).

2.2.4.6. Configuration options 

In order to initiate the training, a JSON encompassing the following configuration should be sent to the endpoint shown below. The most important available keys and their meaning will be explained further down.

POST /job/config/{training_id}/

{
  "client_type_id": "string",
  "server_address": "string",
  "eval_metrics": [
    "string"
  ],
  "eval_func": "string",
  "num_classes": 0,
  "num_rounds": 0,
  "shape": [
    0
  ],
  "training_id": 0,
  "model_name": "string",
  "model_version": "string",
  "config": [
    {
      "config_id": "string",
      "batch_size": 0,
      "steps_per_epoch": 0,
      "epochs": 0,
      "learning_rate": 0
    }
  ],
  "optimizer_config": {
    "optimizer": "string",
    "lr": 0,
    "rho": 0,
    "eps": 0,
    "foreach": true,
    "maximize": true,
    "lr_decay": 0,
    "betas": [
      "string",
      "string"
    ],
    "etas": [
      "string",
      "string"
    ],
    "step_sizes": [
      "string",
      "string"
    ],
    "lambd": 0,
    "alpha": 0,
    "t0": 0,
    "max_iter": 0,
    "max_eval": 0,
    "tolerance_grad": 0,
    "tolerance_change": 0,
    "history_size": 0,
    "line_search_fn": "string",
    "momentum_decay": 0,
    "dampening": 0,
    "centered": true,
    "nesterov": true,
    "momentum": 0,
    "weight_decay": 0,
    "amsgrad": true,
    "learning_rate": 0,
    "name": "string",
    "clipnorm": 0,
    "global_clipnorm": 0,
    "use_ema": true,
    "ema_momentum": 0,
    "ema_overwrite_frequency": 0,
    "jit_compile": true,
    "epsilon": 0,
    "clipvalue": 0,
    "initial_accumulator_value": 0,
    "beta_1": 0,
    "beta_2": 0,
    "beta_2_decay": 0,
    "epsilon_1": 0,
    "epsilon_2": 0,
    "learning_rate_power": 0,
    "l1_regularization_strength": 0,
    "l2_regularization_strength": 0,
    "l2_shrinkage_regularization_strength": 0,
    "beta": 0
  },
  "scheduler_config": {
    "scheduler": "string",
    "step_size": 0,
    "gamma": 0,
    "last_epoch": 0,
    "verbose": true,
    "milestones": [
      0
    ],
    "factor": 0,
    "total_iters": 0,
    "start_factor": 0,
    "end_factor": 0,
    "monitor": "string",
    "min_delta": 0,
    "patience": 0,
    "mode": "string",
    "baseline": 0,
    "restore_best_weights": true,
    "start_from_epoch": 0,
    "cooldown": 0,
    "min_lr": 0
  },
  "warmup_config": {
    "scheduler": "string",
    "warmup_iters": 0,
    "warmup_epochs": 0,
    "warmup_factor": 0,
    "scheduler_conf": {
      "scheduler": "string",
      "step_size": 0,
      "gamma": 0,
      "last_epoch": 0,
      "verbose": true,
      "milestones": [
        0
      ],
      "factor": 0,
      "total_iters": 0,
      "start_factor": 0,
      "end_factor": 0,
      "monitor": "string",
      "min_delta": 0,
      "patience": 0,
      "mode": "string",
      "baseline": 0,
      "restore_best_weights": true,
      "start_from_epoch": 0,
      "cooldown": 0,
      "min_lr": 0
    }
  },
  "privacy-mechanisms": {
    "homomorphic": {
      "poly_modulus_degree": 8192,
      "coeff_mod_bit_sizes": [
        60,
        40,
        40
      ],
      "scale_bits": 40,
      "scheme": "CKKS"
    },
    "dp-adaptive":{
      "num_sampled_clients": 0,
      "init_clip_norm": 0.1,
      "noise_multiplier": 1,
      "server_side_noising": true,
      "clip_count_stddev": null,
      "clip_norm_target_quantile": 0.5,
      "clip_norm_lr": 0.2
    }
  }
}

The definitions:
- client_type_id: Specifies the ID of the client. The keyword “base” bypasses the pluggability modules for the PyTorch builder for testing purposes.
- server_address: The address of the Flower server that the FL client should try to connect to.
- eval_metrics: The evaluation metrics that the FL client will gather during the evaluation process.
- eval_func: The evaluation function that the model will use as the loss throughout the training process.
- num_classes: The number of classes in classification problems.
- num_rounds: The number of rounds that the training should run for.
- shape: The shape of the data. Currently, it is recommended to change this parameter through the ConfigMaps instead.
- training_id: The id of the training process being conducted.
- model_name: The name of the model that will be used in the training. The name should be the same as the one stored in the FL Repository.
- model_version: The version of the model that will be used in the training. The version should be the same as the one stored in the FL Repository.
- config: The configuration specifying how the FL training process will be conducted on the client, containing important terms such as the batch_size or learning_rate.
- optimizer_config: The configuration of the optimizer.
  - optimizer: For the Keras model and client, the optimizer can be one of:
      "sgd": tf.keras.optimizers.SGD,
      "rmsprop": tf.keras.optimizers.RMSprop,
      "adam": tf.keras.optimizers.Adam,
      "adadelta": tf.keras.optimizers.Adadelta,
      "adagrad": tf.keras.optimizers.Adagrad,
      "adamax": tf.keras.optimizers.Adamax,
      "nadam": tf.keras.optimizers.Nadam,
      "ftrl": tf.keras.optimizers.Ftrl
    For the PyTorch model and client, the optimizer can be one of:
      "adadelta": torch.optim.Adadelta,
      "adagrad": torch.optim.Adagrad,
      "adam": torch.optim.Adam,
      "adamw": torch.optim.AdamW,
      "sparseadam": torch.optim.SparseAdam,
      "adamax": torch.optim.Adamax,
      "asgd": torch.optim.ASGD,
      "lbfgs": torch.optim.LBFGS,
      "nadam": torch.optim.NAdam,
      "radam": torch.optim.RAdam,
      "rmsprop": torch.optim.RMSprop,
      "rprop": torch.optim.Rprop,
      "sgd": torch.optim.SGD
    Other fields indicate the arguments that should be passed to the optimizer.
- scheduler_config: The configuration of the scheduler.
  - scheduler: For the Keras model and client, the scheduler (here, a more appropriate name would be a Keras callback) can be one of:
      "earlystopping": tf.keras.callbacks.EarlyStopping,
      "reducelronplateau": tf.keras.callbacks.ReduceLROnPlateau,
      "terminateonnan": tf.keras.callbacks.TerminateOnNaN
    For the PyTorch model and client, the scheduler can be one of:
      "lambdalr": torch.optim.lr_scheduler.LambdaLR,
      "multiplicativelr": torch.optim.lr_scheduler.MultiplicativeLR,
      "steplr": torch.optim.lr_scheduler.StepLR,
      "multisteplr": torch.optim.lr_scheduler.MultiStepLR,
      "constantlr": torch.optim.lr_scheduler.ConstantLR,
      "linearlr": torch.optim.lr_scheduler.LinearLR,
      "exponentiallr": torch.optim.lr_scheduler.ExponentialLR,
      "cosineannealinglr": torch.optim.lr_scheduler.CosineAnnealingLR,
      "chainedscheduler": torch.optim.lr_scheduler.ChainedScheduler,
      "sequentiallr": torch.optim.lr_scheduler.SequentialLR,
      "reducelronplateau": torch.optim.lr_scheduler.ReduceLROnPlateau,
      "cycliclr": torch.optim.lr_scheduler.CyclicLR,
      "onecyclelr": torch.optim.lr_scheduler.OneCycleLR,
      "cosineannealingwarmrestarts": torch.optim.lr_scheduler.CosineAnnealingWarmRestarts
    Other fields indicate the arguments that should be passed to the scheduler.
- warmup_config: The configuration of an (optional) warmup. This configuration is valid only for the PyTorch builder. It specifies a special scheduler, which is used only for a selected number of epochs to provide warmup at the start of the process.
  - scheduler: The name of the scheduler. Other fields indicate the arguments that should be passed to the scheduler.

- privacy-mechanisms: The configuration indicating which privacy mechanisms the FL Training Collector should employ (if any) and what their parameters should be. This dictionary can have no keys (no privacy mechanisms are used), the key “homomorphic” (HE is used), the key “dp-adaptive” (Differential Privacy with Adaptive Clipping is used), or both “homomorphic” and “dp-adaptive” (both techniques are used).
  - homomorphic: The configurable parameters for homomorphically encrypted federated averaging specify the context as described in the TenSEAL documentation.
  - dp-adaptive: The parameters specifying the differentially private Federated Averaging are taken from the Flower library and, by proxy, from the relevant paper.
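
The constraints described above (the allowed optimizer names per framework and the allowed privacy-mechanisms keys) can be checked before a configuration is sent. The following is a minimal sketch of such a client-side check; the function name, the framework labels "keras" and "pytorch", and the wording of the messages are assumptions made for this example and are not part of the enabler's API.

```python
from typing import List

# Optimizer names as listed in the definitions above.
KERAS_OPTIMIZERS = {
    "sgd", "rmsprop", "adam", "adadelta", "adagrad", "adamax", "nadam", "ftrl",
}
PYTORCH_OPTIMIZERS = {
    "adadelta", "adagrad", "adam", "adamw", "sparseadam", "adamax", "asgd",
    "lbfgs", "nadam", "radam", "rmsprop", "rprop", "sgd",
}
# Keys accepted in the privacy-mechanisms dictionary.
PRIVACY_MECHANISMS = {"homomorphic", "dp-adaptive"}


def check_job_config(job_config: dict, framework: str) -> List[str]:
    """Return a list of human-readable problems found in a job configuration."""
    optimizers = {"keras": KERAS_OPTIMIZERS, "pytorch": PYTORCH_OPTIMIZERS}
    if framework not in optimizers:
        raise ValueError(f"unsupported framework label: {framework}")

    problems = []
    optimizer = job_config.get("optimizer_config", {}).get("optimizer")
    if optimizer is not None and optimizer not in optimizers[framework]:
        problems.append(f"unknown {framework} optimizer: {optimizer}")

    unknown = set(job_config.get("privacy-mechanisms", {})) - PRIVACY_MECHANISMS
    for name in sorted(unknown):
        problems.append(f"unknown privacy mechanism: {name}")
    return problems
```

An empty list means that no problems were found by these two checks; it does not guarantee that the enabler will accept the configuration.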

A sample test configuration can be seen here:

{"client_type_id": "local1",
  "server_address": "trainingcollectorlocal-trainingmain-svc2",
  "eval_metrics": [
    "accuracy"
  ],
  "eval_func": "categorical_crossentropy",
  "num_classes": 10,
  "num_rounds": 15,
  "shape": [
    32, 32, 3
  ],
  "training_id": "10",
  "model_name": "keras_test",
  "model_version": "version_1",
  "config": [
    {"config_id": "min_effort",
   "batch_size": "64",
   "steps_per_epoch": "32",
   "epochs": "1",
   "learning_rate": "0.001"}
  ],
  "optimizer_config": {
    "optimizer": "adam",
    "learning_rate":"0.005",
    "amsgrad":"True"
  },
  "scheduler_config": {
    "scheduler": "reducelronplateau",
    "factor":"0.5",
    "min_delta":"0.0003"
  },
  "privacy-mechanisms":{}}

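A configuration like the sample above is submitted with a POST request to the job configuration endpoint. The sketch below only builds such a request with the Python standard library, so that it can be inspected before being sent; the base address is a hypothetical placeholder, and the trailing slash follows the endpoint form shown earlier in this section.

```python
import json
from urllib.request import Request


def build_job_request(base_url: str, training_id: int, job_config: dict) -> Request:
    """Build (but do not send) the POST request carrying a training job configuration."""
    url = f"{base_url.rstrip('/')}/job/config/{training_id}/"
    body = json.dumps(job_config).encode("utf-8")
    return Request(
        url,
        data=body,
        method="POST",
        headers={"Content-Type": "application/json"},
    )
```

The request can then be sent with `urllib.request.urlopen(request)` once the address of a running FL Local Operations instance is known.
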
2.2.4.7. Developer guide 

The Local Model Training component is responsible for local model training. During configuration it instantiates the appropriate ML training libraries and, if this is the beginning of the process, the initial version of the shared model. This step can be completed locally by the node owner, but this is unlikely. The main priority lies in assuring uniformity of training methods across nodes belonging to different owners. The necessary modules (ML algorithm libraries and the initial version of the shared model) will be downloaded from the FL Repository.

A websocket client is running in the background of the trainingapp pod. Its purpose is to provide a continuous means of communication with the FL Orchestrator, so that the Orchestrator knows exactly which FL Local Operations instances are active and can participate in training. It will try to connect to the FL Orchestrator server via the ORCHESTRATOR_WS_ADDRESS address configured in the fllocalops-config-map ConfigMap. To change it, it is enough to modify this address with kubectl edit cm and recreate the trainingapp pod.

The inference component corresponds to the inferenceapp pod and can function as a standalone component. It uses gRPC for lightweight communication. Its configuration can be set up by modifying the configuration files located in the configurations directory (these can also be modified on the fly by changing the values in the flinference-config-map and restarting the pod), as well as by adding and removing serialized objects (which can be accessed and changed as a Kubernetes volume or, in the case of data transformations and models, downloaded on the fly from the FL Repository). By default, the inference component accepts data in the form of numerical arrays of any shape and uses a TFLite model to provide lightweight and fast inference. However, it is possible to change the input shape and further details with the use of pluggability.

The inference component is, by default, installed with the rest of the Helm chart. Then it can be accessed through service fllocaloperationslocal-inferenceapp on port 50051 according to the specification located in inference_application/code/proto/basic-inference.proto.

In IoT ecosystems, each partner may (and is likely to) store data in its own (private/local) format. The use of FL requires the transformation of the appropriate parts of local data into the correct format. This format has to be described as part of the FL configuration, and all participating nodes have to comply with it. This may be achieved by the node owner providing a set of appropriate transformation components that, applied in a certain order, allow for data format unification. Such components also have to be flexibly downloadable from the FL Repository enabler.

There are two privacy mechanisms implemented to be used by the FL System. The FL Training Collector can be configured to work with either of them, both or none of them through the use of the training configuration.

The mechanism of Adaptive Differential Privacy modifies the selected strategy by introducing noise to the local model parameters before they are sent by the client. This increases the privacy of the data on the client by obfuscating the information about its distribution. This specific implementation additionally uses adaptive clipping to balance the influence of multiple clients. The use of this privacy technique may lead to a degradation in the performance of the final model, but introduces little to no additional computational cost.

The use of adaptive differential privacy and its specific parameters can be specified in the training configuration under the privacy-mechanisms keyword. If we include dp-adaptive in this dictionary, we can specify the parameters used by the Flower implementation under the dp-adaptive key and configure the training like this:

"privacy-mechanisms":{
  "dp-adaptive":{
    "num_sampled_clients":"1"
  }
}
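
To make the clipping and noising idea more concrete, here is a simplified, self-contained sketch. It is not the Flower implementation used by the enabler: the function names are hypothetical, and the geometric update of the clipping norm is written according to the commonly described adaptive clipping scheme, with parameters named after the init_clip_norm, clip_norm_target_quantile and clip_norm_lr keys from the configuration.

```python
import math
import random
from typing import List, Sequence


def l2_norm(update: Sequence[float]) -> float:
    """Euclidean norm of a flat list of parameter updates."""
    return math.sqrt(sum(value * value for value in update))


def clip_update(update: Sequence[float], clip_norm: float) -> List[float]:
    """Scale the update down so that its L2 norm does not exceed clip_norm."""
    norm = l2_norm(update)
    scale = min(1.0, clip_norm / norm) if norm > 0 else 1.0
    return [value * scale for value in update]


def add_gaussian_noise(update: Sequence[float], stddev: float,
                       rng: random.Random) -> List[float]:
    """Add independent zero-mean Gaussian noise to every coordinate."""
    return [value + rng.gauss(0.0, stddev) for value in update]


def update_clip_norm(clip_norm: float, unclipped_fraction: float,
                     target_quantile: float = 0.5,
                     clip_norm_lr: float = 0.2) -> float:
    """Move the clipping norm towards the target quantile of update norms.

    unclipped_fraction is the share of clients whose update norm was already
    within clip_norm; if it exceeds the target, the norm shrinks, else it grows.
    """
    return clip_norm * math.exp(-clip_norm_lr * (unclipped_fraction - target_quantile))
```

In a real deployment the noise scale is derived from the noise multiplier and the current clipping norm; in this sketch it is passed in directly as `stddev` to keep the example independent of those details.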

The mechanism of Federated Averaging with Homomorphic Encryption has been implemented from scratch using the TenSEAL library. Homomorphic Encryption allows numbers to be encrypted in such a way that the decrypted sum of the encrypted numbers is the same as the sum of the original numbers (and similarly for subtraction and multiplication). It therefore allows the FL clients to send their encrypted weights, which can then be aggregated and returned as averaged weights in encrypted form. This ensures that, in the event of a malicious server (or a malicious eavesdropper), the privacy of the clients’ data remains intact.
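
The additive property described above can be demonstrated with a toy example. The sketch below deliberately does not use TenSEAL or the CKKS scheme; it swaps in a tiny textbook Paillier cryptosystem with small hard-coded primes, purely to show that ciphertexts can be combined without decryption. It is insecure and for illustration only, and all function names are hypothetical.

```python
import math
import random


def generate_keys(p: int = 1789, q: int = 1861):
    """Build a toy Paillier key pair from two small primes (insecure, demo only)."""
    n = p * q
    lam = (p - 1) * (q - 1) // math.gcd(p - 1, q - 1)  # lcm(p - 1, q - 1)
    # With generator g = n + 1, the decryption constant is the inverse of lam mod n.
    mu = pow(lam, -1, n)
    return n, (lam, mu)


def encrypt(n: int, message: int, rng: random.Random) -> int:
    """Encrypt an integer 0 <= message < n under the public key n."""
    n_squared = n * n
    while True:
        r = rng.randrange(1, n)
        if math.gcd(r, n) == 1:
            break
    # (n + 1)^message mod n^2 equals 1 + message * n.
    return ((1 + message * n) * pow(r, n, n_squared)) % n_squared


def add_encrypted(n: int, first: int, second: int) -> int:
    """Combine two ciphertexts so that the result decrypts to the sum."""
    return (first * second) % (n * n)


def decrypt(n: int, private_key, ciphertext: int) -> int:
    """Recover the plaintext integer with the private key (lam, mu)."""
    lam, mu = private_key
    x = pow(ciphertext, lam, n * n)
    return ((x - 1) // n * mu) % n
```

For example, two clients could scale their weights 0.25 and 0.75 by 1000, encrypt 250 and 750, and let the server call `add_encrypted`; only the key holder can decrypt the total 1000 and divide it by the number of clients and the scale to obtain the average 0.5.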

The current implementation encrypts the parameters as a CKKS tensor (as implemented in TenSEAL), so if the user would like to generate and serialize new keys and contexts, they should be compatible with this method.

In order to generate a new set of keys, you can use the file application/generate_homomorphic_keys.py. If a new set of keys is generated, the application/src/custom_clients/hm_keys/public.text and application/src/custom_clients/hm_keys/secret.text files should be appropriately changed (and potentially modified to be a Kubernetes secret).

Attention: As homomorphic encryption is extremely computationally expensive, it can usually be used only for the simplest of models and datasets. It is therefore not recommended to use it in this implementation for models more complicated than a simple Linear Regression.

2.2.4.7.1. Pluggable modules

The trainingapp component supports FL training with the Keras and PyTorch libraries out of the box. Similarly, the inferenceapp component supports inference with the TFLite inferencer. However, it is possible to develop custom components:
- in the case of trainingapp:
  - FL client
  - FL model
  - FL data loader
  - FL data transformations
- in the case of inferenceapp:
  - gRPC service along with the proto and protocompiled files
  - inferencer
  - model

In order to deploy the image with your custom components through the use of a Kubernetes volume, change the custom_setup field in values.yaml to True.

Tutorial

2.2.4.7.1.1. Model

Uploading a new FL model.

A new FL model can be saved either in a format ready for FL Training in Keras, Pytorch, or FL inference in TFLite. For Keras, the method used should be:

model.save('model')

Then, the file should be compressed to a ZIP format in order to save space, for example using this snippet of code:

import pathlib
import zipfile

# Add every file below the saved 'model/' directory to the archive
with zipfile.ZipFile('keras_test_model.zip', 'w') as f:
    upper_dir = pathlib.Path("model/")
    for file in upper_dir.rglob("*"):
        f.write(file)

A model for the PyTorch pipeline should be saved using this snippet (preferably with the same file name):

torch.jit.save(m, 'scripted_model.pt')

A model for TFLite inference needs to be provided in the TFLite format. Both PyTorch and TFLite models should be compressed before uploading them to the FL Repository, similarly to the Keras model. They can be uploaded using the Swagger API of the FL Repository, by first creating the metadata of the model and then uploading a file by updating the object for the given metadata.

Loading the weights from the training results for a given FL model.

I will demonstrate it on a toy Keras example.

First, we should download from the FL Repository the underlying model (if you have saved the model elsewhere the step can be skipped).

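import shutil
from zipfile import ZipFile

import requests

# REPOSITORY_URL must hold the host (and port) of your FL Repository instance
with requests.get(f"http://{REPOSITORY_URL}/model"
                  f"/keras_test/version_1",
                  stream=True) as r:
    r.raise_for_status()  # stop early if the download failed
    with open('temp.zip', 'wb') as f:
        shutil.copyfileobj(r.raw, f)
with ZipFile('temp.zip', 'r') as zipObj:
    # Extract all the contents of the ZIP file into the 'temp' directory
    zipObj.extractall('temp')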

Then, the model has to be loaded. In order to deal with different levels of nesting from the downloaded ZIP files, I’m using a helpful script:

import os

from tensorflow import keras  # assumes the Keras version bundled with TensorFlow


def check_loading_path(temp):
    '''Checks how nested was the zipped file in order to load it correctly'''
    nested_files = os.listdir(temp)
    if len(nested_files) == 1 and os.path.isdir(os.path.join(temp, nested_files[0])):
        # A single sub-directory: descend until the actual model files are reached
        return check_loading_path(os.path.join(temp, nested_files[0]))
    else:
        return temp

load_path = check_loading_path('temp')
model = keras.models.load_model(load_path)

Finally, we have to download the selected weights. Attention: Make sure that the Python version of the environment you’re loading the pickle file in is compatible with the FL Training Collector, which means Python 3.8.3.

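import pickle
import shutil

import requests

# REPOSITORY_URL must hold the host (and port) of your FL Repository instance
with requests.get(f"http://{REPOSITORY_URL}/training-results/weights"
                  f"/keras_test/version_1/13",
                  stream=True) as r:
    r.raise_for_status()  # stop early if the download failed
    with open('temp2.pkl', 'wb') as f:
        shutil.copyfileobj(r.raw, f)
with open('temp2.pkl', 'rb') as pickle_file:
    weights = pickle.load(pickle_file)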

After unpickling we’re ready to set the weights.

model.set_weights(weights)

2.2.4.7.1.2. Data transformation

We will be demonstrating how to construct and configure the loading of a data transformation for the inference module.

Attention: To do so, first make sure that the environment you’re using has a Python version compatible with the inference module, that is, 3.11.4. Otherwise, you may encounter problems related to pickle magic numbers.

First, let’s design the transformation. Here is a sample data transformation:

from data_transformation.transformation import DataTransformation
from datamodels.models import MachineCapabilities


import numpy as np

class BasicDimensionExpansionTransformation(DataTransformation):
    import numpy as np

    id = "basic-expand-dimensions"
    description = """Basically a wrapper around numpy.expand_dims.
    Expands the shape of the array by inserting a new axis, that will appear at the axis position in expanded array shape"""
    parameter_types = {"axis": int}
    default_values = {"axis": 0}
    outputs = [np.ndarray]
    needs = MachineCapabilities(preinstalled_libraries={"numpy": "1.23.5"})

    def set_parameters(self, parameters):
        self.params = parameters

    def get_parameters(self):
        return self.params

    def transform_data(self, data):
        data = np.array(data)
        return np.expand_dims(data, axis=self.params["axis"])

    def transform_format(self, format):
        if "numerical" in format["data_types"]:
            axis = self.params["axis"]
            format["data_types"]["numerical"]["size"].insert(axis, 1)
        return format

The new data transformation class should be a subclass of the abstract DataTransformation class from the data_transformation module. It should have a unique id, a description of purpose, a dictionary of parameter types, a dictionary of default values, a list of output types and a MachineCapabilities object that expresses what needs to be present in the Docker container/on the machine to run this transformation.
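
The transform_format method in the example above describes how the declared data format changes when the transformation is applied. As a standalone sketch of that effect, the function below performs the same insertion on a copy of a format dictionary. Only the data_types, numerical and size keys are taken from the example; the rest of the format schema is not shown in this document, and the function name is hypothetical.

```python
import copy


def expand_format(data_format: dict, axis: int = 0) -> dict:
    """Return a copy of the format with a new axis of size 1 inserted at `axis`."""
    updated = copy.deepcopy(data_format)  # leave the caller's dictionary untouched
    numerical = updated.get("data_types", {}).get("numerical")
    if numerical is not None:
        numerical["size"].insert(axis, 1)
    return updated
```

For a format whose numerical size is [32, 32, 3], calling `expand_format(fmt, axis=0)` yields a size of [1, 32, 32, 3], which matches what numpy.expand_dims does to the data itself.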

If you have this transformation ready, you should put it, for example, in the inference_application/custom directory in the FL Local Operations repository and use a different file to properly serialize the modules. Like this:

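import zipfile

import dill

# Package the module source so that it can later be loaded with zipimport
with zipfile.PyZipFile("inference_application.custom.expansion.zip", mode="w") as zip_pkg:
    zip_pkg.writepy("inference_application/custom/expansion.py")

# Serialize the transformation class itself with dill
with open('inference_application.custom.expansion.pkl', 'wb') as f:
    dill.dump(BasicDimensionExpansionTransformation, f)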

For serializing this data, both zipimport and dill were used to make sure that even the most complicated transformations will be possible to load. Just remember to name the files according to the paths to the modules you would like to serialize (just replace the “/” with the “.”).
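
The naming rule described above (replace “/” with “.”) can be captured in a small helper. This is only an illustrative sketch; the function name is hypothetical, and dropping the .py suffix follows the file names used in the serialization example.

```python
from pathlib import PurePosixPath


def serialized_base_name(module_path: str) -> str:
    """Turn a module path such as 'a/b/c.py' into the base name 'a.b.c'."""
    return ".".join(PurePosixPath(module_path).with_suffix("").parts)
```

For the example module, the helper returns inference_application.custom.expansion, to which the .zip and .pkl extensions are then appended.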

Then, you can either zip the two resulting files and upload them to the FL Repository as a transformation, or place the files in the inference_application/local_cache/transformations directory, either by building a new image or deploying the Helm chart with the customSetup field marked to true in values.yaml file for the inference application and using kubectl cp to place the files.

Finally, you can change the inference application configuration so that it uses that specific transformation with the selected parameters. You can do this by modifying the appropriate ConfigMap.

[
    {
        "id": "inference_application.custom.basic_norm",
        "parameters": {}
    },
    {
        "id": "inference_application.custom.expansion",
        "parameters": {
            "axis": 0
        }
    }
]
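Before applying such a configuration, it can be useful to check that it is well formed. The sketch below is an illustration only: the function name `load_transformation_config` is hypothetical, and the validation rules (a JSON list of objects, each with a string "id" and a "parameters" object) are inferred from the example above rather than taken from the enabler's code.

```python
import json


def load_transformation_config(text):
    """Parse a transformation list and return (id, parameters) pairs in order."""
    entries = json.loads(text)
    if not isinstance(entries, list):
        raise ValueError("configuration must be a JSON list")
    pipeline = []
    for index, entry in enumerate(entries):
        if not isinstance(entry, dict):
            raise ValueError(f"entry {index} is not a JSON object")
        if not isinstance(entry.get("id"), str):
            raise ValueError(f"entry {index} has no string 'id'")
        parameters = entry.get("parameters", {})
        if not isinstance(parameters, dict):
            raise ValueError(f"entry {index} has non-object 'parameters'")
        pipeline.append((entry["id"], parameters))
    return pipeline


config_text = """
[
    {"id": "inference_application.custom.basic_norm", "parameters": {}},
    {"id": "inference_application.custom.expansion", "parameters": {"axis": 0}}
]
"""

for transformation_id, parameters in load_transformation_config(config_text):
    print(transformation_id, parameters)
```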

If you just want to reuse an existing transformation, it is enough to modify the configuration. The serialization of data transformations for the FL Training Collector is very similar, but requires the use of Python 3.8.3.

For extended documentation on how to develop other pluggable modules, based on further examples, please contact the authors.

2.2.4.7.1.2.1. Technologies

2.2.4.7.1.3. scikit-learn

A popular machine learning library often used for data preprocessing and transformation, for example encoding labels. It is open source and widely used in the industry.

2.2.4.7.1.4. PyTorch

An open source machine learning framework based on the Torch library, used for applications such as computer vision and natural language processing, primarily developed by Facebook’s AI Research lab (FAIR).

2.2.4.7.1.5. Python

Python is an interpreted, high-level, general-purpose programming language with a rich set of libraries. It is very popular for data analysis and ML applications.

2.2.4.7.1.6. TensorFlow

A free and open-source software library for machine learning and artificial intelligence. It can be used across a range of tasks but has a particular focus on training and inference of deep neural networks.

2.2.4.7.1.7. TensorFlow Lite

A library allowing for easy, lightweight deployment of ML models on mobile devices, microcontrollers and edge devices. It employs, for example, quantization in order to decrease the resources consumed by the model during inference.

2.2.4.7.1.8. Flower

A federated learning framework designed to work with a large number of clients. It is both compatible with a variety of ML frameworks and supports a wide range of devices.

2.2.4.7.1.9. OpenCV

A real-time computer vision library providing already optimized models. It is cross-platform and open-source.

2.2.4.7.1.10. TenSEAL

A library that lets users easily conduct Homomorphic Encryption operations on tensors, built on top of Microsoft SEAL. Since the underlying implementation is written in C++, the resulting operations keep resource consumption low.

2.2.4.7.1.11. gRPC

A modern, open source, high-performance Remote Procedure Call (RPC) framework. gRPC works across many languages and platforms, and is efficient and scalable.

2.2.4.7.1.12. FastAPI

A popular web microframework written in Python, FastAPI is known for being both robust and high performing. It is based on OpenAPI (previously Swagger) standards.

2.2.4.8. Prometheus metric monitoring 

The Prometheus metrics are available for scraping on port 9050 under the /metrics URL path on the trainingapp, and on port 9000 without any additional URL path on the inferenceapp.
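As a quick manual check of these endpoints, the metrics can be fetched and listed with the Python standard library. This is a minimal sketch under stated assumptions: the host name "localhost" is a placeholder for wherever the applications are reachable, and the helper names `metrics_url`, `metric_names` and `fetch_metric_names` are hypothetical. Only the ports and paths come from the description above.

```python
from urllib.request import urlopen

# Ports and paths as described above; the root path is used for the inferenceapp.
METRICS_ENDPOINTS = {
    "trainingapp": (9050, "/metrics"),
    "inferenceapp": (9000, "/"),
}


def metrics_url(app, host="localhost"):
    """Build the scrape URL for one application (host is a placeholder)."""
    port, path = METRICS_ENDPOINTS[app]
    return f"http://{host}:{port}{path}"


def metric_names(text):
    """Extract the sorted, unique metric names from Prometheus text-format output."""
    names = set()
    for line in text.splitlines():
        line = line.strip()
        if not line or line.startswith("#"):
            continue  # skip blank lines and HELP/TYPE comments
        # A sample line looks like: name{labels} value
        names.add(line.split("{", 1)[0].split(" ", 1)[0])
    return sorted(names)


def fetch_metric_names(url, timeout=5.0):
    """Download the metrics page and return the metric names it exposes."""
    with urlopen(url, timeout=timeout) as response:
        return metric_names(response.read().decode("utf-8"))


print(metrics_url("trainingapp"))   # http://localhost:9050/metrics
print(metrics_url("inferenceapp"))  # http://localhost:9000/
```

Calling fetch_metric_names(metrics_url("trainingapp")) against a running deployment would then list the exposed metric names.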

2.2.4.9. Licence 

The FL Local Operations enabler is released under the Apache 2.0 license, as we have internally concluded that we are not “offering the functionality of MongoDB, or modified versions of MongoDB, to third parties as a service”. However, potential future commercial adopters should be aware that our project uses MongoDB, so that they can accurately determine the license most applicable to their own projects.

You may obtain a copy of the License at: http://www.apache.org/licenses/LICENSE-2.0

2.2.4.10. Authors 

Karolina Bogacka
Piotr Sowiński
Jose Antonio Clemente Perez

2.2.4.11. Notice (dependencies)

Information about the dependencies needed to run a specific part of the application can be found in the appropriate requirements.txt files. However, since these dependencies are downloaded automatically while the Docker images are built, the only local dependencies needed to deploy the application are a local Docker installation along with Docker Compose or Kubernetes.