1.3.1. Semantic Repository enabler

1.3.1.1. Introduction

This enabler offers a “nexus” for data models, ontologies, and other files that can be uploaded in different file formats and served to users with relevant documentation. It is intended to support files that describe data models or support data transformations, such as ontologies, schema files, and semantic alignment files. However, there are no restrictions on file format or size.

The overall focus of the Semantic Repository’s design is high performance, scalability, and resilience. It should be able to scale up and down to meet the needs of a specific use case.

1.3.1.2. Features

The enabler is in active development. Most features listed below are not implemented yet. Marked in bold are those that are already functioning.

1.3.1.2.1. Storing data models

  • Storage of any type of data model, both textual and binary.

  • Ability to provide multiple formats of one data model, depending on the requester’s preferences.

  • Grouping data models into namespaces.

  • Flexible versioning with arbitrary tag names.

  • Granular and easy-to-use access control.

1.3.1.2.2. Metadata

  • Tracking provenance information (creation/modification dates, authors).

  • Ability to attach arbitrary additional metadata.

  • Metadata searching and sorting.

1.3.1.2.3. Documentation

  • Support for Markdown/AsciiDoc manual documentation pages.

  • Automatic documentation generation for some data model types.

  • Flexible plugin architecture for creating additional documentation generation modules.

1.3.1.3. Place in architecture

The Semantic Repository is located in the Data management plane of the ASSIST-IoT architecture. It serves as a versioned and namespaced central repository of data models and other files. It has few limitations with regard to the content it can store, thus it can be used for diverse data storage-related scenarios.

1.3.1.4. User guide

The Semantic Repository enabler exposes a single REST API endpoint both for manipulating the repository’s contents and for retrieving stored data models. There is also a graphical user interface for performing most of the same tasks.

1.3.1.4.1. General Information

1.3.1.4.1.1. Basic concepts

  • Namespace – a top-level “group” in the repository, which can host any number of models.

  • Model – a data model, which can have many versions.

  • Model version – a specific version of a model. You can upload the content of a data model only to its specific version. The version can also have associated documentation pages and other metadata.

  • Content – each model version can have many content files attached, each in a different format.

There are few restrictions on how you can use these concepts to build your repository. For example, it is possible to upload files of arbitrary size and format.

To give some context, in GitHub terms, a namespace would translate to a user or a group. A model would be a repository, and a model version would be a branch or tag. This is just an example, of course.
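The hierarchy of these concepts can be pictured as nested records. The following is a minimal sketch in Python and not the enabler's actual data model: the class names, field names, and the sample w3c/sosa entry are illustrative assumptions.

```python
from dataclasses import dataclass, field
from typing import Dict


@dataclass
class ModelVersion:
    # Content files keyed by format name, e.g. "text/turtle" (bytes stand in for files).
    content: Dict[str, bytes] = field(default_factory=dict)


@dataclass
class Model:
    # Versions keyed by an arbitrary tag, e.g. "1.0".
    versions: Dict[str, ModelVersion] = field(default_factory=dict)


@dataclass
class Namespace:
    # Models keyed by name, e.g. "sosa".
    models: Dict[str, Model] = field(default_factory=dict)


# Hypothetical repository: one namespace, one model, one version, one content file.
repository = {
    "w3c": Namespace(models={
        "sosa": Model(versions={
            "1.0": ModelVersion(content={"text/turtle": b"# ontology placeholder"}),
        }),
    }),
}

# Formats available for w3c/sosa/1.0 in this sketch.
formats = sorted(repository["w3c"].models["sosa"].versions["1.0"].content)
```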

1.3.1.4.1.2. Model versions

The Semantic Repository does not force a specific versioning scheme on your models. You can use, for example, Git branches and tags, plain numbers, or Semantic Versioning.

The latest version tag is special – it is a pointer to the most recent version of the model, as set by the model’s owner. It must always be set manually. A model may have no latest pointer, and the pointer may lead to a non-existent version. Enforcing a specific style of use is up to the owner.

The benefit of the latest tag is that it allows clients to easily retrieve the most recent version of the model (see the API user guide).

1.3.1.4.1.3. Content

One model version can have multiple content files attached, each in a different format. It is recommended that the format correspond to the Media Type of the file, as this best supports HTTP-based technologies such as Linked Data. However, you can always set the format to whatever you like.

The content for one model version should be immutable, i.e., you should avoid modifying content once it has been uploaded for a specific version. This way, clients can expect that the content for a given version will not change suddenly and introduce a backward-incompatible change. It is, however, possible to overwrite earlier-uploaded content, in case of a mistake, for example. See the API guide below for more details.

You can specify the default format that is used when content is retrieved without an explicit format preference. See the API guide below for more details.

1.3.1.4.1.4. Metadata

Not implemented yet.

1.3.1.4.2. REST API

The following is a brief guide to using the API in practice. The examples follow a basic use case of storing several W3C ontologies.

The full specification of the REST API can be found in the REST API reference section.

1.3.1.4.2.1. General information

The API follows a very simple structure of /{namespace}/{model}/{model_version}. In general, POST creates a new thing at the given URL, GET retrieves it, DELETE deletes it, and PATCH modifies it.

The API only returns responses in plain JSON. The following guide should give you a good idea of what the responses look like, but you can also find the full schemas in the REST API reference section.

It generally does not matter whether a URL ends with a slash or not.
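As an illustration of this URL structure, the sketch below assembles resource paths from their parts. The helper name resource_path is hypothetical and not part of the API.

```python
from typing import Optional


def resource_path(namespace: str, model: Optional[str] = None,
                  version: Optional[str] = None) -> str:
    """Build /{namespace}, /{namespace}/{model}, or /{namespace}/{model}/{model_version}."""
    if version is not None and model is None:
        raise ValueError("a version can only be addressed within a model")
    parts = [namespace]
    if model is not None:
        parts.append(model)
    if version is not None:
        parts.append(version)
    return "/" + "/".join(parts)


# Paths used throughout the examples in this guide.
namespace_path = resource_path("w3c")
model_path = resource_path("w3c", "sosa")
version_path = resource_path("w3c", "sosa", "1.0")
```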

1.3.1.4.2.2. Creating and retrieving models

Step 1: create a namespace

First, we will need to create a namespace for your models. We will name it w3c.

Request URL: POST /w3c
Request body: (empty)

Response code: 200
Response body: {"message": "Created namespace 'w3c'."}

You can examine the created namespace by performing an HTTP GET request:

Request URL: GET /w3c
Request body: (empty)

Response code: 200
Response body: {"name": "w3c"}

Currently, a namespace holds no information other than its name.

You can also list all namespaces in the repository:

Request URL: GET /
Request body: (empty)

Response:

{
  "items": [{"name": "w3c"}],
  "totalCount": 1
}

A collection of namespaces is returned. Browsing such collections is described in detail in the Browsing collections section below.

Note: a namespace name must meet the following criteria:

  • be at least 3 and at most 100 characters long

  • only contain lowercase or uppercase letters of the Latin alphabet, digits, dashes (-), and underscores (_)

Step 2: create models

In this example we will create two models: sosa and ssn, corresponding to two well-known IoT ontologies. Creating a model is similar to creating a namespace:

Request: POST /w3c/ssn
Body: (empty)

Response code: 200
Body: {"message": "Created model 'w3c/ssn'."}

and for sosa:

Request: POST /w3c/sosa
Body: (empty)

Response code: 200
Body: {"message": "Created model 'w3c/sosa'."}

You can examine the created model:

Request: GET /w3c/sosa
Body: (empty)

Response code: 200
Body: {"namespace": "w3c", "name": "sosa"}

When you examine the contents of the namespace again (GET /w3c), you will see a collection of models:

{
  "models": {
    "items": [
      {
        "name": "sosa",
        "namespace": "w3c"
      },
      {
        "name": "ssn",
        "namespace": "w3c"
      }
    ],
    "totalCount": 2
  },
  "name": "w3c"
}
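Collections in the responses share the items/totalCount shape seen here. As a hedged sketch, a client could extract the model names from this response as follows; the variable names are illustrative, and the JSON is the response body shown above.

```python
import json

# Response body of GET /w3c, copied from the example above.
response_body = """
{
  "models": {
    "items": [
      {"name": "sosa", "namespace": "w3c"},
      {"name": "ssn", "namespace": "w3c"}
    ],
    "totalCount": 2
  },
  "name": "w3c"
}
"""

namespace = json.loads(response_body)
# Collect the names of the models listed in the namespace.
model_names = [item["name"] for item in namespace["models"]["items"]]
total = namespace["models"]["totalCount"]
```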

Note: model names must meet the following criteria:

  • be at least 1 and at most 100 characters long

  • only contain lowercase or uppercase letters of the Latin alphabet, digits, dashes (-), and underscores (_)

  • not start with one of the following characters: _-

Step 3: create versions

You cannot upload content to a model directly. First, you must explicitly create a specific version of the model and work with that.

For example, to create version 1.0 of the sosa model:

Request: POST /w3c/sosa/1.0
Body: (empty)

Response code: 200
Body: {"message": "Created model version 'w3c/sosa/1.0'."}

You can examine the content of this version:

Request: GET /w3c/sosa/1.0
Body: (empty)

Response:

{
  "formats": {},
  "model": "sosa",
  "namespace": "w3c",
  "version": "1.0"
}

You can also retrieve a list of versions for the model (again, GET /w3c/sosa):

{
  "name": "sosa",
  "namespace": "w3c",
  "versions": {
    "items": [
      {
        "model": "sosa",
        "namespace": "w3c",
        "version": "1.0"
      }
    ],
    "totalCount": 1
  }
}

Note: version tags must meet the following criteria:

  • be at least 1 and at most 100 characters long

  • only contain lowercase or uppercase letters of the Latin alphabet, digits, dashes (-), underscores (_), dots (.), and plus signs (+)

  • not start with one of the following characters: ._-+

  • not be latest, which is a reserved tag (see below)
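The naming rules stated in the notes for namespaces, models, and version tags can be checked on the client side before a request is sent. The sketch below expresses them as regular expressions; the function and constant names are hypothetical, and the server's own validation may differ in details not covered by this guide.

```python
import re

# Namespace: 3 to 100 characters from Latin letters, digits, dashes, underscores.
NAMESPACE_RE = re.compile(r"[A-Za-z0-9_-]{3,100}")
# Model: 1 to 100 characters, same alphabet, must not start with "_" or "-".
MODEL_RE = re.compile(r"[A-Za-z0-9][A-Za-z0-9_-]{0,99}")
# Version tag: also allows "." and "+", must not start with ".", "_", "-", or "+".
VERSION_RE = re.compile(r"[A-Za-z0-9][A-Za-z0-9_.+-]{0,99}")


def is_valid_namespace(name: str) -> bool:
    return NAMESPACE_RE.fullmatch(name) is not None


def is_valid_model(name: str) -> bool:
    return MODEL_RE.fullmatch(name) is not None


def is_valid_version(tag: str) -> bool:
    # "latest" is reserved for the pointer and cannot be used as a tag.
    return tag != "latest" and VERSION_RE.fullmatch(tag) is not None
```

For example, is_valid_version("1.0") holds, while is_valid_version("latest") and is_valid_model("-draft") do not.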

latest pointer

The latest version pointer can be set on a given model using a PATCH request:

Request: PATCH /w3c/sosa
Body: {"latestVersion": "1.0"}

Response code: 200
Body: {"message": "Updated model 'w3c/sosa'."}

Now it can be used in GET requests instead of the explicit version. So, GET /w3c/sosa/latest is equivalent to GET /w3c/sosa/1.0.

Important: to prevent accidental overwrites, it is not possible to make POST, PATCH, or DELETE requests via the latest pointer. Use the explicit version in the URL instead.
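For illustration, a PATCH request that sets the pointer could be prepared with Python's standard library as sketched below. The base URL http://localhost:8080 and the Content-Type header are assumptions, not values confirmed by this guide. The request is only constructed here, not sent.

```python
import json
import urllib.request

BASE_URL = "http://localhost:8080"  # assumption: replace with your deployment's address


def build_json_request(method: str, path: str, payload: dict) -> urllib.request.Request:
    """Prepare (but do not send) a request with a JSON body."""
    return urllib.request.Request(
        url=BASE_URL + path,
        data=json.dumps(payload).encode("utf-8"),
        headers={"Content-Type": "application/json"},  # assumed header
        method=method,
    )


# Point "latest" at the explicit version 1.0 of w3c/sosa.
request = build_json_request("PATCH", "/w3c/sosa", {"latestVersion": "1.0"})
# To actually send it: urllib.request.urlopen(request)
```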

The version pointer can also be set during model creation:

Request: POST /w3c/ssn
Body: {"latestVersion": "1.0"}

Response code: 200
Body: {"message": "Created model 'w3c/ssn'."}

To change the pointer to a new value, simply make a PATCH request. To unset the pointer completely, use the special @unset value in a PATCH request:

Request: PATCH /w3c/sosa
Body: {"latestVersion": "@unset"}

Response code: 200
Body: {"message": "Updated model 'w3c/sosa'."}
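The behaviour of the latest pointer described in this section can be summarised in a small in-memory sketch. It is a model of the documented semantics, not the enabler's implementation: the function names are hypothetical, and Python exceptions stand in for whatever error responses the API actually returns.

```python
from typing import Optional

READ_ONLY_METHODS = {"GET"}


def resolve_version(latest_pointer: Optional[str], requested: str,
                    method: str = "GET") -> str:
    """Map the version segment of a URL to a concrete version tag."""
    if requested != "latest":
        return requested
    if method not in READ_ONLY_METHODS:
        # POST, PATCH, and DELETE must use the explicit version.
        raise PermissionError("use the explicit version for POST, PATCH, and DELETE")
    if latest_pointer is None:
        raise LookupError("the model has no latest pointer")
    return latest_pointer


def apply_patch(latest_pointer: Optional[str], patch: dict) -> Optional[str]:
    """Return the pointer after a PATCH body such as {"latestVersion": "1.0"}."""
    if "latestVersion" not in patch:
        return latest_pointer
    value = patch["latestVersion"]
    # The special "@unset" value removes the pointer.
    return None if value == "@unset" else value


pointer = apply_patch(None, {"latestVersion": "1.0"})
resolved = resolve_version(pointer, "latest")
cleared = apply_patch(pointer, {"latestVersion": "@unset"})
```

Note that the sketch, like the repository, does not check whether the pointer leads to an existing version.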

1.3.1.4.2.3. Uploading content

In the following examples we will focus on uploading and retrieving content for the /w3c/sosa/1.0 model version we have created in the previous section.

To upload content in format text/turtle:

Request: POST /w3c/sosa/1.0/content?format=text/turtle
Body: content: (file)

In the body of the request (form-data) set the field content to the file you want to upload.

In response you will get:

{
    "message": "Uploaded content in format 'text/turtle' for model 'w3c/sosa/1.0'. Checksum: 5b844292b8402e448804f9c9f100d59e",
    "warnings": [
        "The default format of this model version was set to 'text/turtle'.'"
    ]
}

The response notes that the default format of the model version was set to “text/turtle” because that is the first format we have uploaded. You can upload more content files for the model version in a similar manner.

The Semantic Repository supports multipart, streaming uploads and can handle files of any size this way.
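A multipart/form-data body with a single content field, as used by the upload request above, can be assembled by hand with the standard library. The following is a sketch only: the helper name, the file name sosa.ttl, and the placeholder bytes are assumptions. It also buffers the whole file in memory, so a streaming HTTP client would be the better choice for large files.

```python
import uuid
from typing import Tuple


def encode_multipart(field_name: str, filename: str, payload: bytes,
                     content_type: str = "application/octet-stream") -> Tuple[bytes, str]:
    """Return (body, Content-Type header value) for a one-file form-data upload."""
    boundary = uuid.uuid4().hex
    head = (
        f"--{boundary}\r\n"
        f'Content-Disposition: form-data; name="{field_name}"; filename="{filename}"\r\n'
        f"Content-Type: {content_type}\r\n"
        "\r\n"
    ).encode("utf-8")
    tail = f"\r\n--{boundary}--\r\n".encode("utf-8")
    return head + payload + tail, f"multipart/form-data; boundary={boundary}"


# Hypothetical file name and content; in practice, read the bytes from your file.
body, content_type_header = encode_multipart(
    "content", "sosa.ttl", b"# turtle data", "text/turtle"
)
```

The returned body would be sent as the data of the POST request, with content_type_header as its Content-Type header.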

To see the available formats, make a GET /w3c/sosa/1.0 request:

{
  "defaultFormat": "text/turtle",
  "formats": {
    "text/turtle": {
      "contentType": "text/turtle",
      "md5": "5b844292b8402e448804f9c9f100d59e",
      "size": 27326
    }
  },
  "model": "sosa",
  "namespace": "w3c",
  "version": "1.0"
}

In the response, notice that:

  • defaultFormat has been set to “text/turtle”. You can change that later.

  • formats is keyed by format name.

  • contentType displays the content type of the uploaded file, which in this case is the same as format.

  • md5 is the MD5 checksum of the entire file.

  • size is the file’s size in bytes.
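The md5 and size fields make it possible to check a local copy of a file against what the repository reports. The sketch below shows one way to do this; the function names and the sample data are hypothetical, and the response fragment is constructed locally for illustration rather than fetched from a server.

```python
import hashlib
import io
from typing import BinaryIO


def md5_of_stream(stream: BinaryIO, chunk_size: int = 65536) -> str:
    """Compute the hex MD5 digest of a stream without loading it all into memory."""
    digest = hashlib.md5()
    for chunk in iter(lambda: stream.read(chunk_size), b""):
        digest.update(chunk)
    return digest.hexdigest()


def matches_version_info(data: bytes, version_info: dict, fmt: str) -> bool:
    """Compare local bytes with the size and md5 fields reported for one format."""
    reported = version_info["formats"][fmt]
    return (len(data) == reported["size"]
            and md5_of_stream(io.BytesIO(data)) == reported["md5"])


# Hypothetical local data and a matching response fragment, for illustration only.
local_data = b"# turtle data"
version_info = {"formats": {"text/turtle": {
    "contentType": "text/turtle",
    "md5": md5_of_stream(io.BytesIO(local_data)),
    "size": len(local_data),
}}}
is_intact = matches_version_info(local_data, version_info, "text/turtle")
```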

1.3.1.4.2.4. Overwriting content

As noted in the Content section of this guide, the content for a specific version of a model should be immutable. So, if you try to repeat the request presented above, it will be rejected with an HTTP 400 error:

{
  "error": "Content in format 'text/turtle' already exists for this model version. If you want to update it, it is recommended to create a new version instead. If you really want to overwrite this content, retry the upload with the 'overwrite=1' query parameter."
}

If you really want to overwrite this content (in case of a mistake, for example), add the overwrite=1 parameter:

Request: POST /w3c/sosa/1.0/content?format=text/turtle&overwrite=1
Body: content: (file)

Response:

{
  "message": "Uploaded content in format 'text/turtle' for model 'w3c/sosa/1.0'. Checksum: 5b844292b8402e448804f9c9f100d59e",
  "warnings": [
    "Overwrote an earlier version of the content."
  ]
}
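The upload URL with and without the overwrite=1 query parameter can be built as sketched below. The helper name upload_path is hypothetical; only the format and overwrite parameters come from this guide.

```python
from urllib.parse import urlencode


def upload_path(namespace: str, model: str, version: str, fmt: str,
                overwrite: bool = False) -> str:
    """Build the content upload path, optionally asking to overwrite existing content."""
    params = {"format": fmt}
    if overwrite:
        params["overwrite"] = "1"
    # safe="/" keeps the slash in media types such as text/turtle readable;
    # other reserved characters (for example "+") are still percent-encoded.
    query = urlencode(params, safe="/")
    return f"/{namespace}/{model}/{version}/content?{query}"


first_upload = upload_path("w3c", "sosa", "1.0", "text/turtle")
forced_upload = upload_path("w3c", "sosa", "1.0", "text/turtle", overwrite=True)
```

Keeping overwrite off by default mirrors the recommendation to treat uploaded content as immutable and to opt in to overwriting explicitly.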

1.3.1.4.2.5. Changing the default format

The defaultFormat field of a model version indicates which content format will be used, if no other preferences are specified. It is set automatically to the first content format that is uploaded to the model version, but can also be changed later.

Changing the defaultFormat field is done with a PATCH request:

Request: PATCH /w3c/sosa/1.0
Body: {"defaultFormat": "application/json+ld"}

Response code: 200
Body: {"message": "Updated model version 'w3c/sosa/1.0'."}

Now when you request GET /w3c/sosa/1.0/content (or any of the equivalent forms described in the Downloading the content section), the Repository will attempt to retrieve content in the application/json+ld format.

Note that the Semantic Repository does not check whether content in the default format is actually present in the model version. If it is not, you will receive a 404 error when trying to retrieve the content.

The default format can also be set during model version creation:

Request: POST /w3c/ssn/1.0
Body: {"defaultFormat": "application/json+ld"}

Response code: 200
Body: {"message": "Created model version 'w3c/ssn/1.0'."}

If you set the default format during model version creation, the first uploaded content will not overwrite this setting.

To change the default format to a new value, simply make a PATCH request. To unset the default format completely, use the special @unset value in a PATCH request:

Request: PATCH /w3c/sosa/1.0
Body: {"defaultFormat": "@unset"}

Response code: 200
Body: {"message": "Updated model version 'w3c/sosa/1.0'."}
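The rules for the default format described in this section can be condensed into a small in-memory sketch. It models the documented behaviour only and is not the enabler's implementation: the class and method names are hypothetical, a LookupError stands in for the 404 response, and what happens on uploads made after an @unset is an assumption, since this guide does not specify it.

```python
from typing import Dict, Optional


class VersionState:
    """Illustrative state of one model version."""

    def __init__(self, default_format: Optional[str] = None):
        self.default_format = default_format
        self.formats: Dict[str, bytes] = {}

    def upload(self, fmt: str, data: bytes) -> None:
        # Only the first upload sets the default, and only if none was preset.
        # Assumption: later uploads never change it, even after "@unset".
        if not self.formats and self.default_format is None:
            self.default_format = fmt
        self.formats[fmt] = data

    def patch(self, body: dict) -> None:
        if "defaultFormat" in body:
            value = body["defaultFormat"]
            # No check is made that content in this format exists.
            self.default_format = None if value == "@unset" else value

    def content(self, fmt: Optional[str] = None) -> bytes:
        chosen = fmt if fmt is not None else self.default_format
        if chosen not in self.formats:
            raise LookupError("no content in the requested format")  # stands in for 404
        return self.formats[chosen]


sosa = VersionState()
sosa.upload("text/turtle", b"# turtle data")
default_after_upload = sosa.default_format
```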

1.3.1.4.2.6. Downloading the content

Downloading the models is very straightforward. The most explicit way is to specify the namespace, model, version, and the desired format:

GET /w3c/sosa/1.0/content?format=text/turtle

You can also omit the format parameter to obtain the content in the default format:

GET /w3c/sosa/1.0/content

If you have set the latest tag for this model, you can use it instead of the explicit version, to fetch the most recent version of the model.

There is also a second, shorter style of URLs for downloading content, with the /c prefix:

  1. GET /c/w3c/sosa/1.0/text/turtle

  2. GET /c/w3c/sosa/latest/text/turtle

  3. GET /c/w3c/sosa/1.0

  4. GET /c/w3c/sosa/latest

  5. GET /c/w3c/sosa

Assuming that the latest tag is set to version 1.0 and the default format is text/turtle, all of the above requests will return the same result. Request 5 is simply a shorthand for “the latest version of this model, in the default format”, which should be sufficient for most applications.

In all cases the response will be simply the stored file, with the appropriate Content-Type header.
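The five short URL forms listed above follow one pattern, which the sketch below reproduces. The helper name short_content_path is hypothetical and not part of the API.

```python
from typing import Optional


def short_content_path(namespace: str, model: str,
                       version: Optional[str] = None,
                       fmt: Optional[str] = None) -> str:
    """Build a /c download path; omitted parts fall back to latest and the default format."""
    if fmt is not None and version is None:
        raise ValueError("a format requires an explicit version or 'latest'")
    parts = ["c", namespace, model]
    if version is not None:
        parts.append(version)
    if fmt is not None:
        parts.append(fmt)
    return "/" + "/".join(parts)


# The five request forms from the list above, in the same order.
paths = [
    short_content_path("w3c", "sosa", "1.0", "text/turtle"),
    short_content_path("w3c", "sosa", "latest", "text/turtle"),
    short_content_path("w3c", "sosa", "1.0"),
    short_content_path("w3c", "sosa", "latest"),
    short_content_path("w3c", "sosa"),
]
```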

1.3.1.4.2.7. Deleting models and other objects

Only partially implemented in this version. Full support is planned for the next release.

1.3.1.4.2.8. Browsing collections

Will be determined after the release of the enabler.

1.3.1.4.2.9. Meta endpoints

Will be determined after the release of the enabler.

1.3.1.4.3. Graphical User Interface

The GUI of the Semantic Repository is under development.

1.3.1.5. REST API reference

The REST API reference can be accessed through the following link:

1.3.1.6. Prerequisites

There are currently no prerequisites for installing this enabler.

1.3.1.7. Installation

Will be determined after the release of the enabler.

1.3.1.8. Configuration

This enabler currently does not have any configuration settings. They will be added later.

1.3.1.9. Developer guide

The Semantic Repository is written in Scala 3, using the Akka framework. The information about the managed objects is stored in MongoDB and the files are stored in MinIO (S3-compatible storage).

Semantic Repository’s architecture (note that it is not fully implemented yet):

[Figure: Enabler architecture]

1.3.1.10. Version control and release

Version 0.1. Under development.

1.3.1.11. License

The Semantic Repository is licensed under the Apache License, Version 2.0 (the “License”).

You may obtain a copy of the License at: http://www.apache.org/licenses/LICENSE-2.0

1.3.1.12. Notice (dependencies)

Dependency list and licensing information will be provided before the first major release.