#####################
Semantic Translator
#####################
.. contents::
:local:
:depth: 2
Introduction
============
Semantic interoperability is the ability of different applications and
business partners to exchange data with unambiguous, shared meaning. As
a result, data analysis and knowledge discovery can be done on a
federation of systems.
The aerOS **Semantic Translator** (*ST*) component enables semantic
translation of RDF data (messages). At its core *ST* builds on a considerably
enhanced version of the `INTER-IoT `__
Inter-Platform Semantic Mediator (IPSM) and `ASSIST-IoT `__
Semantic Translation Enabler.
The translation performed by *ST* is based on `alignments <#my-ref-alignments>`_ and uses a
deployment-specific modularized `central ontology <#my-ref-central-ontology>`_.
In the IoT domain, for example, the core modules describing concepts such as devices and observations can be based on
`GOIoTP `__ (Generic Ontology of IoT Platforms), which is a meta-data
reference data model. Additionally, any domain-specific module can be
included, e.g. a medical ontology or a logistic ontology. However, the central
ontology can be any ontology that can serve as a central data model. It
is not directly used to configure *ST*, but should be considered when defining
the alignments that are used for *ST* configuration.
Concepts and features
=====================
.. _my-ref-alignments:
Alignments
----------
To define (and consequently perform) semantic translation, *ST* relies on the concept of an *alignment*.
An alignment is a set of correspondences between simple entities or
complex structures from the source and target ontologies. It contains rules
for transformation between input and output RDF graphs. Specifically, *ST*
translates the RDF graph named *payload* (which is part of the message sent
to *ST*). Alignments are part of the *ST* instance configuration, and are used
directly to execute translation. In a typical deployment, semantic translation consists of
applying two alignments: one from the source ontology to the central
ontology, the other from the central ontology to the target ontology.
The following figure shows a sample situation with four artifacts
(P1-4), each with its own ontology (O1-4). The central ontology
contains modules g1…gn. Two-way communication requires preparation
of two alignments: (i) from the artifact’s ontology to the central ontology,
e.g. A1G, (ii) from the central ontology to the artifact’s ontology, e.g. AG1.
Each alignment contains correspondences between the ontology modules that
are required for this part of the communication.
|Overview of translation with central modularized ontology|
*Overview of translation with central modularized ontology*
To achieve semantic interoperability between two artifacts:
1. instantiate modular central ontology;
2. select/define the artifacts’ ontologies: create them from scratch or use one
of the existing tools;
3. align semantics between the ontology of each artifact and the central
ontology (set of alignments);
4. implement syntactic translators;
5. configure *ST*: upload alignments and create translation channels.
The following figure shows a process of sending a message from source to
target artifact that needs to be semantically and syntactically
translated.
|Process overview|
*Process overview*
The message originates at the source artifact in its own format and semantics,
e.g. an XML message conforming to an XML Schema. To use *ST*, the message needs
to be transformed to RDF with the source artifact’s semantics. This
stage is called “syntactic translation”.
In fact, when *ST* is to be used in a standalone mode (without other aerOS tools),
syntactic translation can be implemented via an arbitrary component that will
“prepare” input for semantic translation.
Note that conversion to RDF may not be necessary when artifacts
already communicate using RDF. When a source artifact does not
support semantics, an RDF representation of the data exchanged with the
ecosystem needs to be proposed.
When the message arrives at *ST*, the RDF named graph *payload* is
translated according to the configuration of the semantic translation
channel (cf. `architecture overview <#my-ref-architecture>`_). Usually two alignments
are applied; however, *ST* can be configured with the special predefined
IDENTITY alignment that does not change the graph. Note also
that *ST* follows the rule: translate only what can be translated,
leave the rest as it was. The resulting message is expressed in RDF with
semantics corresponding to the target semantics of the last applied alignment.
This message can be fed into another syntactic translator that will
transform its format to, e.g., JSON, considering the target artifact’s
semantics. Another possible scenario is that there are applications
consuming data in RDF with the central semantics. In such a case, a second
syntactic translation is not necessary.
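The rule “translate only what can be translated, leave the rest as it was” can be illustrated with a minimal sketch. It uses a toy model rather than the actual *ST* engine: an RDF graph is assumed to be a set of (subject, predicate, object) tuples, and an alignment is reduced to a hypothetical predicate-to-predicate mapping. The `port:comment` property is invented for the example.

```python
# Toy model: an RDF graph is a set of (subject, predicate, object) triples.
# The mapping is a hypothetical, heavily simplified stand-in for an alignment.
Triple = tuple[str, str, str]


def apply_mapping(graph: set[Triple], mapping: dict[str, str]) -> set[Triple]:
    """Rewrite predicates covered by the mapping; keep all other triples unchanged."""
    return {(s, mapping.get(p, p), o) for (s, p, o) in graph}


source_graph = {
    ("_:station", "port:hasname", "P.Felipe"),
    # Hypothetical property that no correspondence covers.
    ("_:station", "port:comment", "not covered by the alignment"),
}
to_central = {"port:hasname": "iiot:hasName"}

translated = apply_mapping(source_graph, to_central)
```

In this sketch the `port:hasname` triple is rewritten to `iiot:hasName`, while the `port:comment` triple passes through untouched.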
.. |Overview of translation with central modularized ontology| image:: images/translator/platforms.png
:scale: 70%
.. |Process overview| image:: images/translator/process.png
:width: 70%
.. _my-ref-central-ontology:
Central ontology
----------------
The central ontology is not directly used by *ST*, but it needs to be established for a given *ST* deployment
to enable the construction of consistent alignments. It should be modularized, so that alignments can be created to and
from selected modules, e.g. meteorological, logistic events, depending on the context of the “conversation”.
Specifically, there is no need to align whole data models if an artifact that needs to be connected to the ecosystem
exchanges only messages related to a specific “aspect”, e.g. meteorological observation data.
For IoT-centric applications the central ontology can be based on `GOIoTP `__ ontology, extended
by appropriate domain specific modules.
|GOIoTP and GOIoTPex modules|
*GOIoTP and GOIoTPex – IoT-dedicated central ontology modules*
In the general case, the central ontology can be an arbitrary ontology, since this does not influence the semantic translation engine provided by ST. However, the semantic engineer preparing the deployment should keep in mind that:
- central ontology should cover all “topics” of conversations in platforms ecosystem
- it should be clear enough to enable querying and reasoning done directly on it
- it should contain subject-specific modules that can be independently maintained and versioned (for easier change management)
.. |GOIoTP and GOIoTPex modules| image:: images/translator/giotp.png
:width: 50%
Translation process
-------------------
*ST* is a component for performing semantic translation that can be used in a standalone mode or in combination with other aerOS components and application-specific tools.
It has a REST interface for configuration, and both publish-subscribe and REST interfaces for translation.
Supported message broker infrastructures that follow publish-subscribe paradigm are *Apache Kafka* and *MQTT*.
Configuration includes:
1. Uploading alignment files that define the translation rules
2. Defining semantic translation channels - each channel is defined with input and output topic names, identifiers (name and version) of input and output alignments
Additionally, the following operations are possible:
1. List all uploaded alignments
2. Delete alignment identified by name and version
3. Retrieve alignment identified by name and version
4. List created translation channels
5. Delete channel identified by id
Performing semantic translation means sending an input RDF message and receiving an output RDF message. A client can publish a message to the input topic of a semantic translation channel, and consume the translated message from the output topic of that channel.
Another possibility is to use the REST API to perform a synchronous semantic translation. Here, the client sends the input RDF graph and a sequence of alignments that should be applied in a request. The response contains the translated RDF graph.
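The information that defines a translation channel (input and output topic names, plus the name and version of the input and output alignments) can be sketched as a small data structure. This is only an illustration: the class names, field names, topic names, alignment name, and the IDENTITY version string are assumptions, and the real configuration API may use a different schema.

```python
import json
from dataclasses import dataclass, asdict


@dataclass(frozen=True)
class AlignmentRef:
    """Identifies an uploaded alignment by name and version."""
    name: str
    version: str


@dataclass(frozen=True)
class Channel:
    """A translation channel: input/output topics plus input/output alignments."""
    input_topic: str
    output_topic: str
    input_alignment: AlignmentRef
    output_alignment: AlignmentRef


channel = Channel(
    input_topic="port/meteo/in",                        # hypothetical topic name
    output_topic="port/meteo/out",                      # hypothetical topic name
    input_alignment=AlignmentRef("Port_to_CO", "1.0"),  # hypothetical alignment
    output_alignment=AlignmentRef("IDENTITY", "1.0"),   # version string is assumed
)

# JSON document a client could send when creating the channel (field names assumed).
channel_json = json.dumps(asdict(channel), indent=2)
```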
.. _my-ref-architecture:
|Architecture|
*Semantic Translator – general architecture*
Remarks
^^^^^^^
*ST* can perform translation between any pair of RDF graphs. The translation can be direct or composed of multiple alignments, depending on the context.
In a typical IoT platform integration case, a Central Ontology (CO) is used, so that the translation has two steps: from the source semantics to the CO, and from the CO to the target ontology.
As a result, integrating a new platform into the ecosystem means preparing alignments to and/or from the CO (depending on whether one- or bi-directional communication is required).
Direct translation means that only one alignment, defining mappings between the source and target semantics, is applied. To configure *ST* to act in this way, define a translation channel that uses this alignment as its input alignment and the IDENTITY alignment (which does not change the RDF graph) as its output alignment.
Direct translation with the REST API means specifying a sequence consisting of only one alignment.
Composed translation means applying more than one alignment. In the pub-sub approach, two alignments are used by default. In REST-based translation, an arbitrary sequence of alignments can be specified.
Note that in each case, the translation process should be handled by one semantic translation channel. By default, a translation channel is configured with two alignments that are applied sequentially.
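The difference between direct and composed translation can be sketched as function composition. The sketch assumes a toy model in which an alignment is a function from graph to graph and a graph is a frozenset of triples. The `tgt:label` term and all helper names are hypothetical and do not come from the *ST* code base.

```python
from functools import reduce
from typing import Callable

Graph = frozenset  # toy RDF graph: a frozenset of (subject, predicate, object) triples
Alignment = Callable[[Graph], Graph]


def identity(graph: Graph) -> Graph:
    """Stand-in for the predefined IDENTITY alignment: returns the graph unchanged."""
    return graph


def rename_predicates(mapping: dict) -> Alignment:
    """Build a toy alignment that rewrites predicates and leaves other triples alone."""
    def apply(graph: Graph) -> Graph:
        return frozenset((s, mapping.get(p, p), o) for (s, p, o) in graph)
    return apply


def translate(graph: Graph, alignments: list) -> Graph:
    """Apply the alignments one after another, feeding each output into the next."""
    return reduce(lambda g, alignment: alignment(g), alignments, graph)


source_to_co = rename_predicates({"port:hasname": "iiot:hasName"})
co_to_target = rename_predicates({"iiot:hasName": "tgt:label"})  # hypothetical target term

message = frozenset({("_:s", "port:hasname", "P.Felipe")})

composed = translate(message, [source_to_co, co_to_target])  # via the central ontology
direct = translate(message, [source_to_co, identity])        # second step is IDENTITY
```

Here `composed` ends up in the hypothetical target vocabulary, while `direct` stops at the result of the single real alignment because IDENTITY changes nothing.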
.. |Architecture| image:: images/translator/semantic-translator.png
:width: 70%
Place in architecture
=====================
The aerOS Semantic Translator can be utilized as a “standalone” tool – whenever RDF homogenization or translation is required.
*ST* is also an important “internal” building block for defining/creating heterogeneous *semantic pipelines*
within the aerOS *Data Fabric* product.
|Translator and semantic pipelines|
*Semantic Translator as semantic pipelines building block*
.. |Translator and semantic pipelines| image:: images/translator/semantic-translator-pipelines.png
:width: 70%
User guide
==========
The process of semantic translation is performed by *ST* based on the configuration of a translation channel (using Apache Kafka or MQTT in publish-subscribe mode) or on a sequence of alignments received in a REST request (using the REST API).
The `steps` element in an alignment defines in what order `cells` of the alignment are to be used. Each cell is applied to the RDF graph being an output of the application of the previous cell. Internally, *ST* converts cell applications into SPARQL UPDATE queries. These are generated from “graph patterns” expressed by `entity1` and `entity2`.
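To give an intuition of how a pair of graph patterns might become a SPARQL UPDATE request, the sketch below builds a `DELETE`/`INSERT`/`WHERE` string from two patterns. It is an assumption about the general shape only: the queries actually generated by *ST* are not shown in this document and may differ, and prefix declarations, transformations, and filters are omitted.

```python
import re


def to_sparql_pattern(pattern: str) -> str:
    """Turn alignment variables such as var:id into SPARQL variables such as ?id."""
    return re.sub(r"\bvar:(\w+)", r"?\1", pattern)


def cell_to_update(entity1: str, entity2: str) -> str:
    """Build a DELETE/INSERT request: remove what entity1 matched, add what entity2 describes."""
    where = to_sparql_pattern(entity1)
    insert = to_sparql_pattern(entity2)
    return (
        f"DELETE {{ {where} }}\n"
        f"INSERT {{ {insert} }}\n"
        f"WHERE  {{ {where} }}"
    )


update = cell_to_update(
    "var:elem port:hasname var:name .",
    "var:elem iiot:hasName var:name .",
)
print(update)
```

Triples that do not match the `WHERE` pattern are not touched by such a request, which is consistent with *ST* leaving untranslatable parts of the graph as they were.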
Let us now analyze, step by step, an example of transforming one predefined RDF graph into another.
Let the input RDF be (using the `Turtle `_ serialization):
.. code-block:: TURTLE

   [] a port:Element ;
      port:haslatitude "26.94442"^^xsd:float ;
      port:haslongitude "19.29351"^^xsd:float ;
      port:hasmeteoStationId "2"^^xsd:int ;
      port:hasname "P.Felipe" .

with the following binding of prefixes:
.. code-block:: TURTLE
@prefix geo: .
@prefix iiot: .
@prefix iiotex: .
@prefix ogis: .
@prefix port: .
@prefix rdf: .
@prefix sosa: .
@prefix var: .
@prefix xsd: .
The alignment cell given below contains RDF graph patterns that generate correspondences (translation rules) that “match” any meteo station registration message, and produce its counterpart, expressed in terms of the GOIoTPex ontology.
.. code-block:: XML

   var:elem a port:Element ;
       port:haslatitude var:lat ;
       port:haslongitude var:long ;
       port:hasmeteoStationId var:id ;
       port:hasname var:name .

   var:station a vp:MeteoStation, iiot:IoTDevice, sosa:Sensor ;
       iiotex:hasLocalId var:id ;
       iiot:hasName var:name ;
       iiot:hasLocation [
           a iiot:Location ;
           geo:asWKT var:geopos
       ] .

   =

The sample message, when matched against the pattern from `entity1`, establishes the appropriate “variable bindings”,
which are subsequently referenced in the `entity2`, *transformation*, and *filters* sections of the cell.
The structure of the RDF graph of the message together with the “variable bindings” can be “graphically” represented as:
|input graph|
*RDF input graph with “bindings” produced by the alignment cell*
Because of the structure of the RDF graph pattern from `entity2`, and the created variable bindings, an instance of
`vp:MeteoStation`, `iiot:IoTDevice`, and `sosa:Sensor` needs to be generated from the value that was matched by the `var:id` variable.
This way, a numerical property of a blank node, representing a meteo station in the source data, is translated into an identifier (URI)
of an entity representing the station in the target ontology. Therefore, in the transformation section, functions are called to cast the value
into a string (STR), concatenate it with the proper prefix (CONCAT), and generate the URI (IRI). The result is stored in the
`var:station` variable that is referenced in the graph pattern from `entity2`. The source `port:hasname` property is mapped
to the `iiot:hasName` property from the target (central) ontology.
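The matching and transformation steps described above can be mimicked in a few lines. The sketch works on a toy set of triples instead of a real RDF library and covers only the variable bindings and the STR/CONCAT/IRI chain. The station prefix `http://example.org/stations/` is a made-up placeholder, because the prefix actually used by the alignment is not given here.

```python
# Input message as a toy set of (subject, predicate, object) triples.
message = {
    ("_:b0", "rdf:type", "port:Element"),
    ("_:b0", "port:haslatitude", "26.94442"),
    ("_:b0", "port:haslongitude", "19.29351"),
    ("_:b0", "port:hasmeteoStationId", "2"),
    ("_:b0", "port:hasname", "P.Felipe"),
}

# Predicate -> variable name, mirroring the entity1 pattern of the cell.
ENTITY1 = {
    "port:haslatitude": "lat",
    "port:haslongitude": "long",
    "port:hasmeteoStationId": "id",
    "port:hasname": "name",
}

STATION_PREFIX = "http://example.org/stations/"  # hypothetical prefix


def bind(graph):
    """Collect variable bindings for the (single) port:Element node in the graph."""
    elem = next(s for (s, p, o) in graph if p == "rdf:type" and o == "port:Element")
    bindings = {"elem": elem}
    for (s, p, o) in graph:
        if s == elem and p in ENTITY1:
            bindings[ENTITY1[p]] = o
    return bindings


def transform(bindings):
    """Mimic IRI(CONCAT(prefix, STR(id))) and store the result under 'station'."""
    result = dict(bindings)
    result["station"] = "<" + STATION_PREFIX + str(bindings["id"]) + ">"
    return result


bindings = transform(bind(message))
```

After this step `bindings` holds both the values matched from the source message and the generated `station` identifier, which is what the `entity2` pattern needs to build the output graph.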
|output graph|
*The RDF graph after the translation*
.. |input graph| image:: images/translator/rdf-graph-pre.png
:width: 70%
.. |output graph| image:: images/translator/rdf-graph-post.png
:width: 70%
..
Prerequisites
=============
Installation
============
**Building Docker image**
The only tool needed for compilation of the code is `SBT `__. All dependencies of the project will be downloaded automatically when `SBT` is invoked for the first time.
To create a `Docker` image containing the latest version of the Semantic Translator, the user needs to issue the following command from the `SBT` command prompt:
.. code-block:: bash
docker
The command assumes that `Docker` is available on the development machine, and that the user has sufficient privileges to use it.
The resulting image will be available from the local `Docker` registry under the name `aeros-project/semantic_translator:n.n.n`, where `n.n.n` represents the current version of the tool.
The list of locally available images should be similar to:
.. code-block:: bash

   user@machine:~$ docker image ls
   REPOSITORY                          TAG     IMAGE ID       CREATED        SIZE
   aeros-project/semantic_translator   1.0.0   2a2587804717   1 minute ago   295MB

Configuration options
=====================
Semantic Translator can be used as a standalone tool, offering translation via a REST interface. Streaming translation support, for both MQTT and Kafka message brokers, can be configured via suitable config parameters or environment variables.
Configuration parameters for *ST* can be provided by setting the appropriate environment variables. For example, to change the port for the REST interface from the default `8080` to `8888`:
.. code-block:: bash
export SEMTRANS_REST_PORT="8888"
By default, *ST* is configured to handle MQTT-MQTT channels only, which in terms of environment variables can be expressed as:
.. code-block:: bash
export SEMTRANS_BROKER_TYPES.0="MM"
To add handling of Kafka-Kafka channels, the above configuration should be augmented as follows:
.. code-block:: bash
export SEMTRANS_BROKER_TYPES.0="MM"
export SEMTRANS_BROKER_TYPES.1="KK"
To configure a locally running MQTT broker as the source and sink for the messages, the following set of environment variables can be used:
.. code-block:: bash
export SEMTRANS_MQTT_SRC_HOST="host.docker.internal"
export SEMTRANS_MQTT_SRC_PORT="1883"
export SEMTRANS_MQTT_TRG_HOST="host.docker.internal"
export SEMTRANS_MQTT_TRG_PORT="1883"
To configure the Kafka (streaming) message broker for *ST*:
.. code-block:: bash
export SEMTRANS_KAFKA_HOST="host.docker.internal"
export SEMTRANS_KAFKA_PORT="29092"
The configuration parameters can also be set directly by editing the `application.conf` configuration file.
.. code-block:: text

   semtrans = {
     supported-channel-types = ["MM"] // MQTT-MQTT only – the default
     // supported-channel-types = ["KK"] // Kafka-Kafka only
     // supported-channel-types = ["MM", "KK"] // Both MQTT-MQTT and Kafka-Kafka
     supported-channel-types = ${?SEMTRANS_BROKER_TYPES}
     db-sqlite {
       driverClassName = org.sqlite.JDBC
       jdbcUrl = "jdbc:sqlite:/data/ipsm.sqlite"
     }
     http {
       port = "8080"
       port = ${?SEMTRANS_REST_PORT}
       host = "0.0.0.0"
       host = ${?SEMTRANS_REST_HOST}
     }
     mqtt {
       messageSizeLimitInKB = 256
       messageSizeLimitInKB = ${?SEMTRANS_MQTT_MSG_SIZE}
       source {
         host = "host.docker.internal"
         host = ${?SEMTRANS_MQTT_SRC_HOST}
         port = 1883
         port = ${?SEMTRANS_MQTT_SRC_PORT}
       }
       target {
         host = "host.docker.internal"
         host = ${?SEMTRANS_MQTT_TRG_HOST}
         port = 1883
         port = ${?SEMTRANS_MQTT_TRG_PORT}
       }
     }
     kafka = {
       host = "host.docker.internal"
       host = ${?SEMTRANS_KAFKA_HOST}
       port = 29092
       port = ${?SEMTRANS_KAFKA_PORT}
     }
   }

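Each setting in the file follows the same pattern: a default value, followed by an optional substitution (`${?VARIABLE}`) that replaces the default only when the environment variable is defined. The sketch below reproduces this resolution logic for a few of the settings. It is an illustration of the pattern, not code from *ST*, and it treats all values as strings for simplicity.

```python
import os

# (default value, environment variable) pairs taken from the http and kafka sections.
SETTINGS = {
    "http.port": ("8080", "SEMTRANS_REST_PORT"),
    "http.host": ("0.0.0.0", "SEMTRANS_REST_HOST"),
    "kafka.host": ("host.docker.internal", "SEMTRANS_KAFKA_HOST"),
    "kafka.port": ("29092", "SEMTRANS_KAFKA_PORT"),
}


def resolve(settings, env):
    """Return the effective value per key: the environment value if set, else the default."""
    return {key: env.get(var, default) for key, (default, var) in settings.items()}


# Simulated environment in which only the REST port is overridden.
effective = resolve(SETTINGS, {"SEMTRANS_REST_PORT": "8888"})

# The same resolution against the real process environment.
from_process = resolve(SETTINGS, os.environ)
```

With the simulated environment, `effective` reports port `8888` for the REST interface and the defaults for every other setting.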
..
Developer guide
===============
Authors
=======
`Systems Research Institute, Polish Academy of Sciences, Warsaw `__
License
=======
`Apache 2.0 `__.
..
Notice (dependencies)
=====================