<!DOCTYPE article PUBLIC "-//NLM//DTD JATS (Z39.96) Journal Archiving and Interchange DTD v1.0 20120330//EN" "JATS-archivearticle1.dtd">
<article xmlns:xlink="http://www.w3.org/1999/xlink">
  <front>
    <journal-meta />
    <article-meta>
      <title-group>
        <article-title>The MICO Broker: An Orchestration Framework for Linked Data Extractors</article-title>
      </title-group>
      <contrib-group>
        <contrib contrib-type="author">
          <string-name>Patrick Aichroth</string-name>
          <xref ref-type="aff" rid="aff0">0</xref>
        </contrib>
        <contrib contrib-type="author">
          <string-name>Marcel Sieland</string-name>
          <xref ref-type="aff" rid="aff0">0</xref>
        </contrib>
        <contrib contrib-type="author">
          <string-name>Luca Cuccovillo</string-name>
          <xref ref-type="aff" rid="aff0">0</xref>
        </contrib>
        <contrib contrib-type="author">
          <string-name>Thomas Kollmer</string-name>
          <email>thomas.koellmer@idmt.fraunhofer.de</email>
          <xref ref-type="aff" rid="aff0">0</xref>
        </contrib>
        <aff id="aff0">
          <label>0</label>
          <institution>Fraunhofer Institute for Digital Media Technology IDMT</institution>
          ,
          <addr-line>Ehrenbergstraße 31, 98693 Ilmenau</addr-line>
          ,
          <country country="DE">Germany</country>
        </aff>
      </contrib-group>
      <abstract>
        <p>This paper describes the MICO broker, a management and orchestration framework for Linked Data extractors. It outlines the initial version of the broker, illustrates the key challenges and requirements for extractor orchestration in the MICO project, and provides an improved MICO broker design and implementation that addresses these key challenges. The paper describes the interaction with the Linked Data approach applied in MICO for this purpose, especially regarding the broker data model, semi-automatic workflow creation and workflow execution.</p>
      </abstract>
      <kwd-group>
        <kwd>extractor orchestration</kwd>
        <kwd>cross-media analysis</kwd>
        <kwd>metadata extraction</kwd>
        <kwd>linked data</kwd>
        <kwd>workflow creation</kwd>
        <kwd>workflow execution</kwd>
      </kwd-group>
    </article-meta>
  </front>
  <body>
    <sec id="sec-1">
      <title>Introduction</title>
      <p>Consider, as a motivating example, a cross-media analysis workflow consisting of the following steps:
1. shot detection, and shot boundary and key frame extraction
2. face detection, which operates on extracted boundary or key frames
3. audio demux, speech2text and named entity recognition</p>
      <p>The resulting annotations can be used for queries such as "Give me all shots in a video where
a person says something about topic X". It is important to note that the described workflow
could be further extended and improved, e.g., using speaker identification, face recognition, or
extraction of metadata from the respective MP4 container or API where the content may have
been crawled, all of which demonstrates that there is huge potential in combining extractors
which are typically used in isolation.</p>
      <p>The following will describe the challenges, design and implementation of the MICO broker<fn id="fn1"><label>1</label><p>MICO project website: http://www.mico-project.eu/</p></fn> to
support such workflows, thereby exploiting the Linked Data based MICO infrastructure: Section 2
will describe the initial v1 MICO broker functionalities and limitations, and section 3 will outline
the relevant requirements to be addressed. Section 4 will then provide an overview of the broker
approach and its components. The related broker model that extends the MICO metadata model
is outlined in section 5, and the approaches to workflow creation and execution are described in
sections 6 and 7. Section 8 will provide conclusions and an outlook.</p>
    </sec>
    <sec id="sec-2">
      <title>MICO broker v1: Initial work and limitations</title>
      <p>
        The initial v1 of the MICO broker was implemented on top of RabbitMQ [
        <xref ref-type="bibr" rid="ref3">3</xref>
        ], following the principles of AMQP<fn id="fn2"><label>2</label><p>Advanced Message Queuing Protocol 1.0, https://www.amqp.org/</p></fn>. Extractor orchestration was implemented using two different message
queues: a content item input queue for receiving new content, and a temporary content item
reply-to queue created per content item, via which each extractor provides results. For service registration,
broker v1 provided a registry queue for analysis service registration, and a temporary
reply-to queue that was created for each service discovery event, to take care of service registration
and analysis scheduling. The implementation completely decoupled extractors from extractor
orchestration, in order to support free choice of extractor programming language and potential
distribution of extraction tasks. For convenience, an Event API exposes a basic set of
instructions for extractors to interact with the broker, available in both Java and C++ (other
languages supporting AMQP could be used as well).
      </p>
      <p>The described v1 proved to be quite stable and robust, especially regarding the RabbitMQ
messaging infrastructure. However, extractor orchestration was based on a simple, mime-type-based
comparison: upon registration of a new extractor process, all connections possible for
that mime type were established, including unintended ones and loops. Hence, it was clear that
further improvements were necessary.</p>
      <sec id="sec-2-1">
        <title>2 Advanced Message Queuing Protocol 1.0, https://www.amqp.org/</title>
      <p>To address some of the most pressing questions, while still keeping compliance with the Event
API, some instant improvements were provided shortly after broker v1. The so-called pipeline
configuration tools, consisting of a mixture of bash scripts and servlet-based Web UI
configurations, support:
- Standard extractor parameter specification
- User-controlled pipeline configuration
- User-controlled service startup/shutdown.</p>
      <p>Beyond this, however, the project soon required a more advanced approach for workflow
orchestration and execution, including a distinction between syntactical versus semantic input and
output of extractors (substituting the simple mime type approach), support for multiple
inputs and outputs, extractor parametrization, a more usable approach to defining workflows, and
support for dynamic routing in workflows and general support for EIP (Enterprise Integration
Patterns) for workflow execution.</p>
    </sec>
    <sec id="sec-3">
      <title>MICO Broker requirements</title>
      <p>Based on broker v1 experiences, an extensive list of requirements for orchestration was compiled
and then prioritized, resulting in the following key points:
1. General requirements, including backward-compatibility regarding the existing infrastructure
(in order to reduce efforts for extractor adaptation, especially regarding RabbitMQ, but also
the Event API wherever possible); reuse of established existing standards / solutions where
applicable; and support for several workflows per MICO system, which was not possible with
v1.
2. Requirements regarding extractor properties and dependencies: support for extractor configuration,
and support for different extractor "modes", i.e., different functionalities with different
input, output or parameter sets, encapsulated within the same extractors; support for more
than one extractor input and output; support for different extractor versions; and distinction
between different I/O types: mime type, syntactic type (e.g., image region), semantic concept
(e.g., human face).
3. Requirements regarding workflow creation: avoiding loops and unintended processing;
extractor dependency checking during planning and before execution; and simplifying the workflow
creation process (which was very complicated).
4. Requirements regarding workflow execution: error handling; workflow status and progress
tracking and logging; support for automatic process management; support for routing,
aggregation and splitting within extraction workflows (EIP support); and support for dynamic
routing, e.g., for context-aware processing, using results from language detection to
determine different subroutes (with different extractors and extractor configurations) for textual
analysis or speech-to-text optimized for the detected language.</p>
    </sec>
    <sec id="sec-4">
      <title>Overview: MICO broker v2 and v3</title>
      <p>MICO broker v2 and v3 address the requirements outlined in section 3. V2 focuses on changes and
extensions to the Event API, especially related to error management, progress communication,
provision of multiple extractor outputs, and registration. The goal of v2 was to provide an earlier
API update, which gave extractor developers an opportunity to adapt to it, also considering
overall data model changes which are also influenced by the new broker model described in
section 5. V3, in contrast, was meant to focus on the addition of new broker components for
registration, workflow creation and execution.</p>
      <p>The following describes the principles and assumptions for design and implementation, and
the components used to provide the respective functionalities.</p>
      <sec id="sec-4-1">
        <title>Principles and Assumptions</title>
        <p>Regarding extractor registration and model, assumptions and principles include:
- Some parts of extractor information can be provided during packaging by the developer
(extractor I/O and properties), while other parts can only be provided after packaging, by
other developers or showcase administrators (semantic mapping, and feedback about pipeline
performance): registration information is provided at different times.
- Extractor input and output should be separated into several meta types: (a) mime types, e.g.,
'image/png', (b) syntactic types, e.g., 'image region', and (c) semantic tags, e.g., "face region".
Mime types and syntactic types are pre-existing information that an extractor developer /
packager can refer to using the MICO data model or external sources, while semantic tags are
subjective, depending on the usage scenario, will be revised frequently, and are often provided
by other developers or showcase administrators. Often, they cannot be provided at extractor
packaging time, nor do they need to be, as they do not require component adaptation. As a
consequence, different ways of communicating the various input and output types are needed.
- A dedicated service for extractor registration and discovery can address many of the
mentioned requirements, providing functionalities to store and retrieve extractor information,
supporting both a REST API for providing extractor registration information, and a
front-end for respective user interaction, which is more suitable to complement information that
is not or cannot be known to an extractor developer at packaging time. It will use Marmotta
for the extractor model storage using Linked Data. Workflow planning and execution can
reuse this information for their purposes.
- Existing Linked Data sources and the MICO metadata model (MMM) should be reused as
far as possible, e.g., for syntactic types, but related information should be cached by
broker components for performance reasons; wherever applicable, extractors and extractor
versions, types etc. should be uniquely identified via URIs.
</p>
        <p>Regarding workflow planning and execution, we came to the following conclusions:
- Apache Camel is a good choice for workflow execution, supporting many EIPs and all core
requirements for the project. It should be complemented by MICO-specific components for
retrieving data from Marmotta to put them into Camel messages, in order to support dynamic
routing.
- The broker should not deal with managing scalability directly, but allow later scalability
improvements by keeping information about extractor resource requirements, and allowing
remote extractor process startup and shutdown.
- Manual pipeline creation is a difficult process, due to the many constraints and
interdependencies, depending on the aforementioned types, but also content, goal, and context of an
application. Considering this, we found that it would be extremely desirable to simplify the
task of pipeline creation using a semi-automatic workflow creation approach that considers
the various constraints. Additionally, it is supposed to store and use feedback from showcase
admins on which extractors and pipelines worked well for which content sets and use cases.
- We need to support content sets and the mapping of such sets to their specific workflows.</p>
      </sec>
      <sec id="sec-4-2">
        <title>Broker Components</title>
        <p>The resulting broker includes and interacts with the components depicted in Figure 3. As
mentioned above, the registration service provides a REST API to register extractors (which
produce or convert annotations and store them as Linked Data / RDF), thereby collecting all
relevant information about extractors. It also provides global system configuration parameters
(e.g., the storage URI) to the extractors, and retrieval and discovery functionalities for extractor
information that are used by the workflow planner. The workflow planner is responsible for the
semi-automatic creation and storage of workflows, i.e., the composition of a complex processing
chain of registered extractors that aims at a specific user need or use case. Once workflows have
been defined, they can be assigned to content sets. Finally, the item injector is responsible for
injecting content sets and items into the system, thereby storing the input data and triggering
the execution of the respective workflows (alternatively, the execution can also be triggered
directly by a user). Workflow execution is then handled by the workflow executor, which uses
Camel, and a MICO-specific auxiliary component to retrieve and provide data from the data
store to be used for dynamic routing within workflows. Finally, all aforementioned Linked Data
and binary data is stored using the data store.</p>
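        <p>As a concrete illustration of the registration step, the sketch below posts extractor information to the registration service; the endpoint path and the JSON payload are assumptions for illustration and do not reflect the actual REST API:</p>
        <preformat preformat-type="code">
import java.net.URI;
import java.net.http.HttpClient;
import java.net.http.HttpRequest;
import java.net.http.HttpResponse;

public class RegisterExtractor {
    public static void main(String[] args) throws Exception {
        // Hypothetical payload: extractor name, version and one mode with typed I/O.
        String json = """
            {
              "name": "audiodemux",
              "version": "2.2.0",
              "modes": [{
                "input":  {"mimeType": "video/mp4"},
                "output": {"mimeType": "audio/wav", "syntacticType": "dct:format"}
              }]
            }""";

        HttpRequest request = HttpRequest.newBuilder()
                .uri(URI.create("http://mico-platform:8080/registration/extractors")) // assumed
                .header("Content-Type", "application/json")
                .POST(HttpRequest.BodyPublishers.ofString(json))
                .build();

        var response = HttpClient.newHttpClient()
                .send(request, HttpResponse.BodyHandlers.ofString());
        System.out.println(response.statusCode()); // e.g., 201 on success
    }
}
        </preformat>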
      </sec>
      <sec id="sec-4-3">
        <title>Extractor Lifecycle</title>
        <p>From an extractor perspective, the high-level process can be summarized as depicted in
Figure 4: Extractor preparation includes the preparation and packaging of the extractor
implementation, including registration information that is used to automatically register the extractor
component upon extractor deployment, and possible test data and information for the
extractor. As soon as extractor registration information is available, it can be used for workflow
creation, which may include extractor / workflow testing if the required test information
was provided earlier. For planning, or at the latest for execution, the broker will then perform an
extractor process startup, and workflow execution will then be performed upon content
injection or user request, as outlined above.</p>
        <p>The following sections will provide more details on the broker model, workflow planning and
execution, thereby clarifying how Linked Data is exploited within these domains.</p>
      </sec>
    </sec>
    <sec id="sec-5">
      <title>MICO broker model and Linked Data</title>
      <p>The data model of the MICO broker was designed to capture the key information needed to
support extractor registration, workflow creation and execution, and collecting feedback from
annotation jobs (i.e., processing workflows applied to a defined content set), thereby addressing
the general principles outlined in section 4, and considering the key requirements from section 3.
It uses URIs as elementary data and extends the MICO Metadata Model (MMM)<fn id="fn3"><label>3</label><p>http://mico-project.bitbucket.org/vocabs/mmm/2.0/documentation/</p></fn> presented in
[1, ch. 3]. The broker data model is composed of four interconnected domains, represented by
different colors in figure 5, which are described in the following:
- The content description domain (yellow) with three main entities: ContentItem captures
information of items that have been stored within the system. As described in [1, ch. 3.2],
MICO items combine media resources and their respective analysis results in ContentPart.
ContentItemSet is used to group several ContentItems into one set. Such a set
can be used to run different pipelines on the same set, or to repeat the analysis with an
updated extractor pipeline configuration.
- The extractor configuration domain (blue) with two main entities: ExtractorComponent,
which captures general information about registered extractors, e.g., name and version, and
ExtractorMode, which contains information about a concrete functionality (there must be at
least one functionality per extractor), which includes information provided by the developer
      <sec id="sec-5-1">
        <title>3 http://mico-project.bitbucket.org/vocabs/mmm/2.0/documentation/</title>
at registration time, e.g., a human-readable description and configuration schema URI. For
extractors which create annotations in a format different from RDF, it includes an output
schema URI.
- The input/output data description domain (green) stores the core information necessary
to validate, create and execute extractor pipelines and workflows: IOData represents the core
entity for the respective input or output to a given ExtractorMode.</p>
        <p>MimeType is the first of three pillars for workflow planning, as outlined in Section 4.1. RDF
data produced by extractors will be labeled as type rdf/mico. The relation IOData has MimeType
connects I/O data to MimeType. It has an optional attribute FormatConversionSchemaURI
which signals that an extractor is a helper with the purpose of converting binary data from
one format to another (e.g., PNG images to JPEG).</p>
        <p>
          The SyntacticType of data required or provided by extractors is the second pillar for workflow
planning. For MICO extractors which produce RDF annotations, the stored URI should
correspond to an RDF type, preferably to one of the types defined by the MICO Metadata
Model ([1, ch. 3.4]). For binary data, this URI corresponds to a Dublin Core format [
          <xref ref-type="bibr" rid="ref2">2</xref>
          ].
SemanticType is the third pillar for route creation and captures high-level information about
the semantic meaning associated with the I/O data. It can be used by showcase
administrators to quickly discover new or existing extractors that may be useful to them, even if the
syntactic type is not (yet) compatible; this information can then be exploited to request
an adaptation or conversion.
- The platform management domain (orange) combines several instances related to platform
management: ExtractorInstance is the elementary entity storing the URI of a specific instance
of an extractor mode, i.e., a configured extraction functionality available to the platform. The
information stored in the URI includes the parameter and I/O data selection and further
information stored during the registration by the extractor itself.
        </p>
        <p>EvalInfo holds information about the analysis performance of an ExtractorInstance on a
specific ContentItemSet. This can be added by a showcase administrator to signal data sets
for which an extractor is working better or worse than expected.</p>
        <p>Pipeline captures the URI of the corresponding workflow configuration, i.e., the composition
of ExtractorInstances and the respective parameter configuration.</p>
        <p>UseCase is a high-level description of the goal that a user, e.g., showcase administrator,
wants to achieve.</p>
        <p>Job is a unique and easy-to-use entity that links a specific Pipeline to a specific ContentItemSet.
This can, e.g., be used to verify the analysis status.</p>
        <p>UseCase has Job is a relation that connects a UseCase to a specific Job, which can be used
to provide feedback, e.g., to rate how well a specific Pipeline has performed on a specific
ContentItemSet.</p>
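        <p>To make the model tangible, the following sketch uses Apache Jena to express a hypothetical ExtractorMode output with its three I/O pillars as RDF; the property names and namespace are illustrative stand-ins, not the actual broker vocabulary:</p>
        <preformat preformat-type="code">
import org.apache.jena.rdf.model.Model;
import org.apache.jena.rdf.model.ModelFactory;
import org.apache.jena.rdf.model.Resource;

public class BrokerModelSketch {
    public static void main(String[] args) {
        String ns = "http://example.org/mico/broker#"; // illustrative namespace
        Model model = ModelFactory.createDefaultModel();
        model.setNsPrefix("broker", ns);

        // One output of a (hypothetical) audio demux ExtractorMode,
        // described by the three planning pillars.
        Resource output = model.createResource()
                .addProperty(model.createProperty(ns, "hasMimeType"), "audio/wav")
                .addProperty(model.createProperty(ns, "hasSyntacticType"),
                        model.createResource("http://purl.org/dc/terms/format"))
                .addProperty(model.createProperty(ns, "hasSemanticType"), "audio track");

        model.createResource(ns + "audiodemux-mode-1")
                .addProperty(model.createProperty(ns, "hasOutput"), output);

        model.write(System.out, "TURTLE"); // serialize as Linked Data
    }
}
        </preformat>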
        <p>As outlined in section 4.1, a key broker assumption is that some extractor information is provided at
packaging time by the developer (extractor properties, input and output), while other extractor
information will typically be provided after packaging time, by other developers or showcase
administrators. The registration service is the central point to register and query that extractor
information, which provides the information needed for workflow planning (see section 6) and
execution (see section 7). The registration service provides both a REST API for providing
extractor registration information, and a front-end for respective user interaction, to complement
information that is not or cannot be provided by an extractor developer at packaging time,
including feedback on how well certain pipelines or extractors performed for content sets.</p>
    </sec>
    <sec id="sec-6">
      <title>Semi-automatic workflow creation</title>
      <p>During the MICO project, the manual creation of workflows turned out to be more complicated
and difficult than expected, as it depends not only on extractor interdependencies and constraints
on multiple levels, but also on the content at hand, and the context and goal of an application. In
order to address this problem, the idea of a semi-automatic workflow creation process emerged.
It was implemented using the idea of finding possible combinations of matching extractors using
the Linked Data information pillars outlined in sections 4 and 5: MimeType and SyntacticType
signal syntactic interoperability, and SemanticType signals a semantic match. Beyond that, if
available, feedback on how well workflows performed on content sets can be used as well.</p>
      <p>It is important to note that these pillars do not represent a simple hierarchy: For instance, the
indication of two extractors providing and consuming a matching MimeType and SyntacticType,
but lacking the same SemanticType, can be used to signal to the service which extractors could
match and hence should be linked via a new SemanticType, requiring human feedback to create
this link. Vice versa, if it turns out that two extractors seem to provide similar output, as signalled
by SyntacticType and SemanticType, but the MimeType does not fit, this can be exploited as a
signal that a simple extension of the extractor to support a new MimeType, e.g., via format
conversion, could create interoperability, as sketched below.</p>
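      <p>A minimal sketch of this three-pillar comparison is given below; the types and rules are simplified for illustration and do not mirror the planner's actual implementation:</p>
      <preformat preformat-type="code">
import java.util.Objects;

public class PillarMatcher {

    /** Simplified I/O description carrying the three planning pillars. */
    record IOType(String mimeType, String syntacticType, String semanticType) { }

    enum Match { FULL, SEMANTIC_GAP, MIME_GAP, NONE }

    /** Compare the output of one extractor with the input of another. */
    static Match match(IOType output, IOType input) {
        boolean mime = Objects.equals(output.mimeType(), input.mimeType());
        boolean syntactic = Objects.equals(output.syntacticType(), input.syntacticType());
        boolean semantic = Objects.equals(output.semanticType(), input.semanticType());

        if (mime) {
            if (syntactic) {
                // Syntactically compatible; a missing shared semantic tag is a
                // hint that a human could link the extractors via a new SemanticType.
                return semantic ? Match.FULL : Match.SEMANTIC_GAP;
            }
            return Match.NONE;
        }
        // Semantically similar but differing mime type: a format conversion
        // (FormatConversionSchemaURI) could establish interoperability.
        if (syntactic) {
            return semantic ? Match.MIME_GAP : Match.NONE;
        }
        return Match.NONE;
    }
}
      </preformat>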
      <p>Figures 6 and 7 provide screenshots of the current workflow creation tool for MICO. In this
example, the creation process started from multimedia content with mp4 video and text, where
the user could incrementally add suitable extractors proposed by the GUI using the
aforementioned pillars. Figure 6 shows a workflow that is validated, while Figure 7 shows the
same but invalid workflow, which is due to a slightly different audio demux extractor configuration:
the latter does not provide the output of type audio/wav, which leads to an incompatibility
that is signaled within the GUI. After completion, the user can store the resulting workflow as
a Camel route, which can then be used to execute the workflow.</p>
    </sec>
    <sec id="sec-7">
      <title>Workflow execution and dynamic routing</title>
      <p>Once workflows have been created and stored as Camel routes, using the semi-automatic approach
(see section 6), they can be assigned to content items and sets, and execution can be triggered
via the item injector or the user / showcase admin (see section 4).</p>
      <p>The actual workflow execution is performed using four main components, two of which were
already mentioned in section 4: the workflow executor as master component, which uses the
other components and is based on Apache Camel, and the auxiliary component, a MICO-specific
extension to Camel that allows Linked Data retrieval to support dynamic routing. In addition,
the RabbitMQ message broker serves as communication layer to loosely couple extractors and
the MICO platform, and a MICO-specific Camel endpoint component connects Camel with
the MICO platform and triggers extractors via RabbitMQ.</p>
      <p>Dynamic routing based on Linked Data works as depicted in the short example workflow for
extracting spoken words from an mp4 video (figure 8). After audio demuxing (demux), the
audio stream from the mp4 video is stored and provided to diarization<fn id="fn4"><label>4</label><p>The segmentation of audio content based on speakers and sentences.</p></fn> and language detection
(lang detect). Both analyze the audio content in parallel, and store their annotations in Marmotta.</p>
      <p>
        At this point, dynamic routing is applied to optimize performance: The auxiliary component
loads the detected language from Marmotta and puts it into the Camel message header; it
knows where to locate the detected language, as lang detect describes the storage location
in its registration data via LDPath [
        <xref ref-type="bibr" rid="ref4">4</xref>
        ]. Afterwards, the language information within the Camel
message header is evaluated by a router component, which triggers the Kaldi extractor optimized
for the detected language. Beyond this example, there are many use cases where such dynamic
routing capabilities can be applied.
      </p>
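      <p>A Camel route corresponding to figure 8 could be sketched as follows; the endpoint URIs and the "language" header name are illustrative assumptions about the MICO endpoint and auxiliary components:</p>
      <preformat preformat-type="code">
import org.apache.camel.builder.RouteBuilder;

public class SpeechToTextRoute extends RouteBuilder {
    @Override
    public void configure() {
        from("mico:demux")                       // assumed MICO endpoint URIs
            .multicast()
                .to("mico:diarization", "mico:lang-detect")
            .end()
            // Auxiliary component: load the detected language from Marmotta
            // into a Camel message header (stubbed here with a fixed value).
            .process(exchange -> exchange.getIn().setHeader("language", "en"))
            .choice()
                .when(header("language").isEqualTo("en"))
                    .to("mico:kaldi-en")         // Kaldi optimized for English
                .when(header("language").isEqualTo("de"))
                    .to("mico:kaldi-de")         // Kaldi optimized for German
                .otherwise()
                    .to("mico:kaldi-generic");
    }
}
      </preformat>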
    </sec>
    <sec id="sec-8">
      <title>Conclusion and outlook</title>
      <p>This paper has described the challenges and requirements of cross-media extraction orchestration
based on Linked Data within the MICO project, and how they were addressed with a mix of
existing frameworks and MICO-specific extensions.</p>
      <sec id="sec-8-1">
        <title>4 The segmentation of audio content based on speakers and sentences.</title>
        <p>While all major requirements could be met, there is still a lot of potential for future
improvements: For semi-automatic work ow planning and creation, usability could be further enhanced,
e.g., by allowing de nition and combination of sub-graphs. Moreover, project experiences within
the nal project phase are likely to result in further demands with respect to process monitoring
and management, and regarding scalability improvements.</p>
      </sec>
    </sec>
    <sec id="sec-9">
      <title>Acknowledgements</title>
      <p>This work has been partially funded by the European Commission 7th Framework Program,
under grant agreement no. 610480.</p>
    </sec>
  </body>
  <back>
    <ref-list>
      <ref id="ref1">
        <mixed-citation>1. Aichroth, P., Bjoerklund, J., Schlegel, K., Kurz, T., Koellmer, T.: Dx.2.2 Specifications and Models for Cross-Media Extraction, Metadata Publishing, Querying and Recommendations: Final Version. Deliverable, MICO (October 2015), http://www.mico-project.eu/wp-content/uploads/2016/01/Dx.2.2-SPEC_final_READY_FOR_SUBMISSION.pdf</mixed-citation>
      </ref>
      <ref id="ref2">
        <mixed-citation>2. DCMI Usage Board: DCMI Metadata Terms. Tech. rep., Dublin Core Metadata Initiative (Jun 2012), http://dublincore.org/documents/dcmi-terms/</mixed-citation>
      </ref>
      <ref id="ref3">
        <mixed-citation>3. Pivotal Software, Inc.: RabbitMQ - messaging that just works (2004-2015), https://www.rabbitmq.com/</mixed-citation>
      </ref>
      <ref id="ref4">
        <mixed-citation>4. Schaffert, S., Bauer, C., Kurz, T., Dorschel, F., Glachs, D., Fernandez, M.: The Linked Media Framework: Integrating and Interlinking Enterprise Media Content and Data. Proceedings of the 8th International Conference on Semantic Systems - I-SEMANTICS '12 (2012), http://dl.acm.org/citation.cfm?id=2362504</mixed-citation>
      </ref>
    </ref-list>
  </back>
</article>