“A picture is worth a thousand words”? - From
                    Project Inception to First Results: Describing
                    Cross-disciplinary Collaboration in the Digital
                              Humanities Project ChIA

                   Yalemisew Abgaz1[0000−0002−3887−5342] , Amelie Dorn2 , Gerda Koch3 , and Jose
                                                Luis Preza Diaz2
                           1
                               Adapt Centre, School of Computing, Dublin City University, Ireland
                                             Yalemisew.abgaz@adaptcentre.ie
                                          2
                                            Austrian Academy of Sciences, Austria
                                     {amelie.dorn,JoseLuis.PrezaDiaz }@oeaw.ac.at
                                             3
                                               Europeana-local Austria, Austria
                                                kochg@europeana-local.at


                           Abstract. Both historical and contemporary images depicting
                           particular aspects of a certain culture, or a certain cultural practice, are
                           widely available in a variety of formats. Typically, historical-cultural im-
                           ages can be found in museums, national archives and libraries both in
                           analogue and digital formats. The Europeana image collection serves as
                           a valuable example of a digital image collection available to the public,
                           that is freely accessible and searchable. A significant amount of the collec-
                           tion, however, does not contain a rich semantic description of the cultural
                           and social aspects represented in the images that would go beyond some
                           metadata like author and title. To enable the enrichment of these images,
                           we propose a project which brings together experts from digital human-
                           ities, linguistics, artificial intelligence and semantic web technology. The
                           project aims at analysing the contents of the images with a combination
                           of computer vision, natural language processing and manual curation
                           to represent them with a more descriptive and representative controlled
                           vocabulary. This combination of different types of expertise to address
                           the problem enables us to closely collaborate and learn from each other
                           by taking different roles and perspectives. So far, the collaboration has
                           contributed to the understanding of the detailed requirements from the
                           digital humanities and socio-linguistic perspectives for the representation
                           and processing of cultural images using semantic web technologies, such
                           as multi-disciplinary thesauri, ontologies, computer vision and AI.

                           Keywords: Cultural image analysis · Semantic enrichment · Computer
                           vision · Ontology · Knowledge design


                     Copyright 2020 for this paper by its authors. Use permitted under
                     Creative Commons License Attribution 4.0 International (CC BY 4.0).


Twin Talks 2 and 3, 2020           Understanding and Facilitating Collaboration in Digital Humanities      56/143
                  2          Abgaz. Y et al.

                  1        Introduction

                  Fred R. Barnard famously once claimed that “A picture is worth a thousand
                  words”. This assumption resonates well with the aims of the collaborative project
                  discussed in this paper. By cultural image, we understand any image (artwork,
                  photograph, sketch, etc) that depicts cultural artefacts, practices, or social sit-
                  uations, where we particularly focus on food-related content in the context of
                  the project described in more detail below. The historical dimension is just one
                  of several different ones under the umbrella of culture and particularly alludes
                  to places, persons or events. Several books have indeed been written about
                  historical images (pictures, paintings, photographs, sketches, etc.) such as the Mona
                  Lisa by Leonardo da Vinci. Except for a few notable artworks, no matter how
                  much is written about such images or captured in their metadata, ample
                  information contained in the images themselves remains uncaptured in
                  words. Many museums, cultural heritage institutions and public archives hold
                  images which vividly and meticulously represent the cultural, social, political and
                  other aspects of a society in time. Modern-day digital technology, however,
                  relies heavily on the exchange of information in the form of text to present or
                  represent things. Until recently, often only a few keywords or descriptive sentences
                  were used to annotate, search and retrieve images. This has created a huge gap
                  between the symbolic “thousand words” and the few keywords and descriptors
                  associated with the images.
                      In this paper, we present a collaborative research endeavour which aims to
                  unpack the wealth of detail contained in cultural images and represent it with
                  carefully selected, semantically interlinked, multi-faceted ontologies, taxonomies
                  and thesauri. To this end, a group of researchers from the digital humanities
                  (socio-cultural linguists, archival curators) and computer science (computer
                  vision and AI experts and ontology engineers) have come together in the ChIA5
                  project (accessing & analysing cultural images with new technologies) to carry out
                  pilot research in the area [3]. Although the overall objectives of the ChIA project
                  encompass a much wider range of goals, this paper focuses on the
                  following questions:

                      – How can the interaction between Digital Humanities researchers and com-
                        puter scientists result in a formal collaboration?
                      – What are the methods of collaboration and technical requirements?
                      – What are the lessons learned so far, and what are the challenges faced in the
                        collaborative project setting?

                  We further report the collaboration results so far, and how these results are
                  understood and interpreted by the different experts in the group. The technical
                  solutions that are proposed in the research and the methods we followed to
                  bridge the gap are also discussed in detail.
                   5
                       https://chia.acdh.oeaw.ac.at/



                  2        The Humanities Research Problem

                  The cradle of this collaborative project was an inspiring, informal brainstorming
                  session held between a digital humanist, who is an expert in the area of lexico-
                  graphic collections, a computer science expert and an ontology and knowledge
                  engineer. The discussion identified several pertinent, to-date unresolved chal-
                  lenges related to historical collections including cultural images, pictures, pho-
                  tographs, sketches etc., collected from several centuries and stored in libraries,
                  archives and museums. The main challenge identified during the discussion was
                  that the majority of these resources lack any rich semantics (description) except
                  for some available metadata, for example, date, author and title. Despite this
                  fact, these collections, particularly historical images, bear rather rich aspects
                  describing the detailed social, cultural, political, economic, etc. interaction of
                  different societies. These aspects are not represented in a written form except
                  for images well studied by experts. This led to the under-exploitation of the re-
                  sources as a whole for research, educational and business purposes. Considering
                  the richness of digital collections and the impact they have for understanding
                  the culture and interaction between societies at different times, it became clear
                  that digital humanities was concerned with the long-standing question of how
                  to make these resources better accessible and available by drawing on the
                  expertise of different disciplines. The challenge until now, however, has
                  been how to organise cultural image resources automatically,
                  how to analyse their content and the aspects they depict, and how to represent
                  them in a more detailed, interlinked and interrelated manner. A further challenge is
                  how to support systematic search and retrieval of these resources using rich semantics
                  which go beyond searches based on facets such as author and title.
                       Since the inception of this research question, we have conducted studies [4] to
                  consolidate our research question and to understand the magnitude of the problem
                  [10]. During the proposal preparation phases, we began to understand that it
                  was necessary to include additional experts from digital humanities who were
                  directly involved in providing platforms and access to historical and cultural image
                  collections for their users. After inviting such experts from Europeana-local
                  Austria, we advanced our understanding of the day-to-day challenges of the experts
                  [8]. These challenges are non-trivial and still require a systematic approach
                  and a deeper collaboration between linguists, computer scientists, knowledge
                  organisation/ontology experts and digital humanities researchers. Thus, the main
                  research questions from the humanities point of view are:

                    – How can the rich information contained in historical images be explicitly
                      represented, semantically enriched and interlinked by these resources?
                    – Which intelligent and interactive tools can be provided to make such re-
                      sources searchable, analyzable and exploitable for both humans and machine
                      agents?

                  Within this set of specific research questions, we further included the following
                  technical questions:



                      – What is the best method to analyse and represent the contents of historical
                        images with or without detailed written descriptions of the images?
                      – What are the best standards to follow to semantically enrich these
                        images once their content has been analysed?
                  This opens up the path to the investigation of the best knowledge organisation
                  methods and tools that support semantic annotation/enrichment and semantic search
                  of historical images.
                      With the three major research questions emanating from the first question,
                  we developed a collaboration between four major disciplines: digital humanities,
                  socio-cultural linguistics, computer science (computer vision and AI in partic-
                  ular) and knowledge organisation and semantic web technologies. This unique
                  collaboration between the experts (Section 3) enabled us to deeply understand
                  the problem at each corner of the interaction and helped us to tackle the tech-
                  nical details outlined in Section 4.


                  3        The Collaboration Experience
                  ChIA is an ongoing collaborative project and it is very important to give a de-
                  tailed description of the composition of the team and how the team collaborates.

                  3.1      The ChIA-team Details
                  There are four core members in the ChIA collaboration, broadly categorised
                  as digital humanists (two) and computer scientists (two). The digital humanist
                  group mainly focuses on the analysis and representation of cultural images
                  to improve access for users of the systems. Their main research interests
                  include the information contained in cultural images, the aspects of a
                  society the images represent, how such aspects are represented and how the
                  information contained in the images is represented using different languages and
                  formats. The two digital humanists have further specialisations.
                  The first member is a linguist by profession who is interested in conducting
                  socio-cultural and socio-linguistic analyses of cultural images and in design thinking
                  methods on how societies represent complex historical and cultural events using
                  digital images. The second member is a manager and content coordinator who
                  focuses on providing metadata and platforms for collecting, organising and
                  presenting cultural images on the Europeana-local Austria platform. The first
                  focuses on the research aspect; the second focuses on the implementation and
                  provision of the actual service.
                      The second group, computer scientists, also has two members specializing
                  in different domains in computer science. One of them is an AI and computer
                  vision expert with the objective of looking at cultural images as opposed to con-
                  temporary image collections to find out how details contained in an image can
                  be extracted by computer vision tools. Finally, we have the fourth member who
                  specialises in knowledge representation, focusing on how complex social and cul-
                  tural aspects can be represented across disciplines using ontologies, taxonomies
                  and thesauri. The main objective of this team from the technical point of view is
                  to analyse cultural images, extract as much detail as possible, represent
                  such detail in a semantically rich, interlinked and interoperable system and
                  support its exploitation via chatbots.
                      There are two principal investigators in this project, one from the digital
                  humanities and the other from computer science. The most important
                  element next to the research question is enabling a thorough understanding
                  among the team members. According to research on diversified multidisciplinary
                  teams [7], our team is diversified in several ways, including task-related
                  diversity (e.g. education, profession, etc.) and relations-oriented diversity (e.g.
                  sex, geography). A diversified background is often a problem unless it
                  is managed and directed systematically. For this purpose, we have built strong
                  interpersonal and project management skills to lead this project as PIs. Since
                  the interaction between these two domains is important for addressing the outlined
                  problem, we have also set up different communication channels to keep
                  each other up to date on our day-to-day interactions. In addition, the project
                  draws on advice and guidance from an international advisory board, again of four
                  members, who are experts in the fields of semantic technologies, knowledge
                  design, the GLAM sector and AI.

                  3.2      The ChIA Cross-cultural Team Communication
                  The team is not only professionally diversified but also physically dispersed with
                  only two of the team members working in the same building while the third
                  team member works in the same country but in a different city. The fourth team
                  member is in Dublin, Ireland. This physical distance introduces challenges and
                  opportunities. The main challenge is that the team members are not able to meet
                  in person to update each other on their day-to-day activities. However, different
                  channels are used to plan tasks and update progress. The most important one is the
                  weekly project meeting, which is held via Skype. This meeting allows us to update
                  our progress and plan our next tasks, and serves as a forum to ask questions about topics we
                  do not understand. This gives us a great opportunity to have live interactions
                  among the team members. We have understood that there are at least three
                  types of members in the team: those who provide up-to-date information daily,
                  sending updates and any relevant information about the project, those who keep
                  the work going by experimenting and building prototypes to test the ideas that
                  are circulated among the team and those who prepare, plan and evaluate the
                  overall direction and progress of the project. They keep the meetings running
                  smoothly, supervise the adherence of the work to relevant standards etc. The
                  four members usually play one or more of these roles interchangeably during
                  our meetings or throughout the week. Our official communication channel is
                  email and our document sharing platform is Google Drive. We prefer Google Drive
                  as it supports sharing and collaborative editing of project-related documents and
                  supports easy and quick interaction among the project members wherever they
                  are. In addition, a project space on Slack has been set up to communicate important
                  information to team members and the advisory board.



                      One of the major drawbacks of the online communication system is that it
                  is very easy to lose track of the agenda while members try to explain complex
                  concepts. To reduce this, we further introduced internal review meetings every six
                  months to come together physically to review our progress and discuss complex
                  matters in person. The other limitation is the lack of sufficient time to discuss
                  and understand complex issues within a single meeting. When such
                  complex issues arise, they are discussed offline in one-to-one settings.
                  This makes it easier to understand the issues in detail and further allows for an
                  unhurried and friendly discussion. In general, the physical distance between
                  the members has slightly slowed the progress of our understanding
                  of other members' perspectives and interests.


                  4        Description of the Technical Collaboration: The Case
                           of Building an Experimental Dataset
                  One of the main derived research questions is how to bridge the gap between the
                  information packed in the images and the explicit annotation of the content of
                  the images using ontologies. Understanding the problem and working toward
                  a solution requires continuous interaction among the team members,
                  as shown in Fig. 1. These
                  interactions represent various levels of research collaboration within the project.
                  For the sake of brevity, the interaction is explained using the dataset
                  preparation phase of the project as an example.

                  4.1      The DH Quadrant
                  The Digital Humanities (DH) quadrant is dedicated to identifying cultural im-
                  ages that are widely used by users of the platform. The digital humanist uses the
                  collection to search and retrieve cultural images. The total image base consists
                  of data originating from many different repositories created by a considerable
                  number of cultural heritage professionals over several decades [11]. As a
                  result, the data is very diverse and the collection is huge. For the experiment,
                  we therefore limit ourselves to cultural images related to staple food edible
                  for humans. To carry out the selection of the images, the DH directly interacts
                  with the Socio-Cultural Linguist (SL) to determine how food-related culture is
                  represented in the languages of the search (German and English at this
                  stage). This collaboration yields a list of terms that are widely used to
                  represent the cultural aspects contained in the images. For working with the
                  images in all the other quadrants, we needed access to the Europeana-local Austria
                  system. The DH set up a web-based interface that makes use of a customised
                  Europeana Search API to enable search, selection and management of selected
                  subsets within the image collection. The interface allows a (Boolean) search for
                  images on metadata element level and the retrieval of a considerable image base
                  for CV analysis. The DH gave us a short introductory training on how to use the
                  system. This very useful collaboration improved the rest of the team's
                  understanding of, and interaction with, the system and further helped us to
                  reflect on the problem from the DH side.

                                Fig. 1: Interaction diagram among project members.
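
                  The Boolean search on metadata element level described in this section can be
                  illustrated with a minimal sketch. It only builds a request URL and does not
                  call any service. The endpoint, the parameter names (wskey, query, qf, rows)
                  and the TYPE:IMAGE filter are assumptions based on the public Europeana Search
                  API; the customised interface used in the project may differ. The helper names
                  boolean_query and build_search_url are hypothetical.

```python
from urllib.parse import urlencode

# Assumption: public Europeana Search API endpoint, not the project's own interface.
SEARCH_ENDPOINT = "https://api.europeana.eu/record/v2/search.json"


def boolean_query(field, terms):
    """Combine search terms with OR inside a single metadata field."""
    quoted = " OR ".join(f'"{term}"' for term in terms)
    return f"{field}:({quoted})"


def build_search_url(api_key, terms, field="title", rows=50):
    """Build a search URL restricted to image records (hypothetical helper)."""
    params = {
        "wskey": api_key,                      # API key placeholder
        "query": boolean_query(field, terms),  # Boolean search on one metadata field
        "qf": "TYPE:IMAGE",                    # assumed filter for image records
        "rows": rows,                          # number of results per request
    }
    return f"{SEARCH_ENDPOINT}?{urlencode(params)}"


# German and English food terms, as in the bilingual search described above.
url = build_search_url("YOUR_API_KEY", ["Brot", "bread"])
print(url)
```

                  Keeping the query construction separate from the request makes it easy to
                  review the term lists with the SL before any images are retrieved.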


                  4.2      The SL Quadrant

                  The SL quadrant, in turn, interacts with the DH to understand what kind of
                  cultural images are available in the collection and how they are represented in
                  the metadata so far. This includes understanding the conceptualisation of the
                  cultural aspects of societies [6]. The SL directly collaborates with the Semantic
                  Web (SW) engineer to identify existing ontologies, thesauri and dictionaries
                  that represent the socio-cultural aspects of the images. The collaboration further
                  looks into novel combinations of existing tools or the creation of new ontologies
                  to represent concepts that are not yet represented. This collaboration serves as an
                  input to the SW in utilising existing and new ontologies to semantically annotate
                  and represent cultural images in a more robust manner.


                  4.3      The SW Quadrant

                  The main task in the Semantic Web (SW) quadrant is to identify useful
                  ontologies to represent the rich information contained in the images to support
                  semantic annotation, reasoning and semantic search. This will enable the project
                  to provide rich, semantically interlinked open data about the image collection,
                  which will be consumed by the AI system to build an interactive chatbot. The
                  SW interacts with the CV quadrant to represent the output of the computer
                  vision tools using ontologies. The collaboration between the SW and SL quadrants
                  also benefits from previous collaborative research
                  [1, 2, 4, 5]. Furthermore, the DH interacts with the SW in listing and selecting
                  useful ontologies and vocabularies that have been in use on the platform.
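
                  As an illustration of how a computer vision result could be linked to a
                  vocabulary concept, the sketch below builds a JSON-LD annotation. The choice
                  of the W3C Web Annotation model is an assumption made for this example; the
                  paper does not state which vocabulary the project uses. The function name
                  to_annotation and the example.org URIs are hypothetical.

```python
import json

# Assumption: W3C Web Annotation JSON-LD context, chosen only for illustration.
ANNO_CONTEXT = "http://www.w3.org/ns/anno.jsonld"


def to_annotation(image_uri, concept_uri):
    """Link an image to a vocabulary concept as a tagging annotation."""
    return {
        "@context": ANNO_CONTEXT,
        "type": "Annotation",
        "motivation": "tagging",
        "body": concept_uri,   # concept from a thesaurus or ontology
        "target": image_uri,   # the annotated cultural image
    }


# Hypothetical URIs standing in for an image record and a food concept.
annotation = to_annotation(
    "https://example.org/image/123",
    "https://example.org/vocab/bread",
)
print(json.dumps(annotation, indent=2))
```

                  Pointing the annotation body at a concept URI rather than a free-text label
                  is what allows the enriched images to be interlinked and searched semantically.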

                  4.4      The CV Quadrant
                  The Computer Vision (CV) quadrant focuses on extracting as much detailed
                  information as possible from the digital cultural image collections identified by
                  the DH. This quadrant uses existing computer vision tools [9] to analyse and represent
                  the contents of cultural images and to evaluate the accuracy of the results. Although
                  metadata is not available for all the images, the results from the CV will be
                  compared with the existing metadata of the images. The CV quadrant further
                  focuses on building a chatbot by consuming the outputs of the SW quadrant.
                  The CV quadrant also provides several outputs of computer vision tools and enables
                  the other team members to understand what such tools can output when
                  given historical images that are several years old.
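
                  The comparison between computer vision results and existing metadata can be
                  sketched as a simple set overlap. This is an illustrative assumption, not the
                  evaluation method used in the project; the function names and the example
                  labels are hypothetical. Images without metadata are skipped by returning None.

```python
def normalise(terms):
    """Lower-case and strip terms, dropping empty strings."""
    return {t.strip().lower() for t in terms if t.strip()}


def compare_labels(cv_labels, metadata_terms):
    """Compare CV labels with metadata keywords for one image.

    Returns None when the image has no metadata to compare against.
    """
    cv = normalise(cv_labels)
    meta = normalise(metadata_terms)
    if not meta:
        return None
    shared = cv & meta
    return {
        "shared": sorted(shared),
        "only_cv": sorted(cv - meta),          # detected but not in metadata
        "only_metadata": sorted(meta - cv),    # described but not detected
        "precision": len(shared) / len(cv) if cv else 0.0,
        "recall": len(shared) / len(meta),
    }


# Hypothetical labels for a single image.
report = compare_labels(["Bread", "table", "person"], ["bread", "still life"])
print(report)
```

                  The labels found only by the CV tools are the candidates for semantic
                  enrichment, since they describe content that the metadata does not capture.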
                      All four quadrants have a common goal, which is to understand the
                  requirements of the other quadrants and to make their own research questions and
                  requirements clear to the other members. The circle in the middle of the
                  collaboration diagram (Fig. 1) represents the information we all need to have in common
                  about the project. We believe that the more we interact among ourselves
                  and advance the project, the further we will be able to push the radius of the circle
                  and learn and collaborate with more overlapping interests from the other team
                  members. Our objective is to push the inner circle to the edges of the outer circle in
                  all directions. Until that happens, every member of the team will have a blind
                  spot which requires the consultation of the responsible member of the quadrant.
                  For example, the SL may not fully understand how the SW quadrant works. In
                  such cases, depending on the problem, the SL may rely on the SW expert to
                  carry out the task. Such tasks may remain a blind spot of the SL.
                      Much of the CV work is technical and requires time and effort to fully
                  understand. This could be another blind spot for the team members. We look forward
                  to having training on computer vision and its use of algorithms to familiarise
                  ourselves with the concept. We also look forward to having more technical train-
                  ing on how to build a chatbot system and on how to evaluate its outputs. This
                  will also help us to bridge the gap between the technical part of the project and
                  the research (theoretical) aspects of the project.

                  5        Conclusions and Recommendations
                  In this paper, we explored the collaboration among four experts from different
                  academic backgrounds working together to address long-standing digital human-
                  ities questions. The collaboration from the inception of the project to its current
                  implementation stage is demonstrated. We looked into the major research ques-
                  tions and how the team collaborated to address each of these research questions.
                  Collaboration between computer scientists and digital humanities
                  experts is not usually smooth. In this project, we demonstrate how we managed to
                  maintain the collaboration at its best by enabling smooth interaction between
                  the team members. The lesson we have learned from this collaboration, and
                  which we continue to learn, is that it is very important to start with a sound
                  digital humanities research question and understand the problem from many
                  different sides before rushing to a technical solution. While working with the
                  digital humanists to understand the problem with an example, it is also
                  important to introduce the technical requirements early to ensure that the digital
                  humanists capture all the information required to complete
                  the technical solution and, most importantly, all the requirements to evaluate the
                  system at the end. Finally, we conclude that a picture can indeed say more than
                  a thousand words; however, with the right set of digital tools, controlled
                  vocabularies, thesauri and ontologies, we should be able to successfully access implicit
                  knowledge via human and machine analysis.


                  Acknowledgements: This research is funded by the Austrian Academy of
                  Sciences under the funding scheme go!digital Next Generation
                  (GDNG 2018-051). The ChIA project is carried out in collaboration with the
                  ADAPT Centre, DCU. The ADAPT SFI Centre for Digital Media Technology is funded
                  by Science Foundation Ireland through the SFI Research Centres Programme and
                  is co-funded under the European Regional Development Fund (ERDF) through
                  Grant #13/RC/2106.


                  References

                   1. Abgaz, Y., Dorn, A., Piringer, B., Wandl-Vogt, E., Way, A.: Semantic modelling
                      and publishing of traditional data collection questionnaires and answers. Informa-
                      tion 9(12) (2018). https://doi.org/10.3390/info9120297
                   2. Abgaz, Y.M., Dorn, A., Piringer, B., Wandl-Vogt, E., Way, A.: A semantic model
                      for traditional data collection questionnaires enabling cultural analysis. In: Pro-
                      ceedings of the LREC 2018 Workshop ”6th Workshop on Linked Data in Linguis-
                      tics (LDL-2018). Japan, Miyazaki (2018)
                   3. Amelie Dorn, Yalemisew Abgaz, G.K., Diaz, J.L.P.: Harvesting knowledge from
                      cultural images with assorted technologies: the example of the ChIA project. In:
                      The 16th International ISKO conference. Aalborg, Denmark (2020)
                   4. Dorn, A., Abgaz, Y., Wandl-Vogt, E.: Opening up cultural content in non-standard
                      language data through cross-disciplinary collaboration: insights on methods, pro-
                      cesses and learnings on the example of exploreat! In: Krauwer, S., Fišer, D. (eds.)
                      TwinTalks-DHN 2019. Twin Talks Workshop at DHN 2019. Proceedings of the
                      Twin Talks Workshop at DHN 2019, co-located with Digital Humanities in the
                      Nordic Countries (DHN 2019). Copenhagen, Denmark, March 5, 2019. vol. 2365,
                      pp. 82–89 (2019)


Twin Talks 2 and 3, 2020         Understanding and Facilitating Collaboration in Digital Humanities          64/143
                  10       Abgaz. Y et al.

                   5. Dorn, A., Wandl-Vogt, E., Abgaz, Y., Benito Santos, A., Therón, R.: Unlocking
                      cultural conceptualisation in indigenous language resources: Collaborative comput-
                      ing methodologies. In: Soria, C., Besacier, L., Pretorius, L. (eds.) Proceedings of
                      the LREC 2018 Workshop ”CCURL 2018 – Sustaining Knowledge Diversity in the
                      Digital Age”. pp. 19–22 (2018)
                   6. Goikhman, A., Therón, R., Wandl-Vogt, E.: Designing collaborations: Could de-
                      sign probes contribute to better communication between collaborators? In: Pro-
                      ceedings of the Fourth International Conference on Technological Ecosystems for
                      Enhancing Multiculturality. p. 1219–1222. TEEM ’16, Association for Computing
                      Machinery, New York, NY, USA (2016). https://doi.org/10.1145/3012430.3012431,
                      https://doi.org/10.1145/3012430.3012431
                   7. Jackson, S.E.: The consequences of diversity in multidisciplinary work teams.
                      Handbook of work group psychology pp. 53–75 (1996)
                   8. Koch, G., Koch, W.: Aggregation und management von meta-
                      daten      im     kontext     von     europeana.     Mitteilungen    der   Vereini-
                      gung      Österreichischer    Bibliothekarinnen     und     Bibliothekare   70(2),
                      170–178         (Sep        2017).      https://doi.org/10.31263/voebm.v70i2.1776,
                      https://journals.univie.ac.at/index.php/voebm/article/view/2071
                   9. Kubany, A., Ishay, S.B., sacha Ohayon, R., Shmilovici, A., Rokach, L., Doitshman,
                      T.: Semantic comparison of state-of-the-art deep learning apis for image multi-label
                      classification (2019)
                  10. Longhurst, B., Smith, G., Bagnall, G., Crawford, G., Ogborn, M.,
                      Baldwin, E., McCracken, S.: Introducing cultural studies, second edi-
                      tion. Taylor and Francis (2008). https://doi.org/10.4324/9781315834344,
                      http://usir.salford.ac.uk/id/eprint/40855/
                  11. Wasner-Peter, I.: Handbuch kulturportale. online-angebote aus kultur und
                      wissenschaft, hrsg. v. euler, ellen/hagedorn-saupe, monika/maier, ger-
                      ald/schweibenz, werner/sieglerschmidt, jörn. berlin/boston: De gruyter
                      saur 2015. Communications of the Association of Austrian Librarians
                      71(3-4), 558–560 (Dec 2018). https://doi.org/10.31263/voebm.v71i3-4.2176,
                      https://journals.univie.ac.at/index.php/voebm/article/view/2159


Twin Talks 2 and 3, 2020         Understanding and Facilitating Collaboration in Digital Humanities          65/143