Building the Semantic Portal of Italian Divagrafie
Laura Pandolfo*, Lucia Cardone, Luisa Cutzu, Beatrice Seligardi and Giulia Simi
DUMAS, University of Sassari, via Roma 151, Sassari, Italy


Abstract
In this paper, we present preliminary research activities aimed at building a semantic digital archive
for publishing heterogeneous data about a literary corpus of writings produced by Italian actresses,
known as Divagrafie. This corpus represents a unique collection of cultural data through which scholars
in Film and Literature Studies can analyze the phenomenology, characteristics, and historical evolution
of the writings produced by Italian actresses. We describe the vision behind the development of the
semantic digital archive and explore its potential applications and its expected impact on the
research community and on society.

Keywords
Semantic Web, Digital Archive, Film and Literature Studies


1. Context and Motivation
The availability of cultural heritage data has rapidly boosted the emerging research area of
Digital Humanities (DH) [1], in which an increasing number of scholars develop and apply new
computational methods to literary corpora to address problems in the humanities and social
sciences. Recently, DH research has shifted its focus towards providing users with integrated
tools for solving research problems interactively [2]. A distinctive feature of such systems is
the use of computational methods to answer humanities research questions over large datasets by
applying the so-called “distant reading” approach [3], namely a set of techniques such as novel
statistical methods, sentiment analysis, topic modeling and network analysis. The main benefit
of distant reading is that it opens new horizons for computational literary studies without the
considerable effort required by traditional humanities computing research, which often demands
rigorous text encoding and document annotation. However, this approach has several critical
issues, as identified by Ciotti [4, 5]. One shortcoming is that most of the applied computational
methods are context-independent, whereas humanities and literary data need to be heavily
contextualized. Another critical point concerns the interpretation of text, which is usually an
intentional process. Statistical computational methods can hardly capture the true interpretation
of a text, since the meaning of a word is usually determined by the sense attributed to it by both
the author and the reader.

IPS-RiCeRcA-SPIRIT 2022: 10th Italian Workshop on Planning and Scheduling, RiCeRcA Italian Workshop, and SPIRIT
Workshop on Strategies, Prediction, Interaction, and Reasoning in Italy.
* Corresponding author.
Email: lpandolfo@uniss.it (L. Pandolfo); cardone@uniss.it (L. Cardone); lcutzu@uniss.it (L. Cutzu); bseligardi@uniss.it
(B. Seligardi); gsimi@uniss.it (G. Simi)
ORCID: 0000-0002-5785-5638 (L. Pandolfo); 0000-0003-0770-6324 (L. Cardone); 0000-0002-9400-0866 (B. Seligardi);
0000-0002-2206-2945 (G. Simi)
© 2022 Copyright for this paper by its authors. Use permitted under Creative Commons License Attribution 4.0 International (CC BY 4.0).
CEUR Workshop Proceedings (CEUR-WS.org), http://ceur-ws.org, ISSN 1613-0073
   Semantic Web (SW) [6] technologies and Linked Data [7] can overcome these issues by enriching
the distant reading approach with new methods able to capture the semantic nature of literary
texts [8]. Cultural heritage has thus become an active area of application of the SW, where
cultural content and metadata from collections in museums, libraries, archives, and media
organizations are made openly available for research and public use [2, 9]. In the last two
decades, large amounts of data have been aggregated in major national and international portals,
libraries and repositories such as Europeana¹, and such data now form a significant part of
DBpedia² and Wikidata³. The concept of ontology [10] plays a central role, since an ontology is
commonly used as a kind of schema that captures knowledge about a specific domain by providing
the relevant concepts and the relations between them. Several examples of ontology-based digital
archives and libraries in the humanities have been reported [11, 12, 13, 14, 15, 16].
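To make the idea of an ontology as a schema of concepts and relations more concrete, the following minimal Python sketch (standard library only) encodes a toy class hierarchy and a few properties with a declared domain and range, then checks whether a relation between two typed entities conforms to that schema. All class and property names (Person, Actress, Photographer, Photograph, Autobiography, wrote, took, depicts) are hypothetical illustrations and do not reproduce the WOW ontology, which is still being designed.

```python
from typing import Dict, Optional, Tuple

# Toy schema: concepts (classes) in a subclass hierarchy, plus relations
# (properties) with a domain and a range. All names are hypothetical and
# do NOT reproduce the WOW ontology.
SUBCLASS_OF: Dict[str, str] = {
    "Actress": "Person",
    "Photographer": "Person",
    "Photograph": "Document",
    "Autobiography": "Document",
}

PROPERTIES: Dict[str, Tuple[str, str]] = {
    # property name: (domain class, range class)
    "wrote": ("Person", "Autobiography"),
    "took": ("Photographer", "Photograph"),
    "depicts": ("Photograph", "Person"),
}


def is_a(cls: str, ancestor: str) -> bool:
    """True if cls is ancestor or a (transitive) subclass of it."""
    current: Optional[str] = cls
    while current is not None:
        if current == ancestor:
            return True
        current = SUBCLASS_OF.get(current)
    return False


def conforms(subject_cls: str, prop: str, object_cls: str) -> bool:
    """Check that a relation between two typed entities respects the
    property's declared domain and range."""
    if prop not in PROPERTIES:
        return False
    domain, rng = PROPERTIES[prop]
    return is_a(subject_cls, domain) and is_a(object_cls, rng)


print(conforms("Actress", "wrote", "Autobiography"))     # True
print(conforms("Photograph", "wrote", "Autobiography"))  # False
```

Because Actress is declared a subclass of Person, an actress satisfies the domain of wrote without that being stated explicitly. Note that in RDFS/OWL, domain and range declarations drive inference rather than validation; constraint checking of the kind shown here is closer to what the W3C SHACL standard provides.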
   This paper introduces our ongoing research on developing an ontology-based archive named
WOmen Writing around the camera (WOW), which will collect semantic data about relevant writings
produced by Italian actresses (“Divagrafie”), focusing on the dynamics of exchange between
writing, acting performance and the construction of the star image. This research idea builds on
and extends the project “Drawing a Map of Italian Actresses in Writing” (DaMA), funded by PRIN
2017⁴, which aims at investigating the extension, phenomenology, characteristics, and historical
evolution of the writings produced by Italian actresses. The main goal is to build a semantic
portal containing different resource materials related to the visual and self-representation
history of Italian actresses, which brings to light and jointly interrogates different kinds of
documents (mainly photographs and texts) through a defined set of investigation methods supported
by DH tools, particularly those based on SW technologies and distant reading methods. The paper
is organized as follows. Section 2 describes our current research progress and outlines the
planned phases of the development of the WOW semantic archive, while Section 3 concludes the
paper by discussing the potential applications and expected impacts of this line of research,
both for academia and for society in general.


2. The WOW Semantic Archive
The idea to develop the WOW Semantic Archive stems from the ongoing research project DaMA, which
aims at investigating the extension, phenomenology, characteristics, and historical evolution of
the writings produced by Italian actresses, focusing in particular on the dynamics of exchange
between writing, acting performance and construction of the star image. Defined as Divagrafie
[17], such writings represent a multifaceted and interesting corpus of texts that has
traditionally been overlooked by the academic community. The important contribution of these
self-narratives and fiction writings to the understanding of stardom has been only partially
acknowledged within the field of star studies; similarly, the field still lacks an in-depth
analysis of the relationship between the writings of a specific actress and the construction of
her own star image. This ongoing project is exploring largely uncharted territory and has so far
classified a large number of texts, including 80 autobiographies written by 47 actresses. This
literary corpus, on which we intend to focus, represents a unique testing ground for a convergent
methodology that combines DH methods, SW technologies and the most recent advances in Film and
Media Studies, Literature and Gender Studies. So far, the DaMA team has analyzed these volumes
with a close reading approach, focusing on paradigmatic examples and eliciting a variety of
recurring topics [18, 19, 20], with particular attention to the interconnections with performance
and actor studies and with stardom and celebrity studies, also making use of videographic
analysis tools.

¹ https://www.europeana.eu/en
² https://www.dbpedia.org/
³ https://www.wikidata.org/wiki/Wikidata:Main_Page
⁴ https://www.damadivagrafie.org/
   By following and expanding the multidisciplinary approach already explored by the DaMA
research, our current and future work focuses on enlarging the current corpus and applying SW
standards and DH tools to it. The main phases of our methodological approach are outlined below:
   1. Review of the DaMA project’s outcomes and integration of new materials. All
      the materials collected and analyzed by the DaMA research team will undergo a further,
      extended analysis in order to provide a first taxonomy describing the corpus. During this
      phase, new documents and resources will be integrated into the original corpus; in
      particular, we will focus on retrieving materials preserved in the Elisabetta Catalano
      archive (Rome), the Museo di Fotografia Contemporanea - MuFoCo archive (Cinisello Balsamo,
      MI), the Museo del Cinema archive (Turin), the Cinemazero archive, the Luisa Di Gaetano
      archive (Rome), and Archivia - Casa Internazionale delle Donne (Rome). In addition, we
      intend to study cinema and cultural periodicals in order to verify the circulation and
      impact of the work of female photographers in the national press.
   2. Conceptualization and formalization. The main goal of this activity is to formalize the
      acquired knowledge in terms of concepts and relations according to the ontological schemas
      and models identified in this phase. In this domain, the use of ontologies provides a
      range of benefits for users, e.g., searching and browsing by concept rather than by
      strings alone. For example, the photographs that Elisabetta Catalano took of a variety of
      actors, actresses, writers and directors can be related to all the excerpts of the Italian
      actresses’ autobiographies in which relations with the cultural field strongly emerge, as
      a sign that the stereotype of the actress as a merely beautiful, fashionable, "unthinking"
      body is being overcome. As shown in Figure 1, Elisabetta Catalano’s photograph of Monica
      Vitti talking to Andy Warhol can be linked to passages in Vitti’s autobiographies about
      art as a means of self-expression, as well as to several other entities and concepts in
      the corpus.
      Figure 1: Example of entities and relationships in the examined corpus.
   3. Semantic annotations and analysis of the corpus. In this phase, the ontological schemas
      and models will guide the semantic annotation of all the corpus materials (texts and
      images). The corpus will thus be enriched with metadata describing, for example,
      references that link the content to specific concepts. Moreover, a variety of literary
      analyses based on distant reading approaches will be applied to the corpus materials. In
      particular, we will explore typical computational linguistics methods based on Natural
      Language Processing techniques in order to identify recurring narrative patterns as well
      as shared characteristics of the texts, which will allow the recognition of an actantial
      typology specifically linked to Divagrafie.
   4. Development of the semantic archive. The main goal of this last phase will be the
      development of the WOW semantic archive. It will provide researchers with a set of tools
      for querying, analyzing and studying the data, making it possible to intertwine and
      connect different data. The interdisciplinary approach will be made visible and easily
      traceable through a digital user interface able to link different objects and produce
      new knowledge. This will be particularly useful for scholars and researchers in different
      fields, such as History of Cinema, History of Photography, Visual Studies, Literature,
      Contemporary History and so on. The semantic archive will be a hub through which users
      can not only find materials to be linked to other studies and research, but also discover
      new ways to analyze and study them with innovative approaches. In this phase, we intend to
      investigate automatic techniques for ontology population, such as those presented in
      [21, 22]. All the features of the semantic archive will be integrated into an easy-to-use
      graphical interface that will visualize the data in various formats, such as graphs,
      interactive maps, timelines, facets, etc.
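The Figure 1 example described in phase 2, together with the concept-based search envisaged for the annotation and archive phases, can be sketched as a tiny in-memory triple store. This is a minimal illustration assuming a plain (subject, predicate, object) representation; every identifier (the ex: prefix, photo1, autobiography1, passage1, the predicates and the concept art_as_self_expression) is hypothetical, and a real archive would rely on an RDF store queried with SPARQL rather than this hand-rolled pattern matcher.

```python
from typing import Iterator, List, Optional, Tuple

Triple = Tuple[str, str, str]

# Hypothetical triples loosely modelled on the Figure 1 example.
TRIPLES = {
    ("ex:photo1", "ex:takenBy", "ex:ElisabettaCatalano"),
    ("ex:photo1", "ex:depicts", "ex:MonicaVitti"),
    ("ex:photo1", "ex:depicts", "ex:AndyWarhol"),
    ("ex:MonicaVitti", "rdf:type", "ex:Actress"),
    ("ex:autobiography1", "ex:writtenBy", "ex:MonicaVitti"),
    ("ex:passage1", "ex:partOf", "ex:autobiography1"),
    ("ex:passage1", "ex:hasConcept", "ex:art_as_self_expression"),
    ("ex:passage2", "ex:partOf", "ex:autobiography1"),
    ("ex:passage2", "ex:hasConcept", "ex:family"),
}


def match(s: Optional[str] = None, p: Optional[str] = None,
          o: Optional[str] = None) -> Iterator[Triple]:
    """Yield the triples matching a pattern; None acts as a wildcard."""
    for t in TRIPLES:
        if ((s is None or t[0] == s) and (p is None or t[1] == p)
                and (o is None or t[2] == o)):
            yield t


def passages_for_photo(photo: str, concept: str) -> List[str]:
    """Passages annotated with `concept` in autobiographies written by any
    actress depicted in `photo`: a search by concept, not by string."""
    result = set()
    for _, _, person in match(photo, "ex:depicts"):
        if not any(match(person, "rdf:type", "ex:Actress")):
            continue  # e.g. Andy Warhol is not typed as an actress
        for book, _, _ in match(None, "ex:writtenBy", person):
            for passage, _, _ in match(None, "ex:partOf", book):
                if any(match(passage, "ex:hasConcept", concept)):
                    result.add(passage)
    return sorted(result)


print(passages_for_photo("ex:photo1", "ex:art_as_self_expression"))
# ['ex:passage1']
```

The query follows the chain photograph → depicted actress → autobiography → annotated passage, which is the kind of link between images and texts the phases above describe. In SPARQL the same traversal would be a single basic graph pattern joining the four predicates.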


3. Potential Applications and Expected Impacts
The development of the WOW Semantic Archive could have significant applications within the
national and international research communities involved in the process. The archive will make
accessible a wide corpus of materials related to Italian actresses that is relevant to a broad
range of fields, also within international academic communities, such as Film Studies,
Photography Studies, Women’s Studies and Literature Studies. To our knowledge, this is the first
time that such a corpus is being created following the unambiguous, rigorous, consistent and
well-documented practices provided by the DH approach. Moreover, this is the first time that
computational methods and SW technologies will be applied to this kind of document resources.
The data integration capabilities of SW technologies will allow data in the WOW Semantic Archive
to be connected with data held in other international archives, such as the European Film
Gateway⁵, thereby increasing its international impact.
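The data integration mentioned above typically relies on owl:sameAs links: once a local resource is declared identical to a record in an external archive, the descriptions held by both sources can be merged. The sketch below, assuming that representation, computes equivalence classes of identifiers with a union-find structure and merges their property-value pairs. The identifiers (wow:MonicaVitti, efg:person-0001, other:record-A) and all descriptions are hypothetical placeholders, not real WOW or European Film Gateway records.

```python
from collections import defaultdict
from typing import Dict, List, Set, Tuple

# Hypothetical owl:sameAs links between a WOW resource and external records.
SAME_AS: List[Tuple[str, str]] = [
    ("wow:MonicaVitti", "efg:person-0001"),
    ("efg:person-0001", "other:record-A"),
]

# Hypothetical descriptions held by each source: resource -> {(property, value)}.
DESCRIPTIONS: Dict[str, Set[Tuple[str, str]]] = {
    "wow:MonicaVitti": {("wow:wrote", "wow:autobiography1")},
    "efg:person-0001": {("efg:appearsIn", "efg:film-0042")},
    "other:record-A": {("other:occupation", "actress")},
}

parent: Dict[str, str] = {}


def find(x: str) -> str:
    """Representative of x's equivalence class (union-find, path halving)."""
    parent.setdefault(x, x)
    while parent[x] != x:
        parent[x] = parent[parent[x]]
        x = parent[x]
    return x


def union(a: str, b: str) -> None:
    """Record that a and b denote the same entity."""
    parent[find(a)] = find(b)


# owl:sameAs is symmetric and transitive, so it induces equivalence classes.
for a, b in SAME_AS:
    union(a, b)

# Merge the descriptions of all identifiers that denote the same entity.
merged: Dict[str, Set[Tuple[str, str]]] = defaultdict(set)
for resource, facts in DESCRIPTIONS.items():
    merged[find(resource)] |= facts

for representative, facts in merged.items():
    print(representative, sorted(facts))
```

Transitivity places all three identifiers in one class, so the merged record combines facts from every source. Such merging should be applied with care, since a single incorrect sameAs link propagates to every resource in the class.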
   The semantic archive will also be useful for upper-secondary teachers and students, curators
of film-related events, film, photography and literature enthusiasts, and in general for anyone
who needs to retrieve the data and research it contains. Beyond the ambition to fill a gap in
research on this area of study, namely the analysis of the relationship between women,
photography and cinema from a feminist perspective (already addressed in many Western
countries), this work has a further, more complex and therefore even more important and
stimulating ambition: to contribute to the debate on, and the cultural framework of, Italy, where
gender issues, women’s emancipation, feminist heritage and women’s art are still not
sufficiently recognized as a fundamental part of the nation’s cultural heritage, and therefore
as a topic in school and university curricula. As for its social impact, it will contribute to
the growing interest in cultural dynamics that are affected and/or shaped by gender issues, and
promote a deeper social awareness of the cultural role of women in Italian society.
   As for its methodology, the concrete collaboration among different scientific areas, such as
DH, Computer Science, Cinema Studies, Photography, Visual Studies, Gender Studies and Literary
Theory, will constitute an example of an integrated and multi-layered methodology, offering
itself as a possible benchmark for future projects. As for its objectives and expected impacts,
this work is in line with the research goals and targets defined by the National Research Program
(PNR) 2021-2027 and by the "Cultural Heritage" intervention area of Horizon Europe. In particular,
it will mainly impact the PNR cluster “Humanistic culture, creativity, social transformation,
society of inclusion”, specifically the sub-category “Digital preservation and conservation of
cultural heritage”. According to this research line, priority should be given to the
implementation of effective semantic modeling technologies, including the construction of
ontologies wherever necessary, which allow an effective aggregation of different information
levels and data types in order to avoid redundancies or missing data. The research should also
include a shift from traditional databases to SW databases, in order to achieve interoperability
between resources and pave the way for the transition to Big Data. All these aspects will be
considered in the development of the WOW Semantic Archive, which is expected to support the
digital transformation of the cultural sector, in line with the PNR focus on the Digitization and
Valorization of Cultural Heritage, while also strengthening social inclusion by reducing gender
inequalities.


⁵ https://www.europeanfilmgateway.eu/it
References
 [1] E. Gardiner, R. G. Musto, The Digital Humanities: A Primer for Students and Scholars,
     Cambridge University Press, 2015.
 [2] E. Hyvönen, Using the Semantic Web in digital humanities: Shift from data publishing to
     data-analysis and serendipitous knowledge discovery, Semantic Web 11 (2020) 187–193.
 [3] F. Moretti, Distant reading, Verso Books, 2013.
 [4] F. Ciotti, Modelli e metodi computazionali per la critica letteraria: lo stato dell’arte (2017).
 [5] F. Ciotti, Distant reading in literary studies: a methodology in quest of theory, Testo e
     Senso (2021) 195–213.
 [6] T. Berners-Lee, J. Hendler, O. Lassila, The Semantic Web, Scientific American 284 (2001)
     34–43.
 [7] C. Bizer, T. Heath, K. Idehen, T. Berners-Lee, Linked Data on the Web (LDOW2008), in:
     Proceedings of the 17th international conference on World Wide Web, 2008, pp. 1265–1266.
 [8] L. Pandolfo, L. Pulina, ARKIVO Dataset: A benchmark for ontology-based extraction tools,
     in: Proceedings of the 17th International Conference on Web Information Systems and
     Technologies, WEBIST 2021, October 26-28, 2021, SCITEPRESS, 2021, pp. 341–345.
 [9] L. Pandolfo, S. Spanu, L. Pulina, E. Grosso, Understanding and modeling visitor behaviours
     for enhancing personalized cultural experiences, Int. J. Technol. Hum. Interact. 16 (2020)
     24–38.
[10] N. Guarino, D. Oberle, S. Staab, What is an ontology?, in: Handbook on Ontologies,
     Springer, 2009, pp. 1–17.
[11] G. Adorni, M. Maratea, L. Pandolfo, L. Pulina, An ontology for historical research doc-
     uments, in: Web Reasoning and Rule Systems - 9th International Conference, RR 2015,
     Berlin, Germany, August 4-5, 2015, Proceedings, volume 9209 of Lecture Notes in Computer
     Science, Springer, 2015, pp. 11–18.
[12] G. Adorni, M. Maratea, L. Pandolfo, L. Pulina, An ontology-based archive for historical
     research, in: Proceedings of the 28th International Workshop on Description Logics,
     Athens, Greece, June 7-10, 2015, volume 1350 of CEUR Workshop Proceedings, CEUR-WS.org,
     2015.
[13] L. Pandolfo, L. Pulina, M. Zieliński, Towards an ontology for describing archival resources,
     in: Proceedings of the Second Workshop on Humanities in the Semantic Web (WHiSe
     II) co-located with 16th International Semantic Web Conference (ISWC 2017), Vienna,
     Austria, October 22, 2017, volume 2014 of CEUR Workshop Proceedings, CEUR-WS.org,
     2017, pp. 111–116.
[14] L. Pandolfo, L. Pulina, M. Zieliński, ARKIVO: an ontology for describing archival resources,
     in: P. Felli, M. Montali (Eds.), Proceedings of the 33rd Italian Conference on Computational
     Logic, Bolzano, Italy, September 20-22, 2018, volume 2214 of CEUR Workshop Proceedings,
     CEUR-WS.org, 2018, pp. 112–116.
[15] L. Pandolfo, L. Pulina, M. Zieliński, Exploring semantic archival collections: The case of
     Piłsudski Institute of America, in: Digital Libraries: Supporting Open Science - 15th Italian
     Research Conference on Digital Libraries, IRCDL 2019, Pisa, Italy, January 31 - February 1,
     2019, Proceedings, volume 988 of Communications in Computer and Information Science,
     Springer, 2019, pp. 107–121.
[16] L. Pandolfo, L. Pulina, Building the semantic layer of the Józef Piłsudski digital archive
     with an ontology-based approach, International Journal on Semantic Web and Information
     Systems (IJSWIS) 17 (2021) 1–21.
[17] M. Rizzarelli, L’attrice che scrive, la scrittrice che recita. Per una mappa della ‘diva-grafia’,
     in: L. Cardone, G. Maina, S. Rimini, C. Tognolotti (Eds.), Vaghe stelle. Attrici del/nel cinema
     italiano, Arabeschi 10 (2017) 366–371.
[18] G. Simi, L’occhio che palpita. Monica Vitti e gli scritti sull’arte, Cinergie–Il Cinema e le
     altre Arti (2021) 153–166.
[19] M. Rizzarelli, Il doppio talento dell’attrice che scrive. Per una mappa delle “divagrafie”,
     Cahiers d’études italiennes (2021).
[20] C. Tognolotti, Una diva fragrante. L’immagine divistica di Sophia Loren nei libri di ricette
     (2019).
[21] L. Pandolfo, L. Pulina, G. Adorni, A framework for automatic population of ontology-based
     digital libraries, in: AI*IA 2016: Advances in Artificial Intelligence - XVth International
     Conference of the Italian Association for Artificial Intelligence, Genova, Italy, November
     29 - December 1, 2016, Proceedings, volume 10037 of Lecture Notes in Computer Science,
     Springer, 2016, pp. 406–417.
[22] ADnOTO: a self-adaptive system for automatic ontology-based annotation of unstruc-
     tured documents, in: Advances in Artificial Intelligence: From Theory to Practice - 30th
     International Conference on Industrial Engineering and Other Applications of Applied
     Intelligent Systems, IEA/AIE 2017, Arras, France, June 27-30, 2017, Proceedings, Part I,
     volume 10350 of Lecture Notes in Computer Science, Springer, 2017, pp. 495–501.