An AI-based Approach and Platform for the Preservation and
Exploitation of Knowledge on the History of Computing
(short paper)
Stefano Ferilli 1, Liudmyla Matviichuk 2 and Carla Petrocelli 3
1
  Università di Bari – DIB, Via E. Orabona 4, Bari, 70125, Italia
2
  Department of Informatics and Computing Tools, T. Shevchenko National University “Chernihiv Collegium”,
53, Hetmana Polubotka Str., Chernihiv, Ukraine
3
  Università di Bari – DIRIUM, Piazza Umberto I 1, Bari, 70121, Italia


                   Abstract
                   There is an urgent need for preserving and making available the knowledge related to the
                   history of computing, for research and education purposes. This is a peculiar kind of Cultural
                   Heritage, since it tightly mixes hardware, software, documental and even immaterial heritage.
                   The interlinks among these items and their context are fundamental to properly understand
                   them and their role. Advanced AI techniques can support this vision and open unprecedented
                   opportunities for researchers, practitioners and hobbyists. We are pursuing these objectives
                   in a project based on the GraphBRAIN platform for Knowledge Graph management.

                   Keywords 1
                   Knowledge Graphs, History of Computing, Knowledge Representation and Reasoning

1. Introduction & Motivations
    The word ‘computing’ refers to the science and technology of information processing and the
industry dedicated to these topics. Everything that revolves around the modern ‘computer’ is an
artifact, just like an archaeological find. The computer embodies in its own technical nature the same
characteristics of change and the same speed of technological innovations that have led to its
realization: in addition to its intrinsic design idea, it becomes a historical source as a ‘cultural object’.
From this connection of the technical object with its own context comes a first declination of the
relationship between history and computer, the ‘history of computer science’, understood as a study of
the evolution of computing machines and the automatisms for data processing and the operations
performed on them. On the other hand, from the perspective of human history, technology has very
quickly permeated, and contributed to the evolution of, our way of life; however, technological
incarnations, especially ‘modern’ digital ones (hardware, software, applications), have such a short
life cycle that their rapid obsolescence can cause loss of their knowledge. The generation of inventors/
pioneers of computer artifacts is gradually fading away, causing treasures of know-how to sink into
oblivion. It is therefore urgent to react and create a heritage in which computer science is presented in
its entirety and not necessarily linked to other sciences (mathematics, physics, chemistry), to serve as
a permanent reference and a core of resources to learn, to understand, to see and to wonder, and to
witness the importance it has in our society. Its roots must be known, and proper tools and
interpretive keys must be provided to understand it.
    Usually, catalogs related to the history of computing are just lists of items belonging to a given
collection as if they were built based on inventories. In addition, they mainly focus on hardware,
1
  1st Italian Workshop on Artificial Intelligence for Cultural Heritage (AI4CH22), co-located with the 21st International Conference of the
Italian Association for Artificial Intelligence (AIxIA 2022). 28 November 2022, Udine, Italy.
             EMAIL: stefano.ferilli@uniba.it (A. 1); matviychuk2012@gmail.com (A. 2); carla.petrocelli@uniba.it (A. 3)
             ORCID: 0000-0003-1118-0601 (A. 1); 0000-0002-2046-6153 (A. 2); 0000-0002-6009-3806(A. 3)
           © 2022 Copyright for this paper by its authors.
           Use permitted under Creative Commons License Attribution 4.0 International (CC BY 4.0).
           CEUR Workshop Proceedings (CEUR-WS.org)
which is visible and tangible, neglecting the invisible and volatile heritage of software and technical
documents, which are essential both by themselves and for understanding the hardware. Even should the
cataloging form include the most detailed and comprehensive information that characterizes the
object, it would be just a digitized mirror copy of the old paper archives 2. In such a setting, the use of
the most cutting-edge technology involved in the procedures of cataloging an object does not expand
the knowledge of the object itself, it just makes it easier to consult the catalog.
    The heritage also remains strongly localized: the information, however correct, rich, and
exhaustive, can be consulted only within the database it belongs to. This thwarts the possibility of
having a wider knowledge around the described object, based on relationships that could for example
explain if there are similar pieces produced by another company, designed by another research group,
or related to other objects, due to similar characteristics. This can be overcome only through a
transition to a unified database, in which all the information relating to the object belonging to the
historical heritage is collected, and from which all this information can be derived.
    From the computational technologies that have revolutionized the field of archival sciences, we
expect a radical change that enables a more effective management of cultural heritage: sharing
information (which is not simply a union of catalogs) implies a transformation of the represented
object as part of a system of knowledge around it, that enriches it and connects it to the information
coming from its content and context. For instance, the object ‘punched card’, whose localized
cataloging can provide the simple information that it is an element used in textile machines from
1801 3, if extended with relations about its other uses over time, can be connected to mechanical
musical instruments 4, to mechanical calculating machines 5, to the tabulators used in 1890 for the US
census [8], and even to the first electronic computers and programming languages. This transforms
punched cards from simple paper elements into media, used both to store information and to transmit
data, but also into a tool to pass to a computer ‘the code’ that transforms the data
into the expected results. In our example the object ‘punched card’ is connected to objects of other
types: a craftsman (Jacquard), a document (the user guide for the operation of the mechanical piano),
a scientist (Charles Babbage), a company (IBM producing tabulators), an electronic computer
(ENIAC), programming languages, etc. These paths make explicit the roles played by the protagonists
in the evolutionary stages of the object itself, but also outline the historical steps that led to the change
of certain paradigms in the history of computing. This new perspective of sharing and inter-relating
data contributes to the growth of ‘knowledge’ related to an object, so as to describe it in all its
complexity, and radically changes the approach to querying the object being cataloged, but also the
storytelling associated with it.
    This requires a change of paradigm with respect to the past. We believe it is necessary to:
     1. Deconstruct the traditional record-based approach and move to a description in which all the
         entities involved in a description ‘live’ with their own dignity and can be related to each
         other, rather than being just field values (author, title, etc.) in the records.
     2. Widen the scope of the description from a fixed set of fields describing each item to a larger
         and more variable set, including aspects that were so far neglected by the research and
         practice, such as physical, content, context, and even usage aspects.
     3. Enable advanced support provided by AI tools to help the different kinds of users in carrying
         out their activities and accomplishing their tasks, in a personalized and pro-active way.
    We also believe that this requires solutions different from those currently proposed in the
literature, solutions that may boost the effectiveness of data management so as to support the needs of
different kinds of users, providing them with new possibilities for data exploitation.
    This paper describes our project aimed at proposing a vision for long-term preservation and
advanced exploitation of knowledge on material and immaterial cultural heritage, specifically in the
field of the history of computing, and methodological and technological solutions to implement it.

2
  Even considering the national standards for the management of technological and scientific archives (e.g., see [6]), the impression is that
these tools are used with the sole purpose of making data usable without making explicit the complex network of knowledge they preserve.
3
  Joseph Marie Jacquard made his loom 'programmable', making it possible to create the pattern of a fabric on the basis of a pattern stored on
a support (list of punched cards) that can be replaced and always precisely reproduced [4]. [5]
4
  Under a sort of piano keyboard there was the punched board, and under the latter, several strings of the musical instrument. A mechanism
dragged the punched board under elements that allowed the percussion of the strings only at points where the board had holes.
5
  One thinks of Charles Babbage's Analytical Engine, well described in the 1842 article by Ada Lovelace where, in addition, the first
program to make the machine calculate Bernoulli's progression numbers is presented [3].
Compared to previous work, here we propose an expanded, ‘holistic’ data schema, and a more
systematized list of kinds of automated reasoning to be applied to the available knowledge.

2. Related Works
    The main past projects in the directions we envisage have tried to overcome
the reported limitations, without however fully responding to the needs set out above. This is the case
of the French ‘collaborative and participatory’ project for the realization of the Musée de
l’Informatique et du Numérique (MINF), launched following an agreement between academic, socio-
economic and associative structures, signed at the symposium Vers un Musée de l'Informatique et de
la Société Numérique en France, held in 2012 [22]. On this occasion, the urgency also emerged of
keeping track of tools related to computer science, knowledge of which risks being lost given their
specificity in terms of identification and speed of obsolescence. In order to define an
identifying historical heritage, the project was conceived as a network of physical spaces for the
preservation, dissemination and promotion of IT tools. These spaces, distributed in different Lieux on
the French territory, are however always linked to temporary or permanent exhibitions of museums
located on the national territory, which are therefore not shared outside the country’s borders. More
recent projects concern digital archives that deal with the cataloging of artifacts of artistic, historical,
and cultural value and have integrated their material using experiential feedback, the result of
interactions on social media platforms. Among these, the one with European relevance is SPICE
(Social Participation, Cohesion, and Inclusion through Cultural Engagement), which aims to produce,
collect, interpret and archive the proposals, reactions and responses of users interacting with these
heritages, with the objective of capturing citizens’ calls to rethink the nature of the computational
infrastructures that support data management [1,4].
    On the technological side, data networking is known in AI for being the core of knowledge. So, we
call for a step up from the Data Base (DB) perspective to the Knowledge Base (KB) – more
specifically, Knowledge Graph (KG) – one. The research on KGs carried out in Knowledge
Representation (KR) proposed solutions for representing and storing knowledge that have departed
from the mainstream solutions for DBs. The established representation standard for formal ontologies
is the Web Ontology Language (OWL), and the associated data storage technology, triplestores,
adopts the RDF graph model, based on triples (Subject, Predicate, Object) of atomic (Uniform
Resource Identifiers – URIs – or literal) values. In the DB community, significant success has been
obtained by a new graph-based NoSQL technology, based on the Labeled Property Graph (LPG)
model, which allows associating sets of attribute-value pairs and labels with nodes and arcs. Thus, the
LPG model is more expressive than (and incompatible with) the RDF one. We believe that data
representation and storage must rely on state-of-the-art DB technology, in order to ensure
optimization and efficiency in data storage and handling, and that the research in KR may provide
solutions for effectiveness in data usage. So, we propose to adopt LPG-based DBs for data storage
and basic handling, and formal ontologies as data schemas. Some works tried to investigate this
combination, but they mostly focus on applying OWL solutions to LPGs, at the cost of not fully
exploiting the power of LPGs ([16-19]) or of proposing non-standard extensions of RDF [20,21]. We
call for an LPG-centric approach that can fully exploit the features of this model. A solution for this is
the GraphBRAIN framework [14], and associated tools for schema and instance handling [15].
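    The difference between the two models can be illustrated with a minimal sketch in plain Python (no graph library). All class names, identifiers and sample values below are illustrative assumptions, not GraphBRAIN or Neo4j code. In the LPG form the arc itself carries an attribute; in the triple form the same attribute has to be attached to an extra node standing for the relationship, which is one common workaround.

```python
from dataclasses import dataclass, field


@dataclass
class Node:
    node_id: str
    labels: set
    properties: dict = field(default_factory=dict)


@dataclass
class Arc:
    source: str
    target: str
    label: str
    properties: dict = field(default_factory=dict)


# LPG: nodes and arcs both carry labels and attribute-value pairs.
jacquard = Node("n1", {"Person"}, {"name": "Joseph Marie Jacquard"})
loom = Node("n2", {"Device"}, {"name": "Jacquard loom"})
developed = Arc("n1", "n2", "developed", {"year": 1801})  # illustrative value


def arc_to_triples(arc, arc_id):
    """Flatten one LPG arc into (subject, predicate, object) triples.

    Triples hold only atomic values, so the arc becomes an intermediate
    node (arc_id) to which endpoints, label and attributes are attached.
    """
    triples = [
        (arc_id, "source", arc.source),
        (arc_id, "target", arc.target),
        (arc_id, "label", arc.label),
    ]
    triples += [(arc_id, key, str(value)) for key, value in arc.properties.items()]
    return triples


for triple in arc_to_triples(developed, "a1"):
    print(triple)
```

    One arc with one attribute becomes four triples here; the mapping is only one of several possible encodings.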

3. Technological Platform
    For our project we adopted GraphBRAIN, a general-purpose KB management system aimed at
covering all stages and tasks in the lifecycle of a KB, from knowledge acquisition, to knowledge
organization, to knowledge exploitation. It combines a graph DBMS for efficiently
handling, mining and browsing the individuals with an ontology level that defines the DB schema. As
in relational DBs, and differently from standard KGs, the schema is kept separate from the data. This
allows superimposing different ontologies/schemas on one graph, representing different views on the
same data. Some classes and relationships may appear in different ontologies, possibly with different
attributes, in order to reflect different perspectives on them. This allows cross-fertilization among, and
knowledge reuse across, different domains: individuals of shared classes act as bridges, allowing the
users of a domain to reach information coming from other domains. The ontologies are built and
maintained by GraphBRAIN's administrators, while instances are fed into the KB collaboratively by
the users, or by automatic knowledge extraction from documents and other kinds of resources (e.g.,
the Internet). The functionalities of GraphBRAIN are exposed as services, and external applications
can use GraphBRAIN through an API ensuring that all accesses to the DB and operations on its
content are compliant with the data schema.
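    The idea of several schemas superimposed on one graph can be sketched as follows. This is a toy illustration in plain Python under our own assumptions: the schema names, the attribute lists and the `view` and `bridge_classes` helpers are hypothetical and do not reproduce the GraphBRAIN API.

```python
# One shared set of nodes (the data), kept separate from any schema.
GRAPH_NODES = [
    {"id": 1, "label": "Person", "name": "Charles Babbage", "role": "scientist"},
    {"id": 2, "label": "Device", "name": "Analytical Engine"},
    {"id": 3, "label": "Place", "name": "London"},
]

# Two hypothetical schemas: class name -> attributes visible in that domain.
SCHEMAS = {
    "computing": {"Person": ["name", "role"], "Device": ["name"]},
    "geography": {"Person": ["name"], "Place": ["name"]},
}


def view(nodes, schema):
    """Return the nodes visible under a schema, projected on its attributes."""
    visible = []
    for node in nodes:
        attributes = schema.get(node["label"])
        if attributes is None:
            continue  # class not part of this domain
        projected = {"id": node["id"], "label": node["label"]}
        projected.update({a: node[a] for a in attributes if a in node})
        visible.append(projected)
    return visible


def bridge_classes(schema_a, schema_b):
    """Classes shared by two schemas; their individuals link the domains."""
    return sorted(set(schema_a) & set(schema_b))


print(view(GRAPH_NODES, SCHEMAS["geography"]))
print(bridge_classes(SCHEMAS["computing"], SCHEMAS["geography"]))
```

    In this sketch the same Person node is visible in both domains, with a different set of attributes in each.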
     The data are stored in Neo4j [10], which implements the LPG model: nodes represent entity
instances and arcs represent instances of binary relationships, whose type is specified by their labels,
and whose attributes are specified by their properties. Neo4j is schema-less; in GraphBRAIN the
ontologies make it possible to associate a clear semantics with the graph items, and enable high-level
reasoning on the available knowledge. They express what the DB can store and how it is structured, so that only
data that are compliant with the ontologies may be added to the graph. They drive and support all
functionalities: KB creation and enrichment; advanced tools for searching and browsing the KB;
automated reasoning, mining, analysis and knowledge extraction tools that may be used interactively
by end users or provided as services to other systems for obtaining selective and personalized access
to the stored knowledge. While the ontologies are described in a proprietary XML format purposely
designed for the LPG model, GraphBRAIN can also import OWL ontologies and/or individuals,
export its KGs to OWL, so as to allow application of existing Semantic Web tools on them, or publish
KB content as linked open data (LOD) [13], to make it interoperable with other resources.
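    A minimal sketch of schema-driven insertion on top of a schema-less store is given below. It assumes a hypothetical in-memory graph and a hypothetical schema format (a Python dictionary, not the XML format mentioned above); the `CheckedGraph` class and the `role` attribute are introduced only for illustration.

```python
class SchemaError(ValueError):
    """Raised when data do not comply with the (hypothetical) schema."""


# Hypothetical schema: allowed attributes per class and per relationship signature.
SCHEMA = {
    "entities": {"Person": {"name"}, "Device": {"name", "year"}},
    "relationships": {("Person", "developed", "Device"): {"role"}},
}


class CheckedGraph:
    def __init__(self, schema):
        self.schema = schema
        self.nodes = {}  # node id -> (class label, properties)
        self.arcs = []   # (source id, relationship, target id, properties)

    def _check_attributes(self, what, given, allowed):
        unknown = set(given) - allowed
        if unknown:
            raise SchemaError(f"{what}: attributes not in schema: {sorted(unknown)}")

    def add_node(self, node_id, label, **properties):
        allowed = self.schema["entities"].get(label)
        if allowed is None:
            raise SchemaError(f"unknown class: {label}")
        self._check_attributes(label, properties, allowed)
        self.nodes[node_id] = (label, properties)

    def add_arc(self, source, relationship, target, **properties):
        if source not in self.nodes or target not in self.nodes:
            raise SchemaError("both endpoints must already be in the graph")
        signature = (self.nodes[source][0], relationship, self.nodes[target][0])
        allowed = self.schema["relationships"].get(signature)
        if allowed is None:
            raise SchemaError(f"relationship not in schema: {signature}")
        self._check_attributes(relationship, properties, allowed)
        self.arcs.append((source, relationship, target, properties))


graph = CheckedGraph(SCHEMA)
graph.add_node("p1", "Person", name="Charles Babbage")
graph.add_node("d1", "Device", name="Analytical Engine")
graph.add_arc("p1", "developed", "d1")       # accepted
try:
    graph.add_arc("d1", "developed", "p1")   # rejected: wrong signature
except SchemaError as error:
    print("rejected:", error)
```

    The store itself stays schema-less; compliance is enforced only at the access layer, which mirrors the separation of schema and data described above.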
    GraphBRAIN can manage attachments for each instance. In this way it also acts as an archive,
whose content is indirectly organized according to formal ontologies, and thus may foster
interoperability with other systems. Finally, users may add comments, or approve/disapprove, each
entity or relationship instance, and even each single attribute value thereof. Using the comments, the
users may also provide suggestions to improve and extend the ontology. Through the
approval/disapproval mechanism, the system may establish a trust mechanism for the users that
supports ‘distributed’ quality assurance on the content of the KB. Users are encouraged to provide
high-quality knowledge, because using a combination of their number of contributions and trust they
are assigned ‘credits’ that they may spend in using advanced features provided by GraphBRAIN.
Interactions of users are tracked in order to build models of their preferences to be used for
personalization purposes.
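    The exact way trust and credits are computed is not specified here. Purely as a hypothetical illustration of how the number of contributions and trust could be combined, the sketch below uses a smoothed approval ratio as trust and multiplies it by the number of contributions. Both formulas and both function names are our assumptions, not the platform's rule.

```python
def trust(approvals, disapprovals):
    """Smoothed share of approvals (Laplace smoothing; an assumption)."""
    return (approvals + 1) / (approvals + disapprovals + 2)


def credits(contributions, approvals, disapprovals):
    """Hypothetical credits: number of contributions weighted by trust."""
    return round(contributions * trust(approvals, disapprovals), 2)


# Same number of contributions, opposite community feedback.
print(credits(10, approvals=8, disapprovals=0))
print(credits(10, approvals=0, disapprovals=8))
```

    With this choice a user with no feedback yet starts from a neutral trust of 0.5, and disapproved contributions earn fewer credits than approved ones.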
    A Web application was developed to allow user interaction with the KGs. It provides form-based
interfaces (automatically generated from the ontology specification) for feeding or querying instances
of entities and relationships in the KB 6, and a graph view where a selected portion of the instances can
be graphically displayed and subsequently explored, expanded or shrunk, and the details of
instances can be shown. This is useful to browse the available knowledge without a pre-defined goal
in mind, but letting the data themselves drive the search. This also enables serendipity in information
retrieval, since the users may find unexpected information that is relevant to their information needs.
The displayed portion of the graph can be selected based on the result of a specific user query, or
automatically as a connected neighborhood of the most relevant nodes. If a user model is available,
built from statistics collected about the user's previous interactions with the system, the starting nodes
may be those most related to the user's interests, preferences, aims, background, etc. The possibility of translating
selected portions of the graph into natural language is also envisioned.
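    Selecting a connected neighborhood of some starting nodes can be sketched as a bounded breadth-first visit. The adjacency map, the node names and the `neighborhood` function below are illustrative assumptions; the actual platform relies on the graph DBMS for this.

```python
from collections import deque

# Toy undirected graph inspired by the punched card example (illustrative).
ADJACENCY = {
    "punched card": ["Jacquard", "tabulator", "ENIAC"],
    "Jacquard": ["punched card"],
    "tabulator": ["punched card", "IBM"],
    "IBM": ["tabulator"],
    "ENIAC": ["punched card"],
}


def neighborhood(adjacency, start_nodes, max_hops):
    """Return all nodes within max_hops arcs of any starting node."""
    distance = {node: 0 for node in start_nodes}
    queue = deque(start_nodes)
    while queue:
        current = queue.popleft()
        if distance[current] == max_hops:
            continue  # do not expand beyond the hop limit
        for neighbor in adjacency.get(current, ()):
            if neighbor not in distance:
                distance[neighbor] = distance[current] + 1
                queue.append(neighbor)
    return set(distance)


print(sorted(neighborhood(ADJACENCY, ["IBM"], max_hops=2)))
```

    Expanding or shrinking the displayed portion then amounts to increasing or decreasing `max_hops`, or to changing the starting nodes (for instance, taking them from a user model).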
    GraphBRAIN can export its KGs (ontologies and instances) to several different formats, enabling
several kinds of automated reasoning, including:
      Associative reasoning (finding indirect connections between items, extracting personalized
         and relevant portions of the graph, etc.), carried out by the graph DB manipulation language
         and libraries;
      Ontological reasoning (inheritance, consistency, etc.), carried out by OWL reasoners;
      Logical multi-strategy reasoning (deduction, abduction, abstraction, induction, argumentation,
         probabilistic inference, analogy), carried out by a Logic Programming-based
         inference engine;
      Analytical reasoning (clustering items, spotting anomalous or exceptional situations,
         identifying regularities, assessing node relevance or centrality, predicting links, etc.).

6
    A demo Web Application is available at http://193.204.187.73:8088/GraphBRAIN/
   Some of the underlying algorithms are reused from the literature; others have been extended or
purposely developed. Specific AI research is being carried out to develop an integrated framework in
which all these kinds of reasoning can be tightly combined, not just exploited separately.
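    Two of the kinds of reasoning listed above can be illustrated on a toy graph: finding an indirect connection between two items (associative) and assessing node centrality (analytical). The sketch uses plain Python as a stand-in for the graph DB language and libraries; the arcs, the function names and the choice of degree as centrality measure are our assumptions.

```python
from collections import deque

# Illustrative undirected arcs taken from the punched card example.
EDGES = [
    ("Jacquard", "punched card"),
    ("punched card", "Analytical Engine"),
    ("Analytical Engine", "Charles Babbage"),
    ("punched card", "tabulator"),
    ("tabulator", "IBM"),
    ("punched card", "ENIAC"),
]


def build_adjacency(edges):
    adjacency = {}
    for a, b in edges:
        adjacency.setdefault(a, set()).add(b)
        adjacency.setdefault(b, set()).add(a)
    return adjacency


def shortest_path(adjacency, source, target):
    """Breadth-first search; returns one shortest path or None."""
    previous = {source: None}
    queue = deque([source])
    while queue:
        current = queue.popleft()
        if current == target:
            path = []
            while current is not None:
                path.append(current)
                current = previous[current]
            return path[::-1]
        for neighbor in sorted(adjacency.get(current, ())):
            if neighbor not in previous:
                previous[neighbor] = current
                queue.append(neighbor)
    return None


def degree_centrality(adjacency):
    """Simplest centrality measure: number of neighbors per node."""
    return {node: len(neighbors) for node, neighbors in adjacency.items()}


adjacency = build_adjacency(EDGES)
print(shortest_path(adjacency, "Jacquard", "IBM"))
centrality = degree_centrality(adjacency)
print(max(centrality, key=centrality.get))
```

    On this toy graph the path from Jacquard to IBM passes through the punched card, which is also the most connected node.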
   Figure 1 shows the form-based and graph-browsing sections in the Web application.


Figure 1: GraphBRAIN Web application interface

4. Data Schema / Ontology for the History of Computing
   Following the work in [23], we propose here an approach to knowledge representation that
considers and brings to cooperation many different aspects of the cultural heritage items:
         Formal, including the metadata used in traditional records;
         Physical, including materials, processing and mechanics;
         Content, of various kinds: textual (if applicable), visual, logical, conceptual (concerned
            with the meaning conveyed by the items);
         Context, adding information that is external to the cultural heritage items proper, but that
            may be useful or relevant to properly understand them;
         Lifecycle, including process and usage data, useful for personalization purposes.
   We call it a holistic description approach. The classes in our ontology include:
         Award: any kind of recognition to persons, companies, devices, documents, or
             components (including educational attainments, prizes and records);
         Collection: any conceivable grouping of items (e.g., groups of persons, series of
             documents, families of devices or components);
         Organization, including companies and institutions;
         Part: a part, useful or needed to build a Device but not providing a high-level (i.e.,
             perceivable or meaningful for a final user) functionality on its own. It has many
             subclasses, including electronic or electric or mechanical components, boards, etc.;
         Configuration: a group of Devices, relevant because typical or determined in order to
             satisfy specific needs (e.g., a configuration of devices for desktop publishing);
         Device: an artifact having some kind of use at the human level of interaction. Among its
             many subclasses, the most relevant concern computers, calculators, peripherals, etc.;
         Document, including printable, audio, video and multimedia types, each with a
             corresponding hierarchy of subclasses;
         Event (conferences, fairs, shows, lectures, etc.);
         IntellectualWork: the original result of an intellectual effort, relevant for methodological
             or practical purposes (including algorithms, approaches, inventions, programming
             languages, disciplines, technologies, theorems, theoretical models, etc.);
         Item: a specific, identifiable specimen of a (mass-produced) object (e.g., a device or
             component or document or software or system);
         Package: a specific packaging of a Device (or of a set of devices sold together);
         Person: reporting personal data about persons;
         Place: the root of a hierarchy currently made up of several subclasses, describing
             geographical, administrative, and other kinds of places;
         Software (with a hierarchy of subclasses, such as Development, Educational, Embedded,
             OfficeAutomation, OperatingSystem, Videogame);
         System: a group of Devices that is functional only as a whole (different from a
             Configuration, where at least one of the Devices would be functional if taken alone).
   Moreover, classes Category and Word make it possible to tag all other items conceptually or
lexically, respectively, and to connect them semantically, since they are interlinked in the KB.
   Sample relevant relationships include:
     Document.concerns.{Concept,Component,Device,Document,Person,Software,...}
     Device.wasIn.{Event,Place}
     Device.clones.Device, Component.clones.Component, Software.clones.Software
     Software.compatibleWith.Software, Device.compatibleWith.Device
     Software.requires.Software
     {Item,Component,Device,Document,Person,Software}.belongsTo.Collection
     {Person,Organization}.owns.Device
     Person.developed.{Component, Device, Document, IntellectualWork}
     Component.mayReplace.Component
     Person.interactedWith.{Device,Person,System}
     Word.describes.{Concept,Component,Device,Document,Person,Software,...}
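   The compact notation used in the list above (Domain.relationship.Range, with braces for alternatives) can be expanded mechanically into single (domain, relationship, range) signatures. The following sketch is only a reading aid: the `expand` function is hypothetical, it handles one signature per call, and it simply drops the trailing "..." that marks an open-ended list.

```python
def members(part):
    """Split 'A' or '{A,B,...}' into class names, ignoring the '...' marker."""
    return [name for name in part.strip("{}").split(",") if name and name != "..."]


def expand(signature):
    """Expand one 'Domain.relationship.Range' signature into triples."""
    domain, relationship, range_ = signature.replace(" ", "").split(".", 2)
    return [(d, relationship, r) for d in members(domain) for r in members(range_)]


for triple in expand("{Person,Organization}.owns.Device"):
    print(triple)
for triple in expand("Device.wasIn.{Event,Place}"):
    print(triple)
```

   Such an expansion yields the relationship signatures that a schema-compliance check would need to compare against.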
   The resulting graph will allow indirect, non-trivial connections between the represented items.
E.g., it might reveal that a person who patented a component was at the same show as an
employee of a company using that component in a device, which might explain why that company
used that component. Other examples of opportunities provided by this conceptualization include the
possibility of recording anecdotes told by the original players of the computer revolution, or technical
information that can be precious to restore items or to run obsolete software, which cannot be
expressed in existing ontologies designed for other kinds of cultural heritage.
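   The discovery scenario described above (an inventor and an employee of a company using the inventor's component attending the same show) can be sketched as a pattern search over a small set of facts. Everything in the sketch is hypothetical: the individuals, the relationship names `patented`, `attended`, `worksFor` and `usesComponent` (which are not part of the sample list of relationships), and the helper functions.

```python
# Hypothetical facts as (subject, relationship, object) triples.
FACTS = [
    ("inventor_A", "patented", "component_X"),
    ("inventor_A", "attended", "show_1"),
    ("employee_B", "attended", "show_1"),
    ("employee_B", "worksFor", "company_C"),
    ("company_C", "usesComponent", "component_X"),
    ("employee_D", "worksFor", "company_C"),  # never met the inventor
]


def objects(facts, subject, relationship):
    return {o for s, r, o in facts if s == subject and r == relationship}


def subjects(facts, relationship, obj):
    return {s for s, r, o in facts if r == relationship and o == obj}


def shared_event_links(facts):
    """Find (inventor, component, employee, company, event) patterns."""
    findings = []
    for inventor, relationship, component in facts:
        if relationship != "patented":
            continue
        for company in subjects(facts, "usesComponent", component):
            for employee in subjects(facts, "worksFor", company):
                shared = objects(facts, inventor, "attended") & objects(
                    facts, employee, "attended"
                )
                for event in shared:
                    findings.append((inventor, component, employee, company, event))
    return sorted(findings)


print(shared_event_links(FACTS))
```

   In a graph DB the same pattern would be expressed as a path query; the nested loops here only make the chain of relationships explicit.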

Table 1
Current content of the Knowledge Base on the history of computing
            Data points (e)  Instances (e)  Attribute values (e)  Data points (r)  Instances (r)  Attribute values (r)    Total
Computing              8565           1699                  6866             3747           2080                  1667    12312
Overall             2424578         336617               2087961           538118         496679                 41439  2962696

   Table 1 reports statistics on the current content of our KB, by type of information. The history of
computing section, built collaboratively, includes 1699 entity instances and 2080 relationship
instances, described by 6866 and 1667 attribute values, respectively. The rest, which is the vast
majority, consists of context information (concepts, words, places, etc.) added partly automatically
and partly collaboratively. This includes the WordNet lexical ontology [12], the standard part of the
Dewey Decimal Classification (DDC) system [11], the ACM Computing Classification System (CCS)
[24], and the IEEE thesaurus and taxonomy [25].
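   The figures in Table 1 are consistent with reading each 'data points' column as the sum of the corresponding instances and attribute values, and the total as the sum of entity and relationship data points. The short check below makes this arithmetic explicit; the dictionary keys and the `data_points` function are our own naming, introduced for illustration.

```python
# Values copied from Table 1; (e) = entities, (r) = relationships.
TABLE_1 = {
    "Computing": {"instances_e": 1699, "attribute_values_e": 6866,
                  "instances_r": 2080, "attribute_values_r": 1667},
    "Overall": {"instances_e": 336617, "attribute_values_e": 2087961,
                "instances_r": 496679, "attribute_values_r": 41439},
}


def data_points(row):
    """Return (entity data points, relationship data points, total)."""
    entity_points = row["instances_e"] + row["attribute_values_e"]
    relationship_points = row["instances_r"] + row["attribute_values_r"]
    return entity_points, relationship_points, entity_points + relationship_points


for name, row in TABLE_1.items():
    print(name, data_points(row))

# Share of the whole KB taken by the history of computing section.
share = data_points(TABLE_1["Computing"])[2] / data_points(TABLE_1["Overall"])[2]
print(f"{share:.2%}")
```

   The share computed at the end quantifies the statement that context information makes up the vast majority of the KB.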

5. Conclusions and Future Work
    We stressed the urgent need for preserving and making available the knowledge related to the
history of computing, for research and education purposes. This is a peculiar kind of Cultural
Heritage, posing several challenges since it tightly mixes hardware, software, archival/bibliographic
and even immaterial items. Storing the interrelationships among the items and between items and their
context is fundamental to properly understand them. KRR techniques from AI can support this vision
and open unprecedented opportunities for researchers, educators, practitioners and hobbyists. We
started a preservation project based on the GraphBRAIN platform for Knowledge Graph
management. In this paper we described its setting and the functions currently provided. Ongoing and
future work aims at expanding the KB and the set of AI-based functions provided in the platform.

6. References
[1] E. Daga, L. Asprino, et al., Integrating Citizen Experiences in Cultural Heritage Archives:
     Requirements, State of the Art, and Challenges, in Journal on Computing and Cultural Heritage,
     vol. 15, n. 1, 2022, pp. 1-35.
[2] L. Heide, Shaping a technology: American punched card systems 1880-1914 in IEEE Annals of
     the History of Computing, vol. 19, no. 4, 1997, pp. 28-41.
[3] L.F. Menabrea, Sketch of the Analytical Engine Invented by Charles Babbage, Esq., trans.
     Augusta Ada Byron King, Countess of Lovelace, in Scientific Memoirs, vol. 3, 1843, pp. 666-
     731.
[4] D. Otero, P. Martin-Rodilla, J. Parapar, Building Cultural Heritage Reference Collections from
     Social Media through Pooling Strategies: The Case of 2020’s Tensions Over Race and Heritage.
     Journal on Computing and Cultural Heritage, vol. 15, n. 1, 2022, pp. 1-13
[5] C. Petrocelli. The Art of Weaving a Code: The Jacquard Loom, the Analytical Engine, and
     Women’s Work. In E. Vavarella (ed.), rs548049170_1_69869_TT, pp. 110-117, Mousse
     Publishing, Milano, 2020.
[6] R. Rojas, U. Hashagen, The ENIAC: History, Operation and Reconstruction in VLSI in The First
     Computers: History and Architectures, MIT Press, 2002, pp.121-178
[7] J.E. Sammet, Programming languages: history and future. Communications of the ACM, 15(7):
     601-610, 1972
[8] L.E. Truesdell, The Development of Punch Card Tabulation in the Bureau of the Census, U.S.
     Government Printing Office, 1965, pp. 35-50.
[9] F. Vannozzi. Catalogare il patrimonio scientifico e tecnologico: da sic a sts a pst, storia di un
     percorso (e di una collaborazione). In: Pratesi, G., Vannozzi, F. (eds.) I valori del museo.
     Politiche di indirizzo e strategie di gestione, pp. 98-101.
[10] I. Robinson, J. Webber, E. Eifrem. Graph Databases, 2nd edn. O’Reilly Media, 2015
[11] M. Dewey. A classification and subject index for cataloguing and arranging the books and
     pamphlets of a library. Amherst, Mass., 1876
[12] G.A. Miller. WordNet: A lexical database for English. Communications of the ACM 38, 39-41,
     1995
[13] T. Heath, C. Bizer. Linked Data: Evolving the Web into a Global Data Space. Synthesis Lectures
     on the Semantic Web, Morgan & Claypool Publishers, 2011
[14] S. Ferilli. Integration Strategy and Tool between Formal Ontology and Graph Database
     Technology. Electronics, ISSN 2079-9292, 10:2616, 27 pp., MDPI, 2021
[15] S. Ferilli & D. Redavid. The GraphBRAIN System for Knowledge Graph Management and
     Advanced Fruition. In Foundations of Intelligent Systems, LNAI 12117, 308-317, Springer, 2020
[16] H. Chiba, R. Yamanaka, S. Matsumoto. G2GML: Graph to Graph Mapping Language for
     Bridging RDF and Property Graphs. In The Semantic Web – ISWC 2020, pp. 160–175, Springer,
     2020
[17] https://protegeproject.github.io/owl2lpg (consulted on 14 October 2022).
[18] https://github.com/SciGraph/SciGraph/wiki/Neo4jMapping (consulted on 14 October 2022).
[19] https://github.com/VirtualFlyBrain/neo4j2owl (consulted on 14 October 2022).
[20] https://github.com/cmungall/owlstar (consulted on 14 October 2022).
[21] Hartig, O. Foundations to Query Labeled Property Graphs using SPARQL. In Joint Proceedings
     of the 1st International Workshop on Semantics for Transport and the 1st International Workshop
     on Approaches for Making Data Interoperable, Central Europe (CEUR) Workshop Proceedings
     vol. 2447, CEUR-WS.org, 2019.
[22] “#MINF_POUR UN MUSÉE DE L’INFORMATIQUE ET DU NUMÉRIQUE EN FRANCE”
     report, 2015. https://project.inria.fr/minf/files/2011/12/MINF_Rapport_HD.pdf (consulted on 14
     October 2022)
[23] S. Ferilli, D. Redavid. An Ontology and a Collaborative Knowledge Base for History of
     Computing. In Proceedings of the 1st International Workshop on Open Data and Ontologies for
     Cultural Heritage (ODOCH-2019), Central Europe (CEUR) Workshop Proceedings vol. 2375,
     49-60, 2019
[24] https://dl.acm.org/ccs (consulted on 14 October 2022).
[25] https://www.ieee.org/publications/services/thesaurus-thank-you.html (consulted on 14 October
     2022)