The Shock of the New: Testing the Pan-Archival Linked Data Catalogue with Users
(short paper)

Alex Green1 and Dr K Faith Lawrence2
1, 2 The National Archives, Kew, Surrey, TW9 4DU, United Kingdom


                Abstract
                The UK National Archives’ goal is to re-imagine archival practice,
                pioneer new approaches to description and build a new linked data
                catalogue. The Pan-Archival Catalogue will bring together into one
                management system descriptions of both physical and digital records
                from a variety of sources within the organization. This report briefly
                describes the users’ feedback on aspects of the new data model when
                first shown in the new editorial interface and as part of business
                processes.

                Keywords
                Archives, Catalogues, User Research, Data Model, Linked Data.

1. Introduction
Archives are changing. New ways of preserving, describing and presenting records
are emerging and archivists are rising to the challenge. At last year’s
conference, our colleagues presented a paper on the development of The National
Archives’ Pan-Archival linked data catalogue and our need to replace the ageing
system with a new catalogue to manage the metadata for all types of records [1].
   We have started our exploration of how the model fits with the needs of users
and their processes. In this brief report, we reflect on the initial user responses
to the implementation of the new model in the first iterations of the editorial
user interface and related editorial and accessioning workflows. These responses
highlighted a number of assumptions we had made about the application of linked
data in general, and our data model in particular, to the machinery of the
editorial process within the creation and management of The National Archives’
catalogue.

Proceedings TPDL2022: 26th International Conference on Theory and Practice of Digital Libraries,
   20-23 September 2022, Padua, Italy
EMAIL: alex.green@nationalarchives.gov.uk (A. 1); faith.lawrence@nationalarchives.gov.uk (A. 2)
ORCID: 0000-0002-6993-4157 (A. 1); 0000-0001-9200-1921 (A. 2)
             ©️ 2022 Copyright for this paper © Crown copyright (2022). Licensed under the Open Government Licence v3.0
             Use permitted under Creative Commons License Attribution 4.0 International (CC BY 4.0).
             CEUR Workshop Proceedings (CEUR-WS.org)

2. Enter the Users
The first phase of the project produced a draft of our conceptual data model. A
living document, it described our approach to ‘replacing legacy systems, reducing
duplication and creating new opportunities through unlocking the unrealised
potential in The National Archives’ data’ [2]. This model embraced new
thinking in archival description (notably the ICA’s Records in Contexts (RiC)
[3]) to enable us to meet our strategic goal of reimagining archival practice for
the 21st century. A significant change introduced by the model is the division
of a record (an intellectual entity) into four entities, each with its own
properties: an unchanging Concept, its associated temporal Description(s),
Realisations (specific physical or digital instance(s)) and individual Digital
Files (i.e. computer files in the case of digital records).
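
   The four-entity division can be pictured with a minimal sketch. It is an
illustration only, not the project's schema: the class names follow the entity
names above, but every field name, the identifier, the date value and the way
the entities are linked to one another are assumptions made for the example.

```python
from dataclasses import dataclass, field
from typing import List


@dataclass
class DigitalFile:
    """An individual computer file belonging to a digital Realisation."""
    filename: str  # hypothetical field


@dataclass
class Realisation:
    """A specific physical or digital instance of the record."""
    label: str  # hypothetical field
    is_digital: bool  # hypothetical field
    digital_files: List[DigitalFile] = field(default_factory=list)


@dataclass
class Description:
    """A temporal description; several may exist for one Concept over time."""
    scope_and_content: str
    valid_from: str  # hypothetical field: when this description took effect


@dataclass
class Concept:
    """The unchanging intellectual entity (linking shown here is assumed)."""
    identifier: str
    descriptions: List[Description] = field(default_factory=list)
    realisations: List[Realisation] = field(default_factory=list)


# Hypothetical example: one record with a physical original and a digitised copy.
record = Concept(identifier="example-record-1")
record.descriptions.append(
    Description("Plan of a house and grounds.", valid_from="2022-01-01")
)
record.realisations.append(Realisation("physical original", is_digital=False))
record.realisations.append(
    Realisation(
        "digitised copy",
        is_digital=True,
        digital_files=[DigitalFile("plan.tif")],
    )
)
```

   Keeping each entity as its own type is what lets a property such as Scope and
Content appear on more than one entity with different values.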
   In parallel, we carried out some initial user research on the existing editorial
interface, using an early prototype based on the existing physical records
model, to give us an understanding of the current processes, issues and new
requirements. As the new model is a marked departure, and because the catalogue
is a business-critical system, a more rigorous approach to our user research was
needed. A team of user experience (UX) researchers and service designers carried
out formal user research to ensure that, as well as adhering to accessibility
and editorial standards, we were following the best practice of the UK Government
Digital Service Standard [4] in putting user needs at the centre of new product
development. As we explained the new model to this team and provided them with
sample data, we realised how significant the impact of the new data model
would be on our colleagues' day-to-day work.

3. Respecting the fonds?
The principle of provenance forms the basis of most institutional archival
description. Within the profession, however, there has been a growing
acknowledgement of the multiplicity of perspectives on archival records beyond
the institutional. The RiC-Conceptual Model (RiC-CM) incorporates this wider
perspective, encompassing the single hierarchy resulting from the Respect des
fonds approach but taking it further. It defines a record set as both ‘one or more
records that are grouped together by an agent based on the records sharing one
or more attributes or relations’ and ‘some other selection and grouping that
fulfils a particular purpose or purposes (for example, a classification that
reflects or supports the purposes of a researcher)’ [5]. We decided to adopt RiC’s
record set and its associated definitions, but its application remained in
question. Our existing catalogue describes the records and their arrangement when
in use in their creating department; therefore, the types of record sets necessary
to support the current catalogue data are: Fonds (Department), Division, Series,
Subseries,
Subsubseries and File. This list of types is expected to expand as born-digital
records are brought into the model, with the addition of Item (where child ‘Sub-
items’ exist) and Directory (for a folder containing individual digital files).
However, with the wider definition of a record set, we could support alternative
groupings or arrangements (as well as the principle of original order). This
would be a significant departure from the current process, even discounting
what we have termed ‘catalogue-adjacent record sets’, such as those records
grouped as a result of research or presentation, which are beyond the scope of
the project. We can, however, foresee cases where records could belong to more
than one record set: for example, a set of records relating to an event, created
originally by the Home Office and subsequently sent to an inquiry, could be
reflected in two arrangements according to their differing uses by the two bodies.
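
   The idea of a record belonging to more than one record set can be sketched
as a many-to-many membership. This is a hedged illustration rather than the
catalogue's design: the function names, record set names and record identifiers
below are all hypothetical.

```python
from collections import defaultdict
from typing import Dict, List, Set

# Record set name -> identifiers of its member records (all values hypothetical).
record_sets: Dict[str, Set[str]] = defaultdict(set)


def add_to_set(record_set: str, record_id: str) -> None:
    """Place a record in a record set without removing it from any other."""
    record_sets[record_set].add(record_id)


def sets_containing(record_id: str) -> List[str]:
    """List, in alphabetical order, every record set the record belongs to."""
    return sorted(
        name for name, members in record_sets.items() if record_id in members
    )


# The same record arranged once by its creator and once by a later user.
add_to_set("Home Office series", "record-1")
add_to_set("Home Office series", "record-2")
add_to_set("Inquiry evidence", "record-1")
```

   Here `sets_containing("record-1")` returns both arrangements, which a single
hierarchy cannot express.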
   From a staff user’s perspective, how should we present our records arranged
in different ways by the people who used them? How would we show
the contexts of the different arrangements? Given the urgency of replacing
our ageing system, we are deferring the answers to these questions for
the moment.

4. Splitting the Record
The division of a record into four entities (see Section 2) is a fundamental shift
from our current ISAD(G)-based data model1. Each entity has its own properties,
some of which exist at multiple levels: for example, both the Description
and the Realisation have Scope and Content, but each contains different
information. Other properties are unique to a specific level: for example, only
Realisations have Physical Extent and Form.

    When we considered how to present these different entities and their
properties in the new user interface, it became clear that the data could not be
split easily into these entities and properties using automation. To explore this, we
mapped an existing catalogue description to the new model.2 The current Scope
and Content describes the record as ‘Middlesex: Westminster (now in London
Borough of Westminster). Plan of Buckingham House and grounds abutting on
Green Park and St James's Park. Shows garden layout, with trees in elevation.
Reference table to plots marked EFGHI and KLM on plan. Scale: 1 inch to 60
feet. Compass indicator. [By] Charles Evans. This plan, annotated 'No 150' at
the top of the sheet, is similar to MPE 1/378, but refers to different portions of
the site. A copy of this plan, made in May 1760, is MFQ 1/450’.

1 See https://www.ica.org/en/isadg-general-international-standard-archival-description-second-edition
2 See the catalogue entry on Discovery: https://discovery.nationalarchives.gov.uk/details/r/C4048574

Table 1
Record Scope and Content Description Reworked into the New Data Model

    Scope and Content: Middlesex:           Copies Information: A copy of this
    Westminster (now in London Bor-         plan, made in May 1760 is MFQ
    ough of Westminster). Plan of Buck-     1/450.
    ingham House and grounds abutting       Map Scale: 1:720
    on Green Park and St James's Park.      Related Material: This plan is similar
    Shows garden layout, with trees in      to MPE 1/378, but refers to different
    elevation. Reference table to plots     portions of the site.
    marked EFGHI and KLM on plan.           Places: Westminster, Green Park
    Scale: 1 inch to 60 feet. Compass in-   and St James's Park
    dicator.                                Creator: Charles Evans

    Realisation 1 (the physical record      Scope and Content: This record is
    held at TNA):                           hand drawn
    Realisation 2 (a digitised copy held    Scope and Content: This record is a
    by the Image Library at TNA):           digitised copy

   Staff acknowledged that our mapping, augmentation and arrangement of the
description (see Table 1) were valid but were concerned about how this would
be achieved without extensive re-cataloguing. Some of this metadata is not
captured in this structure by the existing accessioning process, so it would
necessitate changes not just internally at The National Archives but also in the
government departments that are responsible for describing the records they
transfer to us. Clearly, this approach will need further consultation.
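
   The Map Scale value in Table 1 follows from simple arithmetic: 1 inch to 60
feet is 60 × 12 = 720 inches on the ground for each inch on the plan, i.e. 1:720.
The sketch below shows how that one conversion might be derived from free text.
It is an assumption-laden illustration, not part of the project: the function
name is hypothetical and the pattern handles only the 'N inch(es) to M feet'
wording, which is why most of the remaining split would still need manual work.

```python
import re
from typing import Optional

INCHES_PER_FOOT = 12


def scale_ratio(text: str) -> Optional[str]:
    """Return a scale such as '1:720', or None if it cannot be derived."""
    match = re.search(
        r"(\d+)\s+inch(?:es)?\s+to\s+(\d+)\s+feet", text, re.IGNORECASE
    )
    if match is None:
        return None  # wording not recognised: leave for manual cataloguing
    map_inches = int(match.group(1))
    ground_inches = int(match.group(2)) * INCHES_PER_FOOT
    if map_inches == 0 or ground_inches % map_inches != 0:
        return None  # avoid emitting a misleading or fractional ratio
    return f"1:{ground_inches // map_inches}"


print(scale_ratio("Scale: 1 inch to 60 feet. Compass indicator."))
```

   Returning None rather than guessing keeps unrecognised descriptions in the
queue for human review.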

5. From the Specific to the General
Our existing data model is based on the accepted archival principle that
‘archival description proceeds from the general to the specific’, i.e. data is
normalized so that it is held at the highest point possible in the hierarchy [6]. With the
new data model, we are revisiting this principle. Denormalizing the data, i.e.
moving the information from the upper levels down to the level to which it
applies, might be more accurate in some cases. It could also simplify the queries
needed to return relevant records. However, this approach is not without its
issues from both practical and archival perspectives.
   Some properties can safely be moved down to the record level: if there is
only one organisation in Immediate Source of Acquisition and the series is no
longer accruing, this value could be denormalized. Archivists are, however, reluctant to
make inaccurate statements, and even when it seems simple to denormalize the
data, it is not always sound to do so. Take, for example, creator information,
which is currently held at series rather than at file or item level. If a series
has only one creator, then the logical assumption would be that the information
could be propagated down to any records within that series. Conceptually, however,
recognizing that the catalogue data is both a work in progress and exists in an
open world, the assignment of the creator at series level only indicates that at
least some of the records in that series came from that creator rather than being
a statement about all of them. From a functional perspective, this nuance may
not be obvious to a researcher using the public catalogue, so denormalizing
would present an assumed, but unknown, association as established fact.
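
   The safe case described above (a single organisation in Immediate Source of
Acquisition on a series that is no longer accruing) amounts to a guard
condition, sketched below. The class, field and function names and the series
references are hypothetical, and the sketch does not describe the project's
migration code.

```python
from dataclasses import dataclass, field
from typing import Dict, List, Optional


@dataclass
class Series:
    reference: str
    immediate_sources: List[str]  # organisations recorded at series level
    accruing: bool  # True if records are still being added
    record_refs: List[str] = field(default_factory=list)


def denormalize_source(series: Series) -> Optional[Dict[str, str]]:
    """Copy the series-level value to each record only in the safe case.

    Returns None when the value is ambiguous or the series is still open,
    signalling that the series needs manual checking instead.
    """
    if len(series.immediate_sources) == 1 and not series.accruing:
        source = series.immediate_sources[0]
        return {ref: source for ref in series.record_refs}
    return None


closed = Series(
    "SERIES 1", ["Department A"], accruing=False,
    record_refs=["SERIES 1/1", "SERIES 1/2"],
)
open_series = Series(
    "SERIES 2", ["Department A"], accruing=True, record_refs=["SERIES 2/1"]
)
```

   The same mechanical test would not be sound for creator, for the open-world
reasons given above: a single creator at series level does not establish that
every record in the series came from that creator.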
   Where there are multiple creators listed at series level (see Fig. 1),
denormalization is riskier: dates, the most obvious means of disambiguation,
are not granular enough to separate many of the edge cases and, as the
individual records themselves may have more than one creator, there cannot be a
clear and automatically applied delimitation. While these edge cases are only
a small percentage of the total, the number is still significant enough to
require manual checking, and this cannot be achieved during the initial data
migration exercise given the project’s deadline.


      Figure 1: A record set with multiple creators listed at series level

   If we cannot denormalize the existing data, either as a whole or in part, could
we look to the future data to improve the accuracy of the catalogue? Could the
data better reflect reality at the level at which the information is held? One
example could be holding the creator of a record at the level of that record. The
ideal is to represent the truth but, as with the additional requirements around
the capture of more structured metadata described in Section 4, we are not
receiving creator data for individual records now, so these changes would need
to be discussed with both internal staff and those in government departments
responsible for describing and transferring the records. From a user-centric
perspective, the aim of the new system is to streamline the editorial process and
reduce the workload by removing inefficiencies in the interface rather than to
generate additional work. Some information could be captured automatically but,
returning to the example of creator, the transferring department and the creating
department are not necessarily the same, and the transferring department may
not know the creating department if the records are older and/or inherited from
elsewhere in government. This leaves us in a position where conceptually it
would be valuable to denormalize the data, and the data model supports us in
doing so, but it may not be feasible in reality.

6. Conclusion: Challenging our Assumptions
   Our work with the UX team challenged some key assumptions that we had
made in the early stages of the project. While we had shared our data model
publicly and sent it for review by members of our core user group who were
familiar with conceptual models, it was not until we began incorporating parts
of the model in the wireframes for the initial interface that the impact on the
archivists and their working practices became clear.
   The work on the new catalogue system looks both inwards, to improving the
interface for the editorial team, and outwards, to the teams that contribute data
to the system or are supplied with data from it. Moving
to a linked data catalogue offers many advantages when searching, processing
and exploring the data but for the staff managing the data, the benefits are less
clear. While the staff are enthusiastic and engaged, change is never an easy
proposition and a key component of successful change management is showing
the direct benefits to the people affected by the change. We have more questions
than answers, but we have a better idea of what the questions are. It is the users
rather than the technology that should drive the change, especially where it has
implications for the editorial process. As we reach the stage of the project where
technology and users meet, and under the pressure of delivery deadlines, we are
seeing more points of negotiation and re-evaluation emerge. We will continue
to learn as we start to build the user interface iteratively: testing our assumptions
about how staff will work with the new model to ensure it meets their needs
and allows us to make the best use of the model’s potential.

7. References
 [1]   J. Garmendia, A. Retter, Developing a Pan-Archival Linked Data
       Catalogue, in Proceedings of Linked Archives International Workshop
       2021, CEUR-WS.org, pp. 93-103. URL:
       http://ceur-ws.org/Vol-3019/LinkedArchives_2021_paper_7.pdf.
 [2]   J. Garmendia, A. Retter, Developing a Pan-Archival Linked Data
       Catalogue, in Proceedings of Linked Archives International Workshop
       2021, CEUR-WS.org, p. 103. URL:
       http://ceur-ws.org/Vol-3019/LinkedArchives_2021_paper_7.pdf.
 [3]   ICA, Records in Contexts Conceptual Model, Consultation Draft v0.2
       July 2021. URL:
       https://www.ica.org/sites/default/files/ric-cm-02_july2021_0.pdf.
 [4]   Government Digital Service, Service Standard, 2019. URL:
       https://www.gov.uk/service-manual/service-standard.
 [5]   ICA, Records in Contexts Conceptual Model, Consultation Draft v0.2
       July 2021, p. 23. URL:
       https://www.ica.org/sites/default/files/ric-cm-02_july2021_0.pdf.
 [6]   The General International Standard for Archival Description,
       September 201, p. 8. URL:
       https://www.ica.org/en/isadg-general-international-standard-archival-description-second-edition.