<!DOCTYPE article PUBLIC "-//NLM//DTD JATS (Z39.96) Journal Archiving and Interchange DTD v1.0 20120330//EN" "JATS-archivearticle1.dtd">
<article xmlns:xlink="http://www.w3.org/1999/xlink">
  <front>
    <journal-meta>
      <journal-title-group>
        <journal-title>June</journal-title>
      </journal-title-group>
    </journal-meta>
    <article-meta>
      <title-group>
        <article-title>From Users to Systems: Identifying and Overcoming Barriers to Efficiently Access Archival Data</article-title>
      </title-group>
      <contrib-group>
        <contrib contrib-type="author">
          <string-name>Nicola Ferro</string-name>
          <email>nicola.ferro@unipd.it</email>
          <xref ref-type="aff" rid="aff0">0</xref>
        </contrib>
        <contrib contrib-type="author">
          <string-name>Gianmaria Silvello</string-name>
          <email>gianmaria.silvello@unipd.it</email>
          <xref ref-type="aff" rid="aff0">0</xref>
        </contrib>
        <aff id="aff0">
          <label>0</label>
          <institution>Department of Information Engineering, University of Padua</institution>
          ,
          <addr-line>Padua</addr-line>
          ,
          <country country="IT">Italy</country>
        </aff>
      </contrib-group>
      <pub-date>
        <year>2016</year>
      </pub-date>
      <volume>22</volume>
      <issue>2016</issue>
      <abstract>
        <p>Digital archives are one of the pillars of our cultural heritage, and they are increasingly opening up to end-users by focusing on the accessibility of their resources. Moreover, digital archives are complex and distributed systems where interoperability plays a central role and efficient access to and exchange of resources is a challenge. In this paper, we investigate user and interoperability requirements in the archival realm and we discuss how next-generation archival systems should undergo a paradigm shift, bringing a new model of access to archival resources that better addresses these needs. To this end, we employ the data structures and query primitives based on the NEsted SeTs for Object hieRarchies (NESTOR) model to efficiently access archival data, overcoming the identified barriers and limitations.</p>
      </abstract>
      <kwd-group>
        <kwd>set-based data models</kwd>
        <kwd>archival data</kwd>
        <kwd>XPath</kwd>
        <kwd>XML</kwd>
      </kwd-group>
    </article-meta>
  </front>
  <body>
    <sec id="sec-1">
      <title>1. INTRODUCTION</title>
      <p>Archives, along with libraries and museums, are one of
the main cultural institutions encompassed by Digital
Libraries (DL). Archives represent the trace of the activities
of a physical or legal person in the course of their
business which is preserved because of their continued value over
time. They are composed of unique documents interlinked
with each other as well as with their production and
preservation environments. The main characteristic of archives
lies in the hierarchical structure used to retain the context
and the full informational power of archival data.</p>
      <p>The hierarchical structure shaping archives is a
foundational feature of traditional paper-based archival description,
the so-called finding aid. This is reflected in its digital
counterpart, the Encoded Archival Description (EAD) [14]
eXtensible Markup Language (XML) format, which is the
key building block for managing, finding and accessing archival data.</p>
      <p>
        Over the last decade, thanks to the centrality of the Web
for information access and the rapid evolution of DL
services, we have witnessed a major shift towards a "radical
user orientation" [12] of archives, where usability and
findability of resources are becoming top priorities [
        <xref ref-type="bibr" rid="ref5">20</xref>
        ]
given the "dramatic increase" [
        <xref ref-type="bibr" rid="ref3">3</xref>
        ] in the number of people
accessing them. A recent user study [11] analyzing
user interaction patterns with finding aids highlighted that "[they]
focus on rules for description rather than on facilitating
access to and use of the materials they list and describe" and
that many archive users have serious issues using finding
aids [
        <xref ref-type="bibr" rid="ref1">1</xref>
        ]. Common and frequent user interaction patterns
with finding aids are navigational, and thus they require
browsing the archival hierarchy to make sense of the archival
data; for instance, two common interaction patterns are [11]:
top-down, where users "start at the highest level, gain
background and context, and work down to the most specific level
of detail", and bottom-up, where users "start at the most
detailed level seeking specific information, and then move back
to the higher levels".
      </p>
      <p>
        From this new point of view, digital finding aids (i.e., EAD)
constrain the user orientation of archives because several key
operations are either not possible or not efficient, given that it is
problematic to: (i) let the user access a specific item on the fly,
whereas we have to define fixed access points to the archival
hierarchy [8]; (ii) let the user reconstruct the context of an
item without having to browse the whole archival
hierarchy [
        <xref ref-type="bibr" rid="ref2">2</xref>
        ]; and (iii) present the user with only selected items
from an archive, whereas we have to give them the archive
as a whole [7, 18].
      </p>
      <p>From the technological perspective, these
limitations also affect the interoperability of archives in
distributed environments, preventing the exchange of
resources by means of standard DL technologies such as the
Open Archives Initiative Protocol for Metadata
Harvesting (OAI-PMH)<sup>1</sup> [8, 15]. Indeed, a single EAD file describes
a whole archive and thus it is not possible to share or
exchange only a subset of records in a distributed environment;
yet archives are commonly required to exchange only
the high-level descriptions (e.g., fonds and sub-fonds) or
only the records open to public disclosure. This
problem prevents finding aids from being exchanged with
variable granularity by means of OAI-PMH, forcing archival
institutions to share whole archives or nothing. Moreover, EAD
provides archivists with many degrees of freedom in tagging
practice, exacerbating the differences in how XML elements
are used and nested one inside the other [10]. This makes
it difficult to know in advance how an institution will use
the hierarchical elements and then to define general rules
and paths to access EAD elements; for instance, there is no
guarantee that an XML Path Language (XPath) expression
returning all the series or the units in a given EAD file will
work with a different file in another collection or even in the
same one.</p>
      <sec id="sec-1-1">
        <title>Footnote 1: http://www.openarchives.org/pmh/</title>
        <p>In this paper, we start from the above observations about
the user and interoperability needs in the archival realm to
discuss how next-generation archival systems should undergo
a paradigm shift, bringing a new model of access to archival
resources that better addresses these needs. In
particular, the contribution of the paper is to turn the above
requirements into specific use cases for access to archival
resources, discussing how and why current approaches
represent a barrier to their complete fulfillment, and showing
how our proposed solution, called NEsted SeTs for Object
hieRarchies (NESTOR) [8, 9], represents a step forward.</p>
        <p>Indeed, NESTOR [8] defines an alternative way to
represent hierarchical data by expressing the relationships
between objects through the inclusion property between sets,
in contrast to the binary relation between nodes exploited by
the tree, which is the typical model used to represent archival
data. NESTOR has been instantiated by three data
structures on which query primitives, proven to be highly efficient
in a wide spectrum of cases, have been realized [9]. NESTOR
represents a paradigm shift with respect to state-of-the-art
solutions for accessing hierarchical data because it answers query
primitives (e.g., descendants and children for the
top-down interaction pattern, and ancestors and parent for
the bottom-up one) by exploiting basic set
operations that do not require browsing and navigating the
hierarchy.</p>
        <p>Moreover, in order to fully understand the difference
between NESTOR and state-of-the-art navigational (i.e., XPath-based)
approaches, we conducted a case study
evaluation based on ten real-world heterogeneous EAD files
representing different key challenges for the identified access use
cases, where we discuss the main drawbacks of a
navigation-based access approach and how they are addressed by the
NESTOR set-based one. We also show how the intrinsic
differences between NESTOR and traditional navigational
approaches are consistently reflected in the query
execution times, which are a quantitative proxy for appreciating
the paradigm shift represented by NESTOR and its impact.</p>
        <p>The rest of the paper is organized as follows: Section 2
provides relevant background information; Section 3
discusses the examined use cases; Section 4 presents the
experimental outcomes. Finally, Section 5 draws some
conclusions.</p>
      </sec>
    </sec>
    <sec id="sec-2">
      <title>2. BACKGROUND</title>
    </sec>
    <sec id="sec-3">
      <title>2.1 Digital Archives</title>
      <p>Archives are composed of "unique records of corporate
bodies and the papers of individuals and families" [14]. The
original order of the documents within an archive (i.e., the
principle of provenance) is preserved because the context
and the physical order in which the documents are held are
as valuable as their content [6].</p>
      <p>According to the International Standard for Archival
Description (General) (ISAD(G)), archival description (i.e., the
finding aid) proceeds from general to specific as a
consequence of the provenance principle and has to show, for
every unit of description, its relationships and links with other
units and to the general fonds, taking the form of a tree
as shown in Figure 1 on the left. The digital encoding of
ISAD(G) is the Encoded Archival Description (EAD) [14],
shown in Figure 1 on the right, which is an XML
description of a whole archive that reflects the archival structure, holds
relations between entities and retains context.</p>
      <p>[Figure 1: an archive represented as a tree (left) and its EAD encoding (right).]</p>
      <p>
        EAD follows the traditional archival paradigm where
experts know exactly what they are looking for and, for
example, browse EAD to find the location of physical
records [12]. By contrast, in the new user-oriented paradigm
enabled by digital archives "users no longer have to be
dependent on the physical presence of archivists to identify,
review, and retrieve materials" [
        <xref ref-type="bibr" rid="ref8">23</xref>
        ], but they need effective
means for performing information seeking activities. As a
matter of fact, EAD turns out to be problematic in: (i)
supporting user-oriented information access; (ii) supporting
flexible access control policies; and (iii) enabling
interoperability between digital archives working in distributed
environments.
      </p>
    </sec>
    <sec id="sec-4">
      <title>2.2 XPath: A Navigational Approach</title>
      <p>XPath<sup>2</sup> is widely adopted for searching and selecting
portions of EAD files. XPath is a language for addressing parts
of an XML document; it provides basic facilities for
manipulating several data types and adopts a path notation
for navigating through the hierarchical structure of an XML
document. A "location path" is a common kind of XPath
expression, which selects a set of nodes relative to a given node
and returns the node-set containing the nodes
selected by the location path. Each step of a location path
is composed of three parts: (i) an axis, which
specifies the tree relationship between the nodes; (ii) a node
test, which specifies the node type and expanded-name of
the selected nodes; and (iii) zero or more predicates that
can further refine the selected set of nodes.</p>
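      <p>The three parts of a location step can be illustrated with a minimal sketch in Python. It relies on the limited XPath subset supported by the standard-library module xml.etree.ElementTree, not on a full XPath engine; the element names, the ids and the placement of units G, H and I under specific series are assumptions made for illustration and are not real EAD tags.</p>
      <preformat><![CDATA[
```python
import xml.etree.ElementTree as ET

# Toy archive: element names and ids are illustrative, not real EAD tags.
DOC = """
<fonds id="A">
  <subfonds id="B"/>
  <subfonds id="C">
    <series id="D"/>
    <series id="E"><unit id="G"/><unit id="H"/></series>
    <series id="F"><unit id="I"/><unit id="L"/></series>
  </subfonds>
</fonds>
"""
root = ET.fromstring(DOC.strip())

# ".//" abbreviates the descendant axis, "unit" is the node test,
# and "[@id='L']" is a predicate refining the selected node-set.
all_units = [u.get("id") for u in root.findall(".//unit")]
unit_l = [u.get("id") for u in root.findall(".//unit[@id='L']")]

# Child steps only: the series that are direct children of sub-fonds C.
series_of_c = [s.get("id") for s in root.findall("./subfonds[@id='C']/series")]
```
]]></preformat>
      <p>Every expression is evaluated by walking the tree from the context node, which is the navigational behaviour discussed in this section.</p>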
      <p>As emerges from the previous discussion, archival
systems typically rely on third-party, standard libraries
for XPath processing. Since the NESTOR data structures
and query primitives are implemented in Java and work
in memory, we are interested in comparing them to state-of-the-art
XPath libraries with the same characteristics, namely Xalan<sup>3</sup>, Jaxen<sup>4</sup> and JXPath<sup>5</sup>.</p>
      <sec id="sec-4-1">
        <title>Footnote 2: http://www.w3.org/TR/xpath/</title>
        <p>[Figure 2: (a) Euler-Venn representation of the NS-M; (b) DocBall representation of the INS-M.]</p>
        <p>The NESTOR model is defined by two set-based data
models: the Nested Set Model (NS-M) and the Inverse Nested Set
Model (INS-M) [8]; they are formally defined in the
context of set theory as collections of subsets. The most
intuitive way to understand how these models work is to
relate them to the archival tree. In Figure 2a we can see how
the archive shown in Figure 1 is mapped into an organization
of nested sets based on the NS-M.</p>
        <p>From Figure 2a we can see that the NS-M adopts a
bottom-up approach: (i) each set corresponds to an archival division;
(ii) the innermost sets are the leaves of the hierarchy, e.g.,
the units; (iii) supersets are created as we climb up the
hierarchy, e.g., the series, sub-fonds and fonds. The archival
records are represented as elements belonging to the sets.
With the NS-M an archive is modeled as a collection of
subsets where one set (the "fonds") contains all
the other subsets (the "sub-fonds", "series" and "units") of the archive
and where two subsets at the same level (e.g., two "series")
cannot have common elements, so their intersection is
empty.</p>
        <p>As shown in Figure 2b, the INS-M adopts a top-down
approach: (i) each set corresponds to an archival division; (ii)
the innermost set is the root of the hierarchy, i.e., the fonds;
(iii) supersets are created as we climb down the hierarchy,
e.g., sub-fonds, series and then units. As for the NS-M,
the archival records are represented as elements
belonging to the sets. With the INS-M an archive is modeled
as a collection of sets where there exists an archival division
shared by all other divisions; in our example, the "fonds" is
the archival division common to all the other divisions in
the archive.</p>
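        <p>The two mappings can be sketched in a few lines of Python. This is only an illustration under stated assumptions, not the NESTOR implementation: the toy hierarchy, the record identifiers and the function names are hypothetical, and the INS-M is read here as "each set holds its own records plus those of its ancestors".</p>
        <preformat><![CDATA[
```python
# Hypothetical toy archive: child -> parent (the root "A" has no entry).
PARENT = {"B": "A", "C": "A", "D": "C", "E": "C", "F": "C",
          "G": "E", "H": "E", "I": "F", "L": "F"}
# Records directly attached to each division (illustrative identifiers).
OWN = {"A": {"r1"}, "B": {"r2"}, "C": {"r3"}, "D": {"r4"}, "E": {"r5"},
       "F": {"r6"}, "G": {"r7"}, "H": {"r8"}, "I": {"r9"}, "L": {"r10"}}


def ancestors(node):
    """Divisions on the path from node (excluded) up to the root."""
    path = []
    while node in PARENT:
        node = PARENT[node]
        path.append(node)
    return path


def build_ns_m():
    """NS-M: each set holds its own records plus those of its descendants."""
    sets = {name: set(recs) for name, recs in OWN.items()}
    for node, recs in OWN.items():
        for anc in ancestors(node):
            sets[anc].update(recs)
    return sets


def build_ins_m():
    """INS-M: each set holds its own records plus those of its ancestors."""
    sets = {name: set(recs) for name, recs in OWN.items()}
    for node in OWN:
        for anc in ancestors(node):
            sets[node].update(OWN[anc])
    return sets


ns_m, ins_m = build_ns_m(), build_ins_m()
```
]]></preformat>
        <p>In this sketch the set of fonds A contains every other set under the NS-M, while under the INS-M it is contained in every other set, mirroring Figures 2a and 2b.</p>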
        <p>This vision overcomes EAD limitations because in NESTOR
each archival record is an element belonging to a set, which
can be selected and managed independently of the other
records; thus, we can return to the users a list of records
belonging to different archival divisions at any level, allowing
them to access and consult the records while hiding the complexity
of the whole archival structure.</p>
      </sec>
      <sec id="sec-4-2">
        <title>Footnote 3: http://xml.apache.org/xalan-j/ Footnote 4: http://jaxen.codehaus.org/ Footnote 5: http://commons.apache.org/proper/commons-jxpath/</title>
        <p>NESTOR can be instantiated by three data structures [9]:
Direct Data Structure (DDS), Inverse Data Structure (IDS)
and Hybrid Data Structure (HDS). Each of these
structures is composed of three dictionaries: one containing the
materialization of the sets, one containing the direct subsets
of each set and one containing all the supersets of
each set. DDS is built around the constraints
defined by the NS-M, IDS is built around the
constraints of the INS-M, and HDS can be seen as a mixture
of DDS and IDS [9].</p>
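        <p>A structure made of three dictionaries can be sketched as follows. The class name, its fields and the add_set method are hypothetical and follow an NS-M reading (roughly in the spirit of DDS); the actual Java data structures described in [9] may differ in every detail.</p>
        <preformat><![CDATA[
```python
from dataclasses import dataclass, field


@dataclass
class ThreeDictStructure:
    """Hypothetical NS-M-style structure built on three dictionaries."""
    elements: dict = field(default_factory=dict)        # set -> materialized elements
    direct_subsets: dict = field(default_factory=dict)  # set -> its direct subsets
    supersets: dict = field(default_factory=dict)       # set -> all its supersets

    def add_set(self, name, records, parent=None):
        """Register a set nested directly inside `parent` (added beforehand)."""
        self.elements[name] = set(records)
        self.direct_subsets[name] = set()
        self.supersets[name] = set()
        if parent is not None:
            self.direct_subsets[parent].add(name)
            self.supersets[name] = {parent}.union(self.supersets[parent])
            # NS-M: the records of a set also belong to all of its supersets.
            for sup in self.supersets[name]:
                self.elements[sup].update(records)


dds = ThreeDictStructure()
dds.add_set("A", {"r1"})
dds.add_set("C", {"r3"}, parent="A")
dds.add_set("F", {"r6"}, parent="C")
dds.add_set("L", {"r10"}, parent="F")
```
]]></preformat>
        <p>Once the dictionaries are filled, questions such as "which sets include unit L?" become dictionary look-ups (here dds.supersets["L"]) rather than tree traversals.</p>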
        <p>When we deal with a collection of sets defined by NESTOR,
we can distinguish between set-wise and element-wise
primitives. The former enable us to query the structure of
an archive, whereas the latter query the content of the
archive (i.e., the archival records). For instance, by means
of the set-wise primitives we can ask for all the series of a
specific sub-fonds, whereas with the element-wise primitives
we can ask for all the archival records belonging to the series
of that sub-fonds.</p>
        <p>NESTOR primitives (i.e., Descendants, Ancestors,
Children and Parent) are efficient alternative implementations of
XPath primitives, as shown in [9], where we conducted an
extensive evaluation on five EAD collections, Wikipedia and
two synthetic XML datasets and compared NESTOR
with state-of-the-art XPath engines. In [9] we evaluated
NESTOR on average performance by testing the primitives
on thousands of files and then presenting mean execution
times; in this paper we investigate how NESTOR primitives
behave with specific digital archives and how efficiently they
answer common and frequent archival operations.</p>
      </sec>
    </sec>
    <sec id="sec-5">
      <title>3. USE CASES</title>
      <p>We present three user-oriented use cases derived from
common interaction patterns identified in the archival
domain and four interoperability use cases based on the
exchange of archival data in distributed environments.</p>
    </sec>
    <sec id="sec-6">
      <title>3.1 User-oriented Use Cases</title>
      <sec id="sec-6-1">
        <title>Use Case 1: identifying and selecting relevant material</title>
        <p>This use case is related to the "searching for known material"
information seeking activity investigated by Duff and
Johnson in [5]. This activity may be performed by researchers
at the beginning of a project to establish a context and
detect relevant information, and it may be reiterated several
times to "reevaluate information that has suddenly gained
new significance" [5]. Such activities can be associated with
the top-down pattern of interaction identified by Freund and
Toms in [11], where the users "start at the highest level [of
an archival description], gain background and context, and
work down to the most specific level of detail".</p>
        <p>In Figure 3 we can see a graphical representation of this
use case. We consider an archival system that answers a
user query which, starting from a given context node, requires
returning a list of archival records. From this list the user
then selects the description of, say, sub-fonds C; in this case
two frequent queries to be answered are: to return the
subdivisions (series D, series E, series F, unit G, unit H, unit I
and unit L) which are part of this sub-fonds, i.e., a structural
query; and to return all the records (the actual records
or their descriptions contained by the three series and four
units which descend from sub-fonds C) associated with this
sub-fonds, i.e., a content query.</p>
        <p>[Figure 3: graphical representation of Use Case 1.]</p>
        <p>With a navigational approach based on XPath, the
structural query corresponds to the following XPath expression:
/fondsA/subfondsC/descendant-or-self::*; and the
content query corresponds to:
/fondsA/subfondsC/descendant-or-self::*/text(). Both these expressions require
navigating the archival tree to the sub-fonds C division and then
visiting all of its descendants.</p>
        <p>In Figure 3 we see that the NS-M answers the structural
query by returning all the subsets of sub-fonds C (i.e., all
its descendants), whereas the INS-M answers it by
returning all the supersets of the sub-fonds, which in the INS-M
correspond to its descendants.
The content query is answered by the NS-M by returning all the
elements belonging to sub-fonds C, whereas the INS-M has to
return the union of all the elements belonging to sub-fonds
C and its supersets. The NS-M and the
INS-M thus answer the queries by exploiting two different
primitives: the first is based on the subsets of a set, whereas the
second is based on its supersets. In the NS-M the descendants
of an archival node, say sub-fonds C, are the subsets of the
set representing sub-fonds C, whereas in the INS-M the
descendants are the supersets of the given set.</p>
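        <p>The descendant queries of this use case can be sketched with plain set comparisons in Python. The materialized sets below are hand-written for a reduced toy archive (units G, H and I are omitted), the record identifiers and helper names are hypothetical, and the linear scan over all sets is for illustration only, since the NESTOR data structures keep subsets and supersets in dictionaries.</p>
        <preformat><![CDATA[
```python
# NS-M materialization: a set holds its records and those of its descendants.
NS_M = {
    "A": {"r1", "r2", "r3", "r4", "r5", "r6", "r7"},
    "B": {"r2"},
    "C": {"r3", "r4", "r5", "r6", "r7"},
    "D": {"r4"},
    "E": {"r5"},
    "F": {"r6", "r7"},
    "L": {"r7"},
}
# INS-M materialization: a set holds its records and those of its ancestors.
INS_M = {
    "A": {"r1"},
    "B": {"r1", "r2"},
    "C": {"r1", "r3"},
    "D": {"r1", "r3", "r4"},
    "E": {"r1", "r3", "r5"},
    "F": {"r1", "r3", "r6"},
    "L": {"r1", "r3", "r6", "r7"},
}


def proper_subsets(sets, name):
    """Names of the sets included in the given set (the set itself excluded)."""
    return {n for n, s in sets.items() if n != name and s.issubset(sets[name])}


def proper_supersets(sets, name):
    """Names of the sets including the given set (the set itself excluded)."""
    return {n for n, s in sets.items() if n != name and s.issuperset(sets[name])}


# Structural (set-wise) query: the descendants of sub-fonds C.
desc_ns = proper_subsets(NS_M, "C")       # NS-M: descendants are subsets
desc_ins = proper_supersets(INS_M, "C")   # INS-M: descendants are supersets

# Content (element-wise) query under the NS-M: the elements of the set itself.
content_ns = NS_M["C"]
```
]]></preformat>
        <p>Both models return the same divisions for sub-fonds C, reached through opposite inclusion relationships and without visiting a tree.</p>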
        <p>Use Case 2: building contextual knowledge</p>
        <p>"Building context is the sine qua non of historical research" [5]
and one of the main functions of archives. As we described
above, the context of an archival record is required to
disclose its full informational power; thus, reconstructing
the contextual knowledge of a record or of an archival division is one of
the most common and important operations an archival
system has to provide. This operation can be associated with
the bottom-up pattern of interaction, also identified by [11],
where the users "start at the most detailed level seeking
specific information, and then move back to the higher levels
to make sense of the information and place it in context if
necessary".</p>
        <p>
          Figure 4 presents the operations required to "build
contextual knowledge" of an archival description. To guide
the user exploring the archive, the more accurate the
returned contextual information is, the better; indeed, if
we return the whole archive to the user then s/he might be
disoriented by the large amount of heterogeneous
information [
          <xref ref-type="bibr" rid="ref7">22</xref>
          ]. To address this aspect we need to return to the
user all and only the archival divisions from the selected unit
up to the root.
        </p>
        <p>If we consider the case presented in Figure 4, where we
need to reconstruct the context of "Unit L", we can see that
a structural query needs to return all the archival divisions
up to the root (i.e., the ancestors of unit L, which are series
F, sub-fonds C and fonds A), and the content query returns
all the records or descriptions contained by these divisions.</p>
        <p>With an XPath-based approach, the structural query (e.g.,
/fondsA/subfondsC/seriesF/unitL/ancestor-or-self::*)
requires navigating the archival tree from the leaf "unit L"
up to the root; the output of this query is a sub-tree with
the same root as the original tree, but containing only those
nodes on the path between "Fonds A" and the leaf "unit L".
The content query
(/fondsA/subfondsC/seriesF/unitL/ancestor-or-self::*/text()) performs the same operation but selects
only the data nodes, which are then returned to the user.</p>
        <p>As shown in Figure 4, the NS-M answers the query about
the context by exploiting a set-wise primitive which returns
all the supersets of the selected division, whereas the INS-M
does so by returning all its subsets. This operation also has
an element-wise counterpart answering the content query
and in this case, NS-M returns all the elements belonging to
the union of the supersets of the selected unit, whereas the
INS-M simply returns the elements belonging to the set of
the unit.</p>
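        <p>The context of a unit can be sketched in Python as a pair of look-ups. The dictionaries are hand-written for the path from fonds A down to unit L, and the identifiers and function names are hypothetical; the content query is shown only for the INS-M, where the set of the unit already holds the records of its ancestors.</p>
        <preformat><![CDATA[
```python
# NS-M reading: the supersets of a set are the ancestors of the division.
SUPERSETS = {"A": set(), "C": {"A"}, "F": {"A", "C"}, "L": {"A", "C", "F"}}
# INS-M materialization: each set also holds the records of its ancestors.
INS_M = {"A": {"r1"}, "C": {"r1", "r3"}, "F": {"r1", "r3", "r6"},
         "L": {"r1", "r3", "r6", "r7"}}


def context_divisions(unit):
    """Set-wise query: the ancestors are looked up, no tree is traversed."""
    return SUPERSETS[unit]


def ordered_context(unit):
    """Ancestors ordered from the root down, by number of supersets."""
    return sorted(SUPERSETS[unit], key=lambda name: len(SUPERSETS[name]))


def context_records_ins_m(unit):
    """Element-wise query under the INS-M: the elements of the unit's set."""
    return INS_M[unit]
```
]]></preformat>
        <p>For example, ordered_context("L") lists fonds A, sub-fonds C and series F in that order, which is the path a user would need to place unit L in context.</p>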
      </sec>
      <sec id="sec-6-2">
        <title>Use Case 3: seeking unknown archival material</title>
        <p>This use case is related to the "becoming oriented to a new
archive or collection" information seeking activity
investigated in [5]. It analyzes a common scenario where users
do not have a clear idea of what they are looking for and
may proceed systematically from one archival division to
another. This use case is also related to the two previous ones
because, among other operations, it may require analyzing
the descendants of a given archival division or record as well
as climbing up the hierarchy. Indeed, we can see this use
case as a combination of the top-down and the bottom-up
patterns, and it can be associated with the "systematic
interrogation" interaction [11], where the users "develop hypotheses
as to where in the finding aids structure the information is
most likely to be and check each one in turn".</p>
        <p>In Figure 5 we show this use case, where the user selects an
archival division or a record and then asks for all the archival
divisions (structural or set-wise) or all the records (content
or element-wise) at the same level as the selected element
(i.e., the siblings of this element). For instance, if the user
selects one of the record descriptions represented by "Unit L" in
the figure, this operation allows her/him to retrieve all the
other descriptions connected to it (e.g., all the sibling units
of "Unit L" or the elements belonging to them).</p>
        <p>To answer this interrogation, both from
the structural and the content viewpoints, the navigational
approach requires two XPath expressions: the first
returns the parent node of the given node and the second,
starting from this parent node, returns all of its children.
Note that to do this, navigational approaches need to visit
each child node; thus, the higher the number of children,
the higher the complexity of this operation.</p>
        <p>The NS-M answers the query with a set-wise primitive by
returning all the direct subsets (i.e. the children) of the
superset (i.e. the parent) to which the selected unit belongs;
as usual, the INS-M reverses this logic and answers by
returning all the direct supersets of the subset to which the
selected unit belongs. The element-wise primitive takes the
sets outputted by the set-wise one and then returns all the
elements belonging to them.</p>
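        <p>The sibling query can be sketched in Python under the NS-M reading. The three dictionaries are hand-written for a toy archive in which unit I is assumed to be the only sibling of unit L; all names and record identifiers are hypothetical.</p>
        <preformat><![CDATA[
```python
DIRECT_SUBSETS = {"A": {"B", "C"}, "B": set(), "C": {"D", "E", "F"},
                  "D": set(), "E": set(), "F": {"I", "L"},
                  "I": set(), "L": set()}
SUPERSETS = {"A": set(), "B": {"A"}, "C": {"A"},
             "D": {"A", "C"}, "E": {"A", "C"}, "F": {"A", "C"},
             "I": {"A", "C", "F"}, "L": {"A", "C", "F"}}
ELEMENTS = {"A": {"r1", "r2", "r3", "r4", "r5", "r6", "r9", "r10"},
            "B": {"r2"}, "C": {"r3", "r4", "r5", "r6", "r9", "r10"},
            "D": {"r4"}, "E": {"r5"}, "F": {"r6", "r9", "r10"},
            "I": {"r9"}, "L": {"r10"}}


def parent(name):
    """Direct superset: the superset that itself has the most supersets."""
    sups = SUPERSETS[name]
    if not sups:
        return None
    return max(sups, key=lambda n: len(SUPERSETS[n]))


def siblings(name):
    """Set-wise query: the other direct subsets of the parent set."""
    p = parent(name)
    return set() if p is None else DIRECT_SUBSETS[p] - {name}


def sibling_records(name):
    """Element-wise query: the elements belonging to the sibling sets."""
    records = set()
    for sib in siblings(name):
        records.update(ELEMENTS[sib])
    return records
```
]]></preformat>
        <p>The cost of siblings("L") is one look-up for the parent and one for its direct subsets, independently of how many children the parent has.</p>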
      </sec>
    </sec>
    <sec id="sec-7">
      <title>3.2 Interoperability-oriented Use Cases</title>
      <p>As described above and reported in [15], digital finding
aids encoded with the EAD standard represent a
barrier to the very interoperability this standard aims to
enable. Indeed, as we see below, with EAD there are
several OAI-PMH functions which cannot be used by archival
systems. On the other hand, NESTOR set-based operations
can be straightforwardly employed by archival systems to
use all OAI-PMH functionalities with digital finding aids [8].</p>
      <sec id="sec-7-1">
        <title>Use Case 4: Get Records</title>
        <p>This use case is based on a common OAI-PMH request
where a service provider requests all the records belonging
to an archive. This use case can also be addressed by
navigational approaches, simply by exchanging the whole EAD file
via OAI-PMH.</p>
        <p>NESTOR addresses this case by relying on the descendant
content operation shown in Figure 3 with a slight variation;
indeed, in the figure we ask for all the descendants of
sub-fonds C, whereas in this case we are asking the NS-M to
return the set representing "Fonds A", which contains all the
records in the archive, and the INS-M to return the union of
all records belonging to the set "Fonds A" and its supersets.</p>
      </sec>
      <sec id="sec-7-2">
        <title>Use Case 5: Get Sub-hierarchy</title>
        <p>This use case is a more specific version of the previous one, where the
service provider requests only those records belonging to the
sub-hierarchy rooted in a given archival division.
Navigational approaches do not apply to this case, whereas NESTOR
can address it by means of the descendant content operation
as shown in Figure 3.</p>
      </sec>
      <sec id="sec-7-3">
        <title>Use Case 6: Get Context</title>
        <p>In this case the service provider requests all the records
belonging to a specific division, say "Unit L", and to all the
related divisions up to the root, as shown in Figure 4.</p>
        <p>As in the previous case, navigational approaches do not
apply, whereas NESTOR addresses it by
employing the ancestor content primitive, which for the NS-M
returns the union of all the records belonging to "Unit L"
and its supersets and for the INS-M returns all the elements
belonging to "Unit L".</p>
      </sec>
      <sec id="sec-7-4">
        <title>Use Case 7: List Sets</title>
        <p>This use case is related to the "ListSets" OAI-PMH verb, "used
to retrieve the set structure of a repository", which allows the
service provider to know the structure of a local repository
in advance.</p>
        <p>This request cannot be answered by an XPath expression
because it is not possible to extract only structural
information, filtering out all data nodes; moreover, the OAI-PMH
set-based organization of metadata does not apply to EAD.
On the other hand, answering the "ListSets" verb is natural
for NESTOR because it retains the structure by exploiting
inclusion relationships between sets. Therefore, it answers
this request by employing the descendant structure
operation as shown in Figure 3.</p>
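        <p>As a purely illustrative sketch, the structural information kept by a set-based model can be turned into colon-separated set paths in the style of OAI-PMH setSpec values. Mapping each archival division to one OAI-PMH set is an assumption made here for illustration (the approach actually adopted is described in [8]), and the dictionary and function names are hypothetical.</p>
        <preformat><![CDATA[
```python
# NS-M reading for a toy archive: set name -> names of all its supersets.
SUPERSETS = {"A": set(), "B": {"A"}, "C": {"A"},
             "F": {"A", "C"}, "L": {"A", "C", "F"}}


def set_spec(name):
    """Colon-separated path from the root set down to the given set."""
    path = sorted(SUPERSETS[name], key=lambda n: len(SUPERSETS[n]))
    return ":".join(path + [name])


def list_sets():
    """Structure-only listing: no archival record is touched."""
    return sorted(set_spec(name) for name in SUPERSETS)
```
]]></preformat>
        <p>Here list_sets() is computed from inclusion relationships alone, which is the kind of structure-only answer that an XPath expression over an EAD file cannot provide.</p>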
      </sec>
    </sec>
    <sec id="sec-8">
      <title>4. VALIDATION</title>
      <p>We proposed three different instantiations of NESTOR
according to three alternative data structures, namely DDS,
IDS and HDS. In order to compare the query operations
defined on these data structures with currently adopted
solutions for operating on digital archives, we selected two EAD
collections that provide us with real-world archival data: the
National Archives of the Netherlands<sup>6</sup> and the Library of
Congress finding aids.</p>
      <p>We selected ten EAD files from these collections,
covering a wide variety of archives whose different
characteristics pose key challenges for archival systems.
The statistics about these files are reported in Table 1.</p>
      <sec id="sec-8-1">
        <title>Footnote 6: http://www.nationaalarchief.nl/</title>
        <p>[Table 1: statistics about the ten selected EAD files, EAD-01 to EAD-10.]</p>
        <p>DDS, IDS and HDS are compared to widely adopted, ready-to-use
solutions based on XPath for operating on the
structure and the content of EAD files: Xalan, Jaxen and
JXPath, which represent the state-of-the-art solutions for
dealing with EAD files<sup>7</sup>.</p>
        <p>The main characteristic of EAD files that represents a
challenge for XPath libraries is the number of nodes in each
file; the selected files are of increasing size to show that
the performance of navigation-based solutions depends on the
number of nodes and the overall size of the EAD files,
whereas this does not apply to the set-based operations
implemented by NESTOR. Indeed, in Figure 6 we can see that
all the XPath libraries answer in linear time with respect
to the size of the EAD file because they need to navigate
big hierarchies by visiting a great number of nodes. On the
other hand, we can see that IDS answers the descendant
structural operation in constant time for all the EAD files
and is five orders of magnitude faster than XPath-based
solutions. DDS and HDS show some dependence on the size
of the EAD file; indeed, they need to perform some set
operations (more nodes mean more operations) that require
some time, even though for the descendant content
operation they are several orders of magnitude more efficient
than navigating the archival hierarchy. Overall, IDS is the
best solution for addressing use cases 1 and 7, whereas DDS
is the best for use cases 1, 4 and 5.</p>
        <p>It is interesting to note that, for use cases 1, 4
and 5, XPath-based libraries are slower on the EAD-04 file,
which has the highest number of children (10,271),
followed by EAD-09, which also has a high number
of children (8,930). These two files are challenging for
all the use cases requiring the descendants or the children
of a node, such as use cases 1, 3 and 5. Navigational-based
solutions are particularly challenged by this case, as we can
see in Figure 6 for the content operation and in Figure 8. On
the other hand, IDS and HDS are
not affected by the high max fan-out of these files, given that
they can answer without visiting the large number of child
nodes, simply by returning a set or by performing basic set
operations. DDS requires more set operations than the other
two set-based solutions; even though in most cases it is
more efficient than navigation-based solutions, it is
still slower than IDS and HDS, which are extremely
efficient for these cases. The overall performances reported
in Figure 8, with a particular focus on EAD-04 and EAD-09,
show that set-based solutions are particularly well-suited to
address the operation employed by use case 3.
</p>
        <p>Footnote 7: We ensure a fair comparison because all the tested solutions
are implemented in Java, work in main memory and are
tested on the same machine.</p>
        <p>Lastly, use case 2 requires climbing up the archival
hierarchy from a given entry point. We considered EAD files with
variable depth (from 9 to 17) and we validated the ancestor
operations using the deepest node in each hierarchy as the
entry point, which represents the worst-case scenario for any
archival system. From a performance viewpoint, Figure 7
shows the difference between the NESTOR
set-based approaches and the XPath navigational approaches.
Indeed, NESTOR-based solutions behave consistently for all
the tested EAD files and do not depend on the depth and
size of the EAD files. On the other hand, the XPath libraries
behave differently from file to file, showing a dependence on
the number of nodes, fan-out and depth of the files; for
instance, JXPath behaves less efficiently when EAD files have
a high max fan-out (EAD-04 and EAD-09), whereas Xalan
performance worsens as the number of nodes increases.</p>
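        <p>The ancestor case can be illustrated in the same spirit. The Python sketch below is an assumption-laden toy, not the NESTOR data structures: it uses hypothetical parent links and node names, and contrasts climbing the hierarchy one step at a time with reading a precomputed ancestor set for the deepest node. The names parent, ancestors_by_navigation and build_ancestor_sets are illustrative.</p>
        <preformat>
```python
# Hypothetical parent links for a small hierarchy (the root has no parent).
parent = {
    "fonds": None,
    "series-A": "fonds",
    "subseries-A1": "series-A",
    "file-1": "subseries-A1",
}


def ancestors_by_navigation(node):
    """Climb parent links one step at a time; cost grows with node depth."""
    path = []
    current = parent[node]
    while current is not None:
        path.append(current)
        current = parent[current]
    return path


def build_ancestor_sets():
    """Precompute, for every node, the set of all its ancestors."""
    return {node: frozenset(ancestors_by_navigation(node)) for node in parent}


ancestor_sets = build_ancestor_sets()

deepest = "file-1"  # worst-case entry point: the deepest node
print(ancestors_by_navigation(deepest))
print(sorted(ancestor_sets[deepest]))

# Asking whether one node is an ancestor of another is a single membership test.
print("fonds" in ancestor_sets[deepest])
```
</preformat>
        <p>With the sets built in advance, the query cost in this toy no longer depends on how deep the entry point is. This matches the qualitative behaviour reported for the set-based approaches, although the real structures may achieve it differently.</p>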
      </sec>
    </sec>
    <sec id="sec-9">
      <title>CONCLUSIONS</title>
      <p>In this paper we identified and described the barriers
preventing efficient access to archival data. We described
the main drawbacks of EAD and we showed how it impairs
smooth and efficient access to archival descriptions and
fails to satisfy several interoperability
requirements.</p>
      <p>We analyzed the role of the NESTOR model in the context
of digital archives and described its main advantages with
respect to state-of-the-art navigational-based solutions. We
have seen that the NESTOR set-based approach represents a
paradigm shift in the access to XML files which is well-suited
to enable interaction and interoperability functionalities in
the archival context.</p>
      <p>We identified and described seven use cases highlighting
the key challenges archival systems have to address in
order to deal with common user interaction patterns and to
satisfy interoperability requirements. In this frame of
reference, we compared and discussed strengths and limitations
of navigational-based solutions with respect to NESTOR
set-based ones.</p>
      <p>We have seen that NESTOR is a model of access to archival
resources that allows us to better address the identified needs
both from the user and the interoperability viewpoints. From
a quantitative standpoint, the experimental validation
confirms that NESTOR-based solutions consistently outperform
state-of-the-art solutions; moreover, we have seen that NESTOR-based
solutions are less dependent, or not dependent at all,
on the hierarchical structure of archives than navigational-based
ones.</p>
      <p>[19] W. Scheir. First Entry: Report on a Qualitative Exploratory
Study of Novice User Experience with Online Finding Aids.
J. of Arch. Org., 3(4):49–85, 2006.</p>
      <p>[Figure: Use cases 2 and 6. Panels: Parent Structural Operation, Parent Content Operation, Ancestor Structural Operation, Ancestor Content Operation. Horizontal axis: EAD files. Vertical axis: logarithmic scale. Series: DDS, IDS, HDS, Xalan, Jaxen, JXPath.]</p>
    </sec>
  </body>
  <back>
    <ref-list>
      <ref id="ref1">
        <mixed-citation>
          [1]
          <string-name>
            <given-names>J. C.</given-names>
            <surname>Chapman</surname>
          </string-name>
          .
          <article-title>Observing Users: An Empirical Analysis of User Interaction with Online Finding Aids</article-title>
          .
          <source>J. of Arch. Org.</source>
          ,
          <volume>8</volume>
          (
          <issue>1</issue>
          ):
          <fpage>4</fpage>
          –
          <lpage>30</lpage>
          ,
          <year>2010</year>
          .
        </mixed-citation>
      </ref>
      <ref id="ref2">
        <mixed-citation>
          [2]
          <string-name>
            <given-names>J. G.</given-names>
            <surname>Daines</surname>
          </string-name>
          and
          <string-name>
            <given-names>C. L.</given-names>
            <surname>Nimer</surname>
          </string-name>
          .
          <article-title>Re-Imagining Archival Display: Creating User-Friendly Finding Aids</article-title>
          .
          <source>J. of Arch. Org.</source>
          ,
          <volume>9</volume>
          (
          <issue>1</issue>
          ):
          <fpage>4</fpage>
          –
          <lpage>31</lpage>
          ,
          <year>2011</year>
          .
        </mixed-citation>
      </ref>
      <ref id="ref3">
        <mixed-citation>
          [3]
          <string-name>
            <given-names>M. G.</given-names>
            <surname>Daniels</surname>
          </string-name>
          and
          <string-name>
            <given-names>E.</given-names>
            <surname>Yakel</surname>
          </string-name>
          .
          <article-title>Seek and You May Find: Successful Search in Online Finding Aid Systems</article-title>
          .
          <source>American Archivist</source>
          ,
          <volume>73</volume>
          :
          <fpage>535</fpage>
          –
          <lpage>468</lpage>
          ,
          <year>2010</year>
          .
        </mixed-citation>
      </ref>
      <ref id="ref4">
        <mixed-citation>
          [4]
          <string-name>
            <given-names>E.</given-names>
            <surname>Yakel</surname>
          </string-name>
          ,
          <string-name>
            <given-names>S.</given-names>
            <surname>Shaw</surname>
          </string-name>
          , and
          <string-name>
            <given-names>P.</given-names>
            <surname>Reynolds</surname>
          </string-name>
          .
          <article-title>Creating the Next Generation of Archival Finding Aids</article-title>
          .
          <source>D-Lib Mag.</source>
          ,
          <volume>13</volume>
          (
          <issue>5</issue>
          /6),
          <year>2007</year>
          .
        </mixed-citation>
      </ref>
      <ref id="ref5">
        <mixed-citation>
          [20]
          <string-name>
            <given-names>A.</given-names>
            <surname>Sexton</surname>
          </string-name>
          ,
          <string-name>
            <given-names>C.</given-names>
            <surname>Turner</surname>
          </string-name>
          ,
          <string-name>
            <given-names>G.</given-names>
            <surname>Yeo</surname>
          </string-name>
          , and
          <string-name>
            <given-names>S.</given-names>
            <surname>Hockey</surname>
          </string-name>
          .
          <article-title>Understanding users: a prerequisite for developing new technologies</article-title>
          .
          <source>Journal of the Society of Archivists</source>
          ,
          <volume>25</volume>
          (
          <issue>1</issue>
          ):
          <fpage>33</fpage>
          –
          <lpage>49</lpage>
          ,
          <year>2004</year>
          .
        </mixed-citation>
      </ref>
      <ref id="ref6">
        <mixed-citation>
          [21]
          <string-name>
            <given-names>S. L.</given-names>
            <surname>Shreeves</surname>
          </string-name>
          ,
          <string-name>
            <given-names>T. G.</given-names>
            <surname>Habing</surname>
          </string-name>
          ,
          <string-name>
            <given-names>K.</given-names>
            <surname>Hagedorn</surname>
          </string-name>
          , and
          <string-name>
            <given-names>J. A.</given-names>
            <surname>Young</surname>
          </string-name>
          .
          <article-title>Current Developments and Future Trends for the OAI Protocol for Metadata Harvesting</article-title>
          .
          <source>Library Trends</source>
          ,
          <volume>53</volume>
          (
          <issue>4</issue>
          ):
          <fpage>576</fpage>
          –
          <lpage>589</lpage>
          ,
          <season>Spring</season>
          <year>2005</year>
          .
        </mixed-citation>
      </ref>
      <ref id="ref7">
        <mixed-citation>
          [22]
          <string-name>
            <given-names>S.</given-names>
            <surname>Yako</surname>
          </string-name>
          .
          <article-title>It's Complicated: Barriers to EAD Implementation</article-title>
          .
          <source>American Archivist</source>
          ,
          <volume>71</volume>
          (
          <issue>2</issue>
          ):
          <fpage>456</fpage>
          –
          <lpage>475</lpage>
          ,
          <year>2008</year>
          .
        </mixed-citation>
      </ref>
      <ref id="ref8">
        <mixed-citation>
          [23]
          <string-name>
            <given-names>J.</given-names>
            <surname>Zhang</surname>
          </string-name>
          .
          <article-title>Archival Representation in the Digital Age</article-title>
          .
          <source>J. of Arch. Org.</source>
          ,
          <volume>10</volume>
          (
          <issue>1</issue>
          ):
          <fpage>45</fpage>
          –
          <lpage>68</lpage>
          ,
          <year>2012</year>
          .
        </mixed-citation>
      </ref>
      <ref id="ref9">
        <mixed-citation>
          [24]
          <string-name>
            <given-names>X.</given-names>
            <surname>Zhou</surname>
          </string-name>
          .
          <article-title>Examining Search Functions of EAD Finding Aids Web Sites</article-title>
          .
          <source>J. of Arch. Org.</source>
          ,
          <volume>4</volume>
          (
          <issue>3</issue>
          /4):
          <fpage>99</fpage>
          –
          <lpage>118</lpage>
          ,
          <year>2008</year>
          .
        </mixed-citation>
      </ref>
    </ref-list>
  </back>
</article>