From ZigZag™ to BigBag: Seeing the wood and the trees in online archive finding aids

Ian G. Anderson
University of Glasgow
11 University Gardens
Glasgow G12 8QH
+44 (0)141 330 3843
I.Anderson@hatii.arts.gla.ac.uk

ABSTRACT
This paper reports on a one-year speculative research project that sought to test the technical feasibility, practical implications and usability of transforming an XML Encoded Archival Description (EAD) finding aid into an XML ZigZag™ structure and applying a relational browser interface.

Categories and Subject Descriptors
H.5.4 [Hypertext/Hypermedia]: Navigation; User Issues

General Terms
Design, Experimentation, Human Factors.

Keywords
EAD, ZigZag™, Ted Nelson, XML, Browsing, Visualisation.

1. INTRODUCTION
On the whole the archive profession is a conservative and traditional one. Since its inception, the principles of provenance, or respect des fonds, and adherence to original order have been dominant characteristics in most archive communities. As a result, the practice of describing archive collections in hierarchical arrangements is firmly embedded. Compared with other information services, however, standardisation, in terms of both descriptive standards and arrangement, has been a relatively late development, as has the provision of online finding aids.

However, as more archival finding aids of increasing complexity become available online, the difficulty of seeing the 'wood for the trees' increases. This is particularly the case when they are implemented in Encoded Archival Description (EAD) [1]. EAD is an XML DTD for the creation of machine-readable, cross-searchable archival finding aids, and its creators consciously based its structure on hierarchical analogue finding aids. Whilst this provided an important comfort zone for archivists migrating to encoded finding aids, it also meant that EAD inherited the innate difficulty of navigating hierarchical structures.

Whilst an archive's physical space, catalogue arrangement and archivist's assistance all help to guide users' navigation in the analogue world, this paradigm does not easily translate to the electronic. Nor has a significant body of research been established on archive users' information-seeking behaviour. Indeed, there is little evidence that traditional archival arrangement adequately served the needs of users in the analogue world. It is unlikely, therefore, that replicating such arrangements in the digital world would prove any more successful.

Where research on archive user needs has been undertaken, it has revealed a range of characteristics suggesting that a more flexible approach to archival access is required. The very earliest studies, in the late 1990s, indicated that time, training and access to information about information were crucial barriers to electronic access, even though this access had become a critical component of historians' research methods [2]. Later studies have revealed the plurality of historians' information-seeking behaviour, but also a need for both research and archival context that was common to the most popular methods [3], and the importance of intermediaries in the use of online material [4]. Academic historians require multiple pathways to access primary research materials, and the need for user education on electronic searches suggests that current provision hinders access [5]. Moreover, the need for orientation among even the most experienced users has been emphasised [6].

Archive portal sites such as the Archives Hub, A2A, AIM25, ANW and SCAN are evidence of the desire to search across collections and repositories, but typical means of browsing or displaying search results, such as lists and directories, severely restrict users' ability to see where they are, how they got there and where they can go next [7, 8, 9, 10, 11]. Providing linked 'cross-walks' such as subject keywords, functional descriptions, and person, place and corporate names can only go so far in addressing this problem. Points at which these cross-walks intersect cannot easily be displayed, and users wishing to move from one to another need to repeat searches or navigate up and down the hierarchy. This problem is greatly compounded where related material is held in different series, collections or repositories. In these circumstances, trying to follow a particular person, function or responsibility is extremely difficult. In following one path, users lose sight of others, of where they cross and of what their relationships are. In essence, the multidimensional relationships that exist within the finding aid are subordinate to its hierarchical structure.

The threads of this research all combine to suggest that a means by which archive users can quickly and intuitively orientate themselves within collections, and identify the relationships and context of the resources they are viewing, would be immensely beneficial. This project, funded by the UK's Arts and Humanities Research Council (AHRC) through a one-year speculative research grant, sought to test a novel approach to structuring and visualising archival information by applying a relational browsing interface to EAD finding aids that had been transformed into a multidimensional structure.

2. A MULTIDIMENSIONAL SOLUTION
One potential solution to this problem is to structure and visualise this information multidimensionally. For example, repository, collection, date and function could each be a separate dimension, rather like lines on a London Underground map. A user viewing the person name dimension (or line) would then see each individual represented in a finding aid as a cell. This person may appear in different parts of a collection, in separate collections at the same repository and at other repositories, quite possibly related to different organisations, functions or roles. Whilst well-developed finding aids can make these links, it is very difficult for users to see and navigate them.

One such means of organising information multidimensionally is the ZigZag™ concept developed by Ted Nelson [12]. In a ZigZag™ structure, a piece of information can exist in different places at the same time and have many connections to other information that may also exist in more than one place. The beauty of the ZigZag™ system is that the user can bring multiple instances of the same information into one view and, by changing the dimensions, can instantly see how the related pieces of information are connected. Thus the user is always presented with a locally relevant view of the information, irrespective of how complex the structure is, and without losing the ability to navigate and view all the interconnections.

Representing archival information in this way may provide both functionality and usability that reflect the deeply interlinked structures of today's online finding aids. Additional dimensions could be used to provide a whole range of context-specific information, such as related bibliographies, digital surrogates, user comments and help files. This would allow online finding aids to move from an access tool to an expert system.

The advent of XML-encoded finding aids, particularly EAD, and the wide-scale implementation of descriptive standards made this an ideal time to test the viability of a ZigZag™ structure and visualisation.

The number and extent of the dimensions that can be represented does, of course, depend upon the quality and extent of the underlying data. For this project two finding aids, the Gateway to Archives of Scottish Higher Education (GASHE) and Navigational Aids for the History of Science and Technology (NAHSTE), provided by the University of Glasgow Archive Services, were selected [13, 14]. These finding aids gave the project the opportunity to test the concept against EAD and the descriptive standards General International Standard Archival Description (ISAD(G)2) and International Standard Archival Authority Record for Corporate Bodies, Persons, and Families (ISAAR(CPF)). The GASHE finding aid also includes function and activity 'cross-walks'. Both finding aids span multiple collections and repositories.

Overall, the project aimed to achieve 'proof of concept' status by showing that: it was technically feasible to map between EAD and a ZigZag™ structure; the transformation between the two could be automated; a web-based interface could represent the multiple dimensions; and such an interface supported more intuitive browsing for users.

3. DEVELOPMENT
Several working examples of ZigZag™ structures have already been created in other projects, using Perl, C, Python and Java, to run on Windows, Linux and Mac. Initially the most promising of these for this project was the combination of XML, XSL and JavaScript successfully demonstrated by Les Carr at the IAM Research Group, University of Southampton, on a map of the London Underground, see Figure 1 below [15].

Figure 1. ZigZag™ for Web Browsers London Underground Demo

Taking this project as an inspiration, and re-using the ZigZag™ for Web Browsers XML dialect kindly provided by Les Carr, the project's functional and technical specification was defined using the Unified Modeling Language (UML), in particular use-case and activity diagrams. During this process it was decided that static, rather than 'on the fly', transformation of the finding aids was most appropriate in this context, given that the source data was static and given the additional computational demands that dynamic online transformations would entail.

Mapping the EAD finding aids to the ZigZag™ structure through XSL was the first major project milestone. The finding aids actually comprise hundreds of files: in the case of GASHE, an XML EAD file using the ISAD(G)2 descriptive standard, an XML ISAAR authority file and an XML FANDA file (for functional and activity descriptions) for each individual collection in each repository. The relationship between these files needed to be understood before the EAD file could be mapped to the ZigZag™ structure. At this point the application was named BigBag, partly as a nice alliteration of ZigZag™, but also because the dimension lines between cells in the mapping diagrams resembled the large string shopping bags used to carry groceries, see Figure 2 below.

Figure 2. Archive Structure to ZigZag™ Structure Mapping

Thirteen EAD features were mapped to the ZigZag™ output tree, with each of the many ways to categorise an archival component becoming a dimension: subject, repository, personal name, location, corporate name, a daisy-chain crossover linking all archive components, a crossover linking collection and series, and a crossover linking series and objects. As the underlying structure is one of linked circular lists, the same cells may appear simultaneously, in different orders, in several linked lists. The stylesheet had to manage some peculiarities of EAD and handle function, subject and place dimensions differently from other elements, as multiple elements were possible, nested within a tag.

The cells of the output tree could be one of three types: collection (this included collection, fonds, class and record group descriptions), series (this included series, subfonds, subgroup or subseries descriptions) or object (this included item or file descriptions). Fragments from the XML ZigZag™ structure are provided in Figure 3 below.

Figure 3. ZigZag XML Code

The transformation from EAD to ZigZag™ used Microsoft's Command Line Transformation Utility (MSXSL). This was a two-step transformation of archival finding aid data, from EAD XML into ZigZag™ XML and then into ZigZag™ HTML. Les Carr's ZigZag™ for Web Browsers is limited to 40 cells, so a test file was selected that produced 27 cells.
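
The linked circular lists described above can be illustrated with a short Python sketch. It is not the project's XSL or Les Carr's XML dialect: the class `Cell`, the functions `make_ring` and `walk_ring`, and the dimension names used in the example are all hypothetical. The sketch assumes only that each cell keeps one previous and one next neighbour per dimension. Under that assumption, the same cell can sit in several circular lists at once, in a different order in each.

```python
from dataclasses import dataclass, field


@dataclass(eq=False)  # eq=False keeps identity-based hashing
class Cell:
    """A ZigZag-style cell (hypothetical model, not the project's XML dialect)."""
    text: str
    # dimension name -> (previous cell, next cell) along that dimension
    links: dict = field(default_factory=dict)

    def next_on(self, dim):
        return self.links[dim][1]

    def prev_on(self, dim):
        return self.links[dim][0]


def make_ring(dim, cells):
    """Link `cells` into one circular list along dimension `dim`.

    A cell may sit on rings in any number of dimensions, but on at most one
    ring per dimension, so it has exactly one neighbour each way on `dim`.
    """
    if not cells or len(set(cells)) != len(cells):
        raise ValueError("a ring needs at least one cell and no repeats")
    for cell in cells:
        if dim in cell.links:
            raise ValueError(f"{cell.text!r} is already linked on {dim!r}")
    n = len(cells)
    for i, cell in enumerate(cells):
        # cells[i - 1] wraps round to the last cell when i == 0
        cell.links[dim] = (cells[i - 1], cells[(i + 1) % n])


def walk_ring(start, dim):
    """Return the texts met going once round the `dim` ring from `start`."""
    texts, current = [], start
    while True:
        texts.append(current.text)
        current = current.next_on(dim)
        if current is start:
            return texts


# Hypothetical cells: the same cells, in different orders on two dimensions.
fonds = Cell("Fonds: college records")
series = Cell("Series: minutes")
item = Cell("Item: prospectus")
make_ring("subject", [fonds, series, item])
make_ring("function", [item, fonds])
print(walk_ring(fonds, "subject"))
print(walk_ring(fonds, "function"))
```

Walking from the same fonds cell gives a different neighbourhood on each dimension, which is the property the relational interface later tried to expose. A fuller model would also need to unlink and reorder cells. This sketch only builds and traverses rings.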

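The first step of the transformation described above decides which of the three cell types each archival component becomes. The project did this in XSL under MSXSL. The Python below only illustrates the same idea, under several assumptions. It assumes DTD-style (un-namespaced) EAD and the EAD 2002 values of the `level` attribute. It also makes a hypothetical choice of `<controlaccess>` children as sources of dimension values. The names `to_cells`, `CELL_TYPE` and `DIMENSION_TAGS` are invented for the example, and the sample record is made up.

```python
import re
import xml.etree.ElementTree as ET

# The paper's three cell types, keyed by EAD 2002 @level values.
CELL_TYPE = {
    "collection": "collection", "fonds": "collection",
    "class": "collection", "recordgrp": "collection",
    "series": "series", "subfonds": "series",
    "subgrp": "series", "subseries": "series",
    "item": "object", "file": "object",
}
# Hypothetical: <controlaccess> children read as dimension values.
DIMENSION_TAGS = {
    "subject": "subject", "persname": "personal name",
    "corpname": "corporate name", "geogname": "location",
}
# Unnumbered <c> or numbered <c01>..<c12> components.
COMPONENT = re.compile(r"c(?:0[1-9]|1[0-2])?")


def text_of(el):
    """All text inside `el`, whitespace-normalised ('' if el is None)."""
    return " ".join("".join(el.itertext()).split()) if el is not None else ""


def to_cells(ead_xml):
    """Turn <archdesc> and each component into a typed cell with dimensions."""
    archdesc = ET.fromstring(ead_xml).find("archdesc")
    nodes = [archdesc] + [
        el for el in archdesc.iter() if COMPONENT.fullmatch(el.tag)
    ]
    cells = []
    for node in nodes:
        dims = {}
        # Direct children only, so a parent does not absorb its children's terms.
        for access in node.findall("controlaccess"):
            for term in access:
                name = DIMENSION_TAGS.get(term.tag)
                if name:
                    dims.setdefault(name, []).append(text_of(term))
        cells.append({
            "title": text_of(node.find("did/unittitle")),
            "type": CELL_TYPE.get(node.get("level", ""), "unknown"),
            "dimensions": dims,
        })
    return cells


# An invented, minimal DTD-style EAD record for illustration.
SAMPLE = """<ead><archdesc level="fonds">
  <did><unittitle>College records</unittitle></did>
  <controlaccess><corpname>Example College of Technology</corpname></controlaccess>
  <dsc>
    <c01 level="series"><did><unittitle>Minutes</unittitle></did>
      <c02 level="item"><did><unittitle>Minute book</unittitle></did>
        <controlaccess><subject>Higher education</subject></controlaccess>
      </c02>
    </c01>
  </dsc>
</archdesc></ead>"""

for cell in to_cells(SAMPLE):
    print(cell["type"], cell["title"], cell["dimensions"])
```

The rest of that step is not shown here. It would link these cells into per-dimension circular lists and write them out in the ZigZag™ XML dialect. Any handling of repeated or nested terms would also belong there.
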
When handling place elements, for instance, the stylesheet had to avoid adding the current place if it was the same as a place that had already been added, unless the current place was a sibling of the place already added. The stylesheet expected six required EAD elements, sixteen optional EAD elements and six optional, multiple and recursive EAD elements. Seven escaped character codes were also stripped from the input tree, as were 13 characters that were illegal in JavaScript.

However, initial tests of a sample of data from the GASHE finding aid using Les Carr's XML dialect and JavaScript interface proved problematic. The transformation produced a functionally correct interface, but one of limited usability, comprising hundreds of small black arrows dispersed across several screen widths, see Figure 4 below. Furthermore, even with the small sample data set, well-specified PCs (dual-core Pentium processors, 2GB RAM and 256MB dedicated graphics memory) returned warnings that the JavaScript was causing the computer to run slowly. Although the number of cells was small, the number of dimensions associated with each cell in GASHE was far greater than in the original London Underground demo. These factors suggested that the JavaScript development path was unlikely to scale well enough for the amount of data and number of relationships required, or to provide sufficient complexity for the visualisation.

Figure 4. Section of BigBag JavaScript Demo

Whilst the appearance of the interface and the efficiency of the data handling could undoubtedly have been improved, a decision was taken to seek an alternative means of visualisation. Initially an SVG interface was an attractive solution. It would keep the data within the XML family, and the Parip Explorer project had successfully demonstrated a visualisation style that could suit the data [16]. However, the project's lack of experience with SVG and the limitations of browser support led the project, after further research, to develop its interface using Macromedia/Adobe Flash, based on an original idea by Moritz Stefaner [17]. Stefaner's relational browser for the CIA World Factbook provided the underlying physics for an interface that positioned the selected 'cell' in the centre of the screen with lines spanning out to related cells of information. Selecting an outlying cell brought it to the centre of the screen and redrew the relationships. In other words, it provided users with a locally relevant view of their selected information without losing sight of the bigger picture. An initial trial with a simple greyscale version of the relational browser interface demonstrated that it could be modified to reflect, in part at least, the underlying ZigZag™ structure.

The second version of the interface, and the first to be tested with users, added a colour-keyed sliding selector for the various dimensions as well as drop-down menus for selecting instances of dimensions and archive components. A breakout box that linked to the original finding aid for each selected cell was also added, as well as history and home buttons, see Figure 5 below [18].

Figure 5. BigBag Flash Demo Version 2

A small, targeted sample of six people (two archivists, two historians and two students) was selected to test this version. Although the feedback was positive on the whole, with participants finding the interface clear, intuitive and supportive of their browsing behaviour, it was also evident that the multidimensionality of the underlying ZigZag™ structure was not being adequately expressed. Stefaner's relational browser only had to express one type of 'part of' relationship between two cells at a time and employed a single line to do so. With the finding aid ZigZag™ data, however, there are potentially many different relationships between each pair of cells, which a single line cannot adequately convey. The sliding dimension selector was an attempt to overcome this problem, but users did not like having to scroll through each dimension on the slider to see whether it applied to their selected cells. It was evident that a means of immediately representing the number and type of relationships between cells was needed.

The next version of the interface, version three, tested the technical possibility of having multiple lines, each representing a different dimension, connect each cell, with the width of each line reflecting the number of instances within that relationship. Once the project had established that this was technically and aesthetically possible, version four of the interface was released. This removed the sliding dimension selector and replaced it with a simple key to the coloured lines. The format of the breakout box linking to the original finding aid was simplified, and the screen was split to show the original finding aid on the right [19]. In this version of the visualisation the colour of each line again indicates the dimension type, while its width indicates the number of instances of that type: the thicker the line, the greater the number of instances. Icons indicated whether the cell data existed at the collection, series or object level in the original finding aid, see Figure 6 below.

Figure 6. BigBag Flash Demo Version 4

The same six users who had previously evaluated the project were shown the final version of the interface. This time they agreed that the underlying relationships within the data were more explicit and that this enhanced the browsing of the finding aid data. However, users now instinctively wanted to click on the connecting lines to isolate a particular dimension, which was not possible, rather than use the dimension instance drop-down menu. Furthermore, bugs and inconsistencies, particularly in the way dimension instances were selected, significantly hampered users. Selecting a subject dimension from the drop-down menu did not alter the cell display, an error that was not present in the previous version of the interface, and an additional erroneous subject instance also appeared on the menu.

By this late stage in the project time was a major constraint, and it was not possible either to complete the scheduled interface testing or to implement the zoom in and out function or the add and subtract cells feature. Indeed, it was a struggle to get the final version of the visualisation working in time at all.

4. STRENGTHS & WEAKNESSES
The project has partly fulfilled its main objectives. It established that it was conceptually possible to map from EAD to ZigZag™ and that a stylesheet could be developed to automate this process. However, it was not possible to establish that this transformation could be undertaken on all instances of EAD finding aids. Even within the GASHE collection, variations in EAD encoding practices posed a challenge to efficient transformation. In part this is an inherent weakness of EAD, in that its minimal compliance requirement amounts to little more than a collection description, akin to a minimally compliant TEI header. In the project's test data the lack of entity declarations for special characters also interfered with attempts to create suitable visualisations. In retrospect, editing the GASHE EAD prior to transformation would have created a far more efficient process. However, in trying to create a transformation that would be applicable to real-life situations, it would be unrealistic to expect archivists to amend their EAD files in order to accommodate our visualisation. One useful spin-off from this, however, was the development of a set of EAD templates for the NoteTab text editor that placed greater constraints on coding choices. The objective of this exercise was that archivists might adopt them when creating new EAD finding aids and so avoid many of the common problems found in the EAD finding aids that hindered this project.

Although the project demonstrated the technical viability of an XML ZigZag™ for web browsers on data larger and more complex than those of the London Underground demo, the difference was not significant, and time did not allow the transformation and visualisation of the entire GASHE finding aid, let alone testing of the stylesheet against NAHSTE. Throughout the project a difficult balance had to be struck between refining and testing the stylesheet against larger and more varied sets of source data, and developing a meaningful visualisation to test with users. In the end neither component was as fully developed as it could have been, but the project would have failed in an important respect if it had successfully transformed a large amount of data without any means of displaying the results. In retrospect the project may simply have been too ambitious in its scope.

After a few false starts the project did create a visualisation that reflected the underlying multidimensionality of the ZigZag™ data, albeit imperfectly. Although the fourth and final interface comes conceptually closest to the goals the project set itself, its limited development time, even compared with the second version, made it impossible to establish with certainty that it provided a significantly more beneficial interface to online archive users. It was never the project's intention to undertake extensive user evaluation or usability testing, but within the constraints of what was possible the generally positive feedback is sufficient to suggest that the approach adopted does bring benefits for browsing archive finding aids online. How great those benefits are, for what type of information-seeking behaviour and in what circumstances are questions that this project is unable to answer.

5. CONCLUSION
Perhaps inevitably for speculative research, this project ultimately raises more questions than it answers, but it has at least demonstrated sufficient merit to warrant those questions being investigated further. In particular, the relative importance of the underlying EAD finding aid, the ZigZag™ structure and the visualisation to the end user's understanding of the data needs to be examined.

It is the intention to continue this research by creating a set of alternative structures and visualisations based on the same underlying archive data: a relational visualisation applied directly to an EAD finding aid; archive data that has been input directly into a ZigZag™ structure rather than transformed; an EAD to ZigZag™ transformed visualisation (essentially an updated version of the current visualisation); and the archive data as displayed in its native state.

These alternative representations will provide a test bed through which end users' understanding of the archive data will be examined using reception theory. Reception theory, sometimes called audience response theory, is a version of reader response theory that first developed in literary studies and was subsequently extended to include performance works. Reception theory proposes that a text does not have an inherent meaning; rather, meaning is created within the relationship between the text and the reader, shaped by the reader's background, influences and biases. By applying this theory to archival data it is hoped to explore the extent to which meaning is created by the user, is inherent in the data itself, or is shaped by the way in which the data is structured or visualised.

There is also the potential for the approach tested here to be applied to information domains other than archives. Since this research was completed, a brief market analysis was conducted to try to identify other areas that might benefit from this approach. Although this survey was by no means comprehensive, and a range of commercial data visualisation products is already available, social networks, personal or business contact lists, customer relationship management and enterprise relationship management are potential new areas that future research could address.

6. ACKNOWLEDGMENTS
My thanks to the AHRC for funding, the Humanities Advanced Technology and Information Institute and the Faculty of Arts at the University of Glasgow for additional support, Lesley Richmond and Victoria Peters of University of Glasgow Archive Services for the supply of data and advice, Les Carr at the University of Southampton for permission to re-use his XML ZigZag™ for Web Browsers dialect, Moritz Stefaner for permission to re-use the underlying design of his relational browser, and Steve North, Research Assistant, for his creativity, dedication and hard work.

7. REFERENCES
[1] EAD http://www.loc.gov/ead/
[2] Andersen, D.L. "Academic Historians, Electronic Information Access Technologies, and the World Wide Web: A Longitudinal Study of Factors Affecting Use and Barriers to that Use", The Journal of the American Association for History and Computing 1, June 1998. Available at: http://mcel.pacificu.edu/jahc/1998/issue1/articles/andersen/
[3] Anderson, I. "Are you being served? Historians and the Search for Primary Sources", Archivaria 58, Fall 2004.
[4] Duff, W., Craig, B. and Cherry, J. "Historians' Use of Archival Sources: Promises and Pitfalls of the Digital Age", The Public Historian 26, no. 2, Spring 2004.
[5] Tibbo, H. "Primarily History: How US Historians Search for Primary Sources at the Dawn of the Digital Age", American Archivist 66, no. 1, Spring/Summer 2003.
[6] Duff, W. and Johnson, C. "Accidentally Found on Purpose: Information Seeking Behaviour of Historians in Archives", Library Quarterly 72, no. 4, 2002.
[7] Archives Hub http://www.archiveshub.ac.uk/
[8] Access to Archives (A2A) http://www.nationalarchives.gov.uk/a2a/
[9] AIM25 http://www.aim25.ac.uk/
[10] Archives Network Wales (ANW) http://www.archivesnetworkwales.info/
[11] Scottish Archive Network (SCAN) http://www.scan.org.uk/
[12] ZigZag http://www.xanadu.com/zigzag/
[13] GASHE http://www.gashe.ac.uk/
[14] NAHSTE http://www.nahste.ac.uk/
[15] Les Carr's ZigZag for Web Browsers London Underground demo http://users.ecs.soton.ac.uk/lac/zigzag/
[16] Parip Explorer Project http://parip.ilrt.org/
[17] Moritz Stefaner's Web Site http://der-mo.net/
[18] BigBag Interface v2 http://www.hatii.arts.gla.ac.uk/research/visual/demov2/index.html
[19] BigBag Interface v4 http://www.hatii.arts.gla.ac.uk/research/visual/demov4/index.html