<!DOCTYPE article PUBLIC "-//NLM//DTD JATS (Z39.96) Journal Archiving and Interchange DTD v1.0 20120330//EN" "JATS-archivearticle1.dtd">
<article xmlns:xlink="http://www.w3.org/1999/xlink">
  <front>
    <journal-meta />
    <article-meta>
      <title-group>
        <article-title>Classification for collections mapping and query expansion</article-title>
      </title-group>
      <contrib-group>
        <contrib contrib-type="author">
          <string-name>Claudio Gnoli</string-name>
          <xref ref-type="aff" rid="aff0">0</xref>
          <xref ref-type="aff" rid="aff1">1</xref>
        </contrib>
        <contrib contrib-type="author">
          <string-name>Laura Pusterla</string-name>
          <xref ref-type="aff" rid="aff0">0</xref>
          <xref ref-type="aff" rid="aff1">1</xref>
        </contrib>
        <contrib contrib-type="author">
          <string-name>Anna Bendiscioli</string-name>
          <xref ref-type="aff" rid="aff0">0</xref>
          <xref ref-type="aff" rid="aff1">1</xref>
        </contrib>
        <contrib contrib-type="author">
          <string-name>Cristina Recinella</string-name>
          <xref ref-type="aff" rid="aff0">0</xref>
          <xref ref-type="aff" rid="aff1">1</xref>
        </contrib>
        <aff id="aff0">
          <label>0</label>
          <institution>Science Library Collections in Pavia</institution>
        </aff>
        <aff id="aff1">
          <label>1</label>
          <institution>Science and Technology Library, University of Pavia</institution>
          ,
          <country country="IT">Italy</country>
        </aff>
      </contrib-group>
      <abstract>
        <p>Dewey Decimal Classification has been used to organize materials owned by the three scientific libraries at the University of Pavia, and to allow integrated browsing in their union catalogue through SciGator, a home-built, web-based user interface. Classification acts as a bridge between collections located in different places and shelved according to different local schemes. Furthermore, cross-discipline relationships recorded in the system allow for expanded queries that increase recall. Advantages and possible improvements of such a system are discussed.</p>
      </abstract>
      <kwd-group>
        <kwd>browsing</kwd>
        <kwd>Dewey Decimal Classification</kwd>
        <kwd>knowledge organization</kwd>
        <kwd>mapping</kwd>
        <kwd>OPAC</kwd>
      </kwd-group>
    </article-meta>
  </front>
  <body>
    <sec id="sec-1">
      <title>Introduction</title>
      <p>Scientific libraries at the University of Pavia were quite scattered until 2009, when they
were reorganized into eight libraries, including the Science Library (Biblioteca delle
Scienze, BDS), the Science and Technology Library (Biblioteca della Scienza e della
Tecnica, BST) and the Medical Library (Biblioteca di Area Medica, BAM). However,
each of these libraries is still physically divided into several sections, each with its own
shelving tradition based on local schemes. While shelfmarks are being
progressively converted to a DDC-based system, librarians still have to manage many old
shelfmarks belonging to old schemes. The subdivisions often depend on historical or
accidental factors, such as the location of departments around the town, rather than on
the subjects themselves; for example, most physics and chemistry books belong to BDS, but
most engineering and mathematics books belong to BST. Books on related subjects, or even
on the same subject, are often shelved in different places, which is a potential source of
confusion for users.</p>
      <p>In this situation, a standard classification scheme can work as a virtual bridge between
different local schemes and locations. It can also be a useful organizational tool should
small sections of the same library be unified further (books will
find their place on the new shelves among other books on the same subject).</p>
      <p>
        A known limitation of such enumerative classification schemes as DDC is that they
force every item to be shelved under a specific discipline, thus losing information on
related subjects also touched upon in the same work. For example, many books owned by
our libraries deal with mathematical subjects applied to physics, or with physical subjects
applied to engineering, or engineering subjects applied to building, or building subjects
applied to architecture… Navigation across subjects and disciplines is only possible if
appropriate links exist in the catalogue between related subjects [
        <xref ref-type="bibr" rid="ref1">1</xref>
        ], which is hardly the
case with most Italian catalogues [
        <xref ref-type="bibr" rid="ref2">2</xref>
        ].
      </p>
    </sec>
    <sec id="sec-2">
      <title>Relationships between Classes</title>
      <p>In order to overcome these difficulties, we have developed SciGator, a freely accessible
web interface [http://www-dimat.unipv.it/biblio/deweye.php] written in
PHP, which allows users to browse and navigate between subjects available in the three
scientific libraries before launching their search in the catalogue. SciGator data are stored in
a MySQL table including fields for DDC notation, informal Italian caption, informal
English caption, scope notes, related DDC classes, and equivalent non-DDC classes from
local schemes.</p>
      <p>While the fields for related DDC classes make it possible to navigate between different
disciplinary hierarchies, thus linking e.g. 532 fluid mechanics with 627 hydraulic
engineering, the fields for equivalent classes make it possible to map between local
schemes. DDC can thus work as a standard common language for distributed resources
[<xref ref-type="bibr" rid="ref3 ref4">3-4</xref>], besides providing a language-independent notation that can organize resources with
titles in different languages. In some cases mappings can be quite precise, as with
the local scheme of the Mathematics section, which was originally based on the
Mathematics Subject Classification; this defines mathematical subjects precisely,
and DDC corresponds to it well. In other cases, however, mapping is only
approximate, because in the past books were classed inconsistently and at varying
degrees of detail in different sections. Still, using a standard classification scheme as a
common reference brings some order to the body of available materials.</p>
      <p>In our approach, a class can be related to more than one other class. E.g., for DDC 532
fluid mechanics we have recorded “see also” relationships to 530.42 liquid state physics
and to 627 hydraulic engineering, as well as mappings to ZA.4 fluid mechanics at the
Mathematics department, ID4 fluid mechanics at the Engineering department, etc.</p>
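      <p>The record described above can be pictured with a minimal sketch. SciGator itself stores its data in a MySQL table queried from PHP; the Python class below is only an illustration, and its name and field names (ClassRecord, related, equivalent and so on) are assumptions rather than the actual column names.</p>
      <preformat><![CDATA[
```python
from dataclasses import dataclass, field


@dataclass
class ClassRecord:
    """One row of a hypothetical class table (all field names are illustrative)."""
    notation: str                 # DDC notation, e.g. "532"
    caption_en: str               # informal English caption
    caption_it: str = ""          # informal Italian caption
    scope_note: str = ""
    # "see also" links to other DDC classes
    related: list[str] = field(default_factory=list)
    # equivalent local classes: local notation -> section that uses it
    equivalent: dict[str, str] = field(default_factory=dict)


# The example given in the text: DDC 532 and its recorded relationships.
fluid_mechanics = ClassRecord(
    notation="532",
    caption_en="fluid mechanics",
    related=["530.42", "627"],
    equivalent={"ZA.4": "Mathematics", "ID4": "Engineering"},
)

print(fluid_mechanics.related)
print(sorted(fluid_mechanics.equivalent))
```
]]></preformat>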
      <p>In most cases, these relationships are symmetrical, that is, 530.42 liquid state physics
also links back to 532 fluid mechanics. However, for some approximate mappings, such
as that between 532 and ID4, the links only work in one direction in order to reduce noise
in retrieval: users browsing physics classes in DDC are informed that an (approximately)
equivalent local class exists, as this is displayed, but that class is not included in the expanded
search (see below). Practical experience with such situations could lead to the development of a more
accurate model distinguishing between several types of associative relationships.</p>
    </sec>
    <sec id="sec-3">
      <title>The SciGator Interface</title>
      <p>When users access the SciGator homepage, they are presented with the first two hierarchical
levels of the scheme, limited to those DDC classes for which our libraries actually own
some books (Fig. 1). For example, religion classes (200) are not displayed, as our scientific
libraries do not own a significant number of documents concerning religion.</p>
      <p>
        Users can select a second-order class, e.g. 530 physics, which is then expanded to show
all its subclasses and related classes (Fig. 2). This allows for classical browsing of the
hierarchical classification tree. Navigation can go down to specific classes with several
additional digits, or go back up to more general classes by two levels per click; the
interface is designed so that navigation requires only a limited number of clicks, without
forcing the user to go down or up step by step (as happens in WebDewey), which in our
experience would make exploration too long and uncomfortable. Also, displaying an
appropriate number of adjacent and subordinate classes is important to make the scope
and context of a class more immediately clear [
        <xref ref-type="bibr" rid="ref5">5</xref>
        ].
      </p>
      <p>Related classes are displayed in two ways:
1. their notation only is displayed on the right of each class (Fig. 2);
2. both their notation and caption are displayed in the bottom section of the page
(Fig. 3).</p>
      <p>Icons on the right of each class launch searches in the university catalogue (Fig. 4), with
a default selection of resources owned by at least one of the three scientific libraries. There
are three icons, which allow for increasingly comprehensive searches. Their function is
explained in a short help text at the bottom of the same page:
(A) “browse this shelf” (icon with black book spines) retrieves all and only the documents
having a shelfmark that begins with the corresponding notation (note that default
truncation is applied, so that subclasses are also included; this is a standard
application of the hierarchical structure of classification schemes with an expressive
notation, such as Dewey);
(B) “browse the catalogue” (icon with a lens and a blue list of records) retrieves all the
documents having the corresponding notation as their shelfmark or as subject
metadata. This is useful to cover documents not yet shelved by Dewey, as they are
shelved by old local schemes, as well as documents shelved under a different Dewey
class though also indexed by the present class;
(C) “expand in the catalogue” (icon with four blue divergent arrows) retrieves all the
documents in B plus documents shelved or indexed by related classes, including both
associated Dewey classes and equivalent classes in local schemes. This icon is only
shown for those classes which actually have some related or equivalent class in the
system. Fig. 4 shows results of an expanded search for class 532 fluid mechanics
which also include books shelved under 627 hydraulic engineering and under ZA.4
fluid mechanics at the Mathematics department.</p>
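      <p>The three search levels can be sketched as follows. This is not the SciGator code, which launches searches in the university catalogue; it is a small in-memory illustration in which the Book class, the function names and the sample records are all hypothetical. It also assumes that truncation is applied to subject notations as well as to shelfmarks, which the help text does not state.</p>
      <preformat><![CDATA[
```python
from dataclasses import dataclass, field


@dataclass
class Book:
    """A hypothetical catalogue record reduced to what the three searches need."""
    title: str
    shelfmark: str
    subjects: list[str] = field(default_factory=list)


def search_a(books, notation):
    """(A) Shelfmark begins with the notation, so subclasses are included."""
    return [b for b in books if b.shelfmark.startswith(notation)]


def search_b(books, notation):
    """(B) Notation matches the shelfmark or any subject notation."""
    return [
        b for b in books
        if b.shelfmark.startswith(notation)
        or any(s.startswith(notation) for s in b.subjects)
    ]


def search_c(books, notation, related):
    """(C) Results of B plus documents shelved or indexed under related classes."""
    hits = search_b(books, notation)
    for other in related:
        for b in search_b(books, other):
            if b not in hits:
                hits.append(b)
    return hits


# Invented sample records, for illustration only.
books = [
    Book("Title 1", "532.1"),
    Book("Title 2", "ZA.4", subjects=["532"]),
    Book("Title 3", "627"),
    Book("Title 4", "510"),
]

print(len(search_a(books, "532")))
print(len(search_b(books, "532")))
print(len(search_c(books, "532", ["627", "ZA.4"])))
```
]]></preformat>
      <p>With these sample records each step returns a superset of the previous one, which mirrors the increasing coverage from A to C.</p>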
    </sec>
    <sec id="sec-4">
      <title>Query Expansion Covering Related Classes</title>
      <p>Functionality C is the most innovative feature of SciGator. It leverages the association and
equivalence relationships recorded in the MySQL table to provide a wider coverage of
subject search, thus increasing recall.</p>
      <p>Clearly, every move from A to B or from B to C will produce a greater number of
results, potentially also increasing noise, due to the well-known trade-off
between recall and precision. This is why the three icons are presented one after the other.
Indeed, for those classes where the A-type query already yields a good number of results,
say 10 or more, most users will be satisfied with it and need no wider coverage.
Results of query A can be expected to be very precise, provided shelfmarks have been
assigned accurately enough.</p>
      <p>
        The “zero-match problem” [
        <xref ref-type="bibr" rid="ref6">6</xref>
        ] cannot occur in SciGator because only classes for which
some material is actually owned are displayed in the browsable menu. Still, for some
classes users may be unsatisfied with as few as 4 or 5 results, or may need
more complete coverage for their bibliographic purposes. This should lead them to move on to
icon B or even to icon C, thus obtaining more results.
      </p>
      <p>
        On the other hand, results from expanded queries can be less precise and lead to an
opposite problem of information overload, especially for 3-digit Dewey classes which
correspond to very general concepts or in disciplines where the libraries own many
documents. To address these limitations, three strategies have been adopted:
1. the interface is designed to encourage people to use the icons in the
sequence A, B, C, and provides warnings and explanations about what
each of them will produce. Although most users are known to pay little attention to
instructions, over time they can gain experience with the tool and become
aware of how it works;
2. while associative relationships between classes form a complex virtual network,
only one arc of it is followed from each node, as recommended by Tudhope et al.
[
        <xref ref-type="bibr" rid="ref7 ref8">7-8</xref>
        ]. In other words, in the case of such relationship chains as between 530.42
liquid state physics and 532 fluid mechanics, and between 532 fluid mechanics
and 627 hydraulic engineering, a search for 530.42 will be expanded to only
include 532 but not 627. Reciprocity of relationships may also be relevant to these
purposes: although relationships between a pair of classes are usually recorded in
both directions, in some cases it may be advisable to limit them to only one
direction. This is especially the case with equivalent classes in local schemes,
which may point to a roughly corresponding Dewey class while the inverse may
not be the case, as mentioned above;
3. results are sorted by descending date of publication. The most recent documents are
thus displayed first and may provide a quick, relevant answer to the user's needs,
without the user having to examine all the results when the list is so long that it
exceeds the futility point [
        <xref ref-type="bibr" rid="ref9">9</xref>
        ]. Ideally, sorting could also be based on relevance by
listing results of A before results of B, then results of C, in a similar way as with
the “double query method” [
        <xref ref-type="bibr" rid="ref10">10</xref>
        ]; however this would require a level of technical
integration between SciGator and the union catalogue that is not currently
available.
      </p>
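      <p>Strategies 2 and 3 can be illustrated with a short sketch. The link table below is a hypothetical in-memory stand-in for the relationships recorded in the database, and the function names are assumptions. Links are stored per direction, so a one-way mapping such as that from ID4 to 532 is simply absent in the reverse direction, and expansion follows a single arc only.</p>
      <preformat><![CDATA[
```python
# Directed links: each key expands only to the classes listed for it.
# Values follow the examples in the text; the structure itself is illustrative.
LINKS = {
    "530.42": ["532"],
    "532": ["530.42", "627", "ZA.4"],
    "627": ["532"],
    "ID4": ["532"],  # local class points to DDC 532, but 532 does not expand to ID4
}


def expand(notation, links=LINKS):
    """Return the class plus its direct neighbours (one arc only, no chaining)."""
    return [notation] + [n for n in links.get(notation, []) if n != notation]


def sort_recent_first(records):
    """Sort (title, year) pairs by descending year of publication."""
    return sorted(records, key=lambda record: record[1], reverse=True)


print(expand("530.42"))
print(expand("532"))
print(sort_recent_first([("x", 2001), ("y", 2015), ("z", 2009)]))
```
]]></preformat>
      <p>A search for 530.42 is thus expanded to 532 but not to 627, as in the example above, because the neighbours of 532 are never followed in turn.</p>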
    </sec>
    <sec id="sec-5">
      <title>Future Development</title>
      <p>SciGator is still under development, and improvements may be needed in various details
concerning both the Web interface and the scripts. Until now, testing has been performed
more by librarians than by library users. One advantage of adopting a home-made tool is
that it can be continuously tuned through feedback from the front desk to the cataloguing
office and the web page developer. This kind of integration between different library
services has been recommended since the time of Ranganathan. The authors of this paper are
involved in several of these services at the same time and have frequent first-person
exchanges with colleagues, which allows for quick decisions and corrections.</p>
      <p>
        One component that is going to be developed further is the integration of SciGator with
signs and instructions at the library shelves. Fabbrizzi [
        <xref ref-type="bibr" rid="ref5">5</xref>
        ] recommends that shelves provide
users with information about the context of every Dewey class within the whole
classification scheme, and that this be connected appropriately with the catalogue. In Pavia,
we have provided shelves with a basic illustration of the classification scheme and with
signs reporting both the notation and the corresponding caption for all classes of reasonable
generality. This is now being further improved by providing dynamic links to SciGator
itself, through QR codes linking to the URL of a specific Dewey class in
SciGator, thus suggesting its position in the general scheme of knowledge adopted in the
library. This development may be especially useful as a large proportion of university
students now carry a smartphone connected to the Internet through the university Wi-Fi
network.
      </p>
      <p>Having adopted an open web interface with explicit dynamic addresses is clearly an
advantage in implementing this, as a QR code can represent a dynamic URL corresponding
to a single DDC class. For now, the project does not involve publication of any linked open
data. Ideally this would be possible for both bibliographic metadata in the union catalogue
and DDC classes; locally selected relationships between classes, including see-also
relationships of various types, could be represented in SKOS or OWL. In practice,
however, this is inhibited by current policies and priorities at the level of the union
catalogue development, and by copyright restrictions for DDC. The dewey.info service
that made DDC classes available as linked open data was unfortunately discontinued
by OCLC in June 2015 without any further explanation.</p>
      <p>Although precise data on SciGator use are not available, some estimates can be made
based on both log files and everyday experience at the library desks. In log files it is
unfortunately not easy to identify the exact proportion of sessions coming from automatic
crawlers of search engines. Estimates can be made by considering only IP addresses
belonging to the local area network of the university, although this misses genuine
remote users such as students accessing the system from home. In a three-month period, from
March to May 2016, we had 260 accesses from IP addresses of the university local
network. We estimate that a significant share of them came from library cataloguers using SciGator
as a reference in the process of shelving (another important application of this tool), or from staff
using SciGator for quick reference at the front desk. Direct access by library users is rare
for now, as both the tool and the methodology of subject search are not yet very popular among
them. This situation should be improved by strengthening both the offer of information literacy
courses (which our libraries already provide, though focusing on different services)
and the visibility of the link to SciGator within the large university libraries website, over
which we only have partial control.</p>
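      <p>The estimate based on local IP addresses could be reproduced along the following lines. This is a minimal sketch, not the procedure actually used: it assumes a log in which each line starts with the client IP address (as in the common web server log format), and the network range is a documentation-only placeholder, not the real university range.</p>
      <preformat><![CDATA[
```python
import ipaddress

# Placeholder range (reserved for documentation), NOT the university's real network.
CAMPUS_NETWORK = ipaddress.ip_network("192.0.2.0/24")


def is_local(address, network=CAMPUS_NETWORK):
    """True if the address belongs to the given network."""
    return ipaddress.ip_address(address) in network


def count_local_accesses(log_lines, network=CAMPUS_NETWORK):
    """Count lines whose first whitespace-separated field is a local IP address."""
    count = 0
    for line in log_lines:
        fields = line.split()
        if not fields:
            continue
        try:
            if is_local(fields[0], network):
                count += 1
        except ValueError:
            continue  # first field is not a valid IP address
    return count


# Invented sample lines, for illustration only.
sample = [
    '192.0.2.15 - - [01/Mar/2016:10:00:00 +0100] "GET /biblio/deweye.php HTTP/1.1" 200 512',
    '203.0.113.9 - - [01/Mar/2016:10:05:00 +0100] "GET /biblio/deweye.php HTTP/1.1" 200 512',
    'malformed line without an address',
]

print(count_local_accesses(sample))
```
]]></preformat>
      <p>As noted above, a filter of this kind excludes most crawlers but also misses genuine remote users, so the resulting count is a lower bound on real use.</p>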
      <p>We believe that our experience shows how KOSs, and classification schemes in
particular, have the potential to provide more powerful search tools than is currently the
case in most information services, provided there is enough investment in them, in terms of
time for indexing, of interface programming, and of service promotion among users.</p>
    </sec>
  </body>
  <back>
    <ref-list>
      <ref id="ref1">
        <mixed-citation>
          1.
          <string-name>
            <surname>Gnoli</surname>
            ,
            <given-names>C.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>De Santis</surname>
            ,
            <given-names>R.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Pusterla</surname>
            ,
            <given-names>L.</given-names>
          </string-name>
          :
          <article-title>Commerce, see also Rhetoric: Cross-Discipline Relationships as Authority Data for Enhanced Retrieval</article-title>
          . In:
          <string-name>
            <surname>Slavic</surname>
            ,
            <given-names>A.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Cordeiro</surname>
            ,
            <given-names>M.I.</given-names>
          </string-name>
          (eds.)
          <source>Classification &amp; Authority Control: Expanding Resource Discovery</source>
          , pp.
          <fpage>151</fpage>
          -
          <lpage>162</lpage>
          . Ergon, Würzburg
          (
          <year>2015</year>
          )
        </mixed-citation>
      </ref>
      <ref id="ref2">
        <mixed-citation>
          2.
          <string-name>
            <surname>Casson</surname>
            ,
            <given-names>E.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Fabbrizzi</surname>
            ,
            <given-names>A.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Slavic</surname>
            ,
            <given-names>A.</given-names>
          </string-name>
          :
          <article-title>Subject Search in Italian OPACs: an Opportunity in Waiting?</article-title>
          In:
          <string-name>
            <surname>Landry</surname>
            ,
            <given-names>P.</given-names>
          </string-name>
          et al. (eds.),
          <source>Subject Access: Preparing for the Future</source>
          , pp.
          <fpage>37</fpage>
          -
          <lpage>50</lpage>
          . De Gruyter, Berlin (
          <year>2011</year>
          )
        </mixed-citation>
      </ref>
      <ref id="ref3">
        <mixed-citation>
          3.
          <string-name>
            <surname>Si</surname>
            ,
            <given-names>L.E.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>O'Brien</surname>
            ,
            <given-names>A.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Probets</surname>
            ,
            <given-names>S.</given-names>
          </string-name>
          :
          <article-title>Integration of Distributed Terminology Resources to Facilitate Subject Cross‐Browsing for Library Portal Systems</article-title>
          .
          <source>Aslib Proc</source>
          .
          <volume>62</volume>
          ,
          <fpage>415</fpage>
          -
          <lpage>427</lpage>
          (
          <year>2010</year>
          )
        </mixed-citation>
      </ref>
      <ref id="ref4">
        <mixed-citation>
          4.
          <string-name>
            <surname>Boer</surname>
            ,
            <given-names>V. de</given-names>
          </string-name>
          :
          <article-title>Connecting Collections across National Borders</article-title>
          . http://www.victordeboer.com/digital-humanities/sound-and-vision/connecting-collections-across-national-borders/ (
          <year>2016</year>
          )
        </mixed-citation>
      </ref>
      <ref id="ref5">
        <mixed-citation>
          5.
          <string-name>
            <surname>Fabbrizzi</surname>
            ,
            <given-names>A.</given-names>
          </string-name>
          :
          <article-title>An Atlas of Classification: Signage between Open Shelves, the Web and the Catalogue</article-title>
          .
          <source>JLIS.it</source>
          <volume>5</volume>
          ,
          <issue>2</issue>
          ,
          <fpage>101</fpage>
          -
          <lpage>122</lpage>
          (
          <year>2014</year>
          ), http://leo.cineca.it/index.php/jlis/article/view/10068
        </mixed-citation>
      </ref>
      <ref id="ref6">
        <mixed-citation>
          6.
          <string-name>
            <surname>Tudhope</surname>
            ,
            <given-names>D.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Binding</surname>
            ,
            <given-names>C.</given-names>
          </string-name>
          :
          <article-title>Faceted Thesauri</article-title>
          .
          <source>Axiomathes</source>
          ,
          <volume>18</volume>
          ,
          <issue>2</issue>
          ,
          <fpage>211</fpage>
          -
          <lpage>222</lpage>
          (
          <year>2008</year>
          )
        </mixed-citation>
      </ref>
      <ref id="ref7">
        <mixed-citation>
          7.
          <string-name>
            <surname>Tudhope</surname>
            ,
            <given-names>D.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Alani</surname>
            ,
            <given-names>H.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Jones</surname>
            ,
            <given-names>C.</given-names>
          </string-name>
          :
          <article-title>Augmenting Thesaurus Relationships: Possibilities for Retrieval</article-title>
          .
          <source>J. Digital Info.</source>
          <volume>1</volume>
          ,
          <issue>8</issue>
          (
          <year>2001</year>
          ), https://journals.tdl.org/jodi/index.php/jodi/article/view/181/160
        </mixed-citation>
      </ref>
      <ref id="ref8">
        <mixed-citation>
          8.
          <string-name>
            <surname>Tudhope</surname>
            ,
            <given-names>D.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Binding</surname>
            ,
            <given-names>C.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Blocks</surname>
            ,
            <given-names>D.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Cunliffe</surname>
            ,
            <given-names>D.</given-names>
          </string-name>
          :
          <article-title>Query Expansion via Conceptual Distance in Thesaurus Indexed Collections</article-title>
          .
          <source>J. Doc.</source>
          ,
          <volume>62</volume>
          ,
          <issue>4</issue>
          ,
          <fpage>509</fpage>
          -
          <lpage>533</lpage>
          (
          <year>2006</year>
          )
        </mixed-citation>
      </ref>
      <ref id="ref9">
        <mixed-citation>
          9.
          <string-name>
            <surname>Zach</surname>
            ,
            <given-names>L.</given-names>
          </string-name>
          :
          <article-title>When is “Enough” Enough? Modeling the Information-Seeking and Stopping Behavior of Senior Arts Administrators</article-title>
          .
          <source>J. Am. Soc. Info. Sci. Techn</source>
          .
          <volume>56</volume>
          ,
          <issue>1</issue>
          ,
          <fpage>23</fpage>
          -
          <lpage>35</lpage>
          (
          <year>2005</year>
          )
        </mixed-citation>
      </ref>
      <ref id="ref10">
        <mixed-citation>
          10.
          <string-name>
            <surname>Gnoli</surname>
            ,
            <given-names>C.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Cheti</surname>
            ,
            <given-names>A.</given-names>
          </string-name>
          :
          <article-title>Sorting Documents by Base Theme with Synthetic Classification: the Double Query Method</article-title>
          . In:
          <string-name>
            <surname>Slavic</surname>
            ,
            <given-names>A.</given-names>
          </string-name>
          et al. (eds.),
          <source>Classification &amp; Visualization: Interfaces to Knowledge</source>
          , pp.
          <fpage>225</fpage>
          -
          <lpage>232</lpage>
          . Ergon, Würzburg (
          <year>2013</year>
          )
        </mixed-citation>
      </ref>
    </ref-list>
  </back>
</article>