<!DOCTYPE article PUBLIC "-//NLM//DTD JATS (Z39.96) Journal Archiving and Interchange DTD v1.0 20120330//EN" "JATS-archivearticle1.dtd">
<article xmlns:xlink="http://www.w3.org/1999/xlink">
  <front>
    <journal-meta />
    <article-meta>
      <contrib-group>
        <contrib contrib-type="author">
          <string-name>Sven Abels</string-name>
        </contrib>
        <contrib contrib-type="author">
          <string-name>Axel Hahn</string-name>
        </contrib>
      </contrib-group>
      <abstract>
        <p>This short paper gives an introduction to electronic product catalogs and to classification systems such as eCl@ss or UN/SPSC that are used in this domain. It explains the role of classification systems in fostering interoperability of catalog based enterprise systems, as well as the problems that occur. Afterwards, an approach is presented that aims at compensating for these problems with the help of a mediating system used to re-classify products.</p>
      </abstract>
    </article-meta>
  </front>
  <body>
    <sec id="sec-1">
      <title>Interoperability of catalog based systems</title>
      <p>Using standard catalog formats, such as BMEcat, enables easy collaboration between
enterprises when exchanging product data. Many standard applications are
able to import and interpret catalog data stored in these formats.</p>
      <p>
        Many enterprises have to integrate more than one catalog into their own systems. For
example, e-procurement systems, which are designed to support the electronic
procurement of goods, integrate products from a large number of suppliers into one
system (see [
        <xref ref-type="bibr" rid="ref2">3</xref>
        ], [
        <xref ref-type="bibr" rid="ref3">4</xref>
        ]). Other companies might have to offer products from multiple
suppliers in their own web shop.
      </p>
      <p>Integrating catalogs from more than one supplier into an enterprise's own system raises
two major problems: (1) different syntax and semantics of the data models (e.g. BMEcat vs. xCBL), and
(2) different taxonomies and terminologies of the catalogs themselves.</p>
      <p>The first problem is the use of different, incompatible catalog formats. For
example, a supplier might offer its products in xCBL while the
vendor’s system expects them in the BMEcat format. An appropriate solution for this
problem is to develop a simple converter that performs a conversion of the xCBL
catalog into BMEcat. The second problem is, however, much more complicated than the
first one, and it is independent of the catalog format: each product
catalog might have its own product groups to arrange products. Basically, catalogs differ in
(i) their taxonomy, e.g. by having different subgroups for a category ‘paper’, (ii) their
terminology, e.g. by using ‘paper’ and ‘writing material’ for an identical category, and
(iii) their language or spelling.</p>
      <p>
        In order to solve those problems, classification systems were defined. A classification
system is used “to assign each product to a product group corresponding to common
attributes or application areas” [
        <xref ref-type="bibr" rid="ref5">6</xref>
        ]. Popular classification systems are eCl@ss [
        <xref ref-type="bibr" rid="ref6">7</xref>
        ] and
UNSPSC [
        <xref ref-type="bibr" rid="ref7">8</xref>
        ]. These systems offer a set of categories (“classes”), which are ordered
hierarchically. A product can easily be assigned to a category by adding the category
string to its product data. Classification systems can thus help to integrate products into
existing catalogs and systems. There are, however, serious problems that prevent
interoperability of different catalog based systems even when a common catalog format
is chosen and classification systems are used. These problems arise because
several mutually incompatible classification systems coexist in this domain. Hence, a reclassification of
product data is necessary whenever two e-commerce systems use different
classification systems, e.g. a conversion from eCl@ss information into UNSPSC values.
      </p>
    </sec>
    <sec id="sec-2">
      <title>Reclassification of product data</title>
      <p>
        Re-classifying product data as explained in the last section is not an easy task because a
simple mapping between the categories of both classification systems is not possible in
many cases (see e.g. [
        <xref ref-type="bibr" rid="ref8">9</xref>
        ], [
        <xref ref-type="bibr" rid="ref9">10</xref>
        ]). For example, the first classification system might have a
category called <italic>Paper</italic> in the main category <italic>office materials</italic>. The destination system
might need an additional breakdown into <italic>White Paper</italic>, <italic>Recycled Paper</italic>, etc. Hence,
additional information is needed to re-classify all data correctly.
      </p>
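      <p>The ambiguity described above can be made concrete with a small sketch. It assumes a hypothetical mapping table between two classification systems; the table, the category "Pens" and the function name are invented for illustration and are not part of any real standard.</p>
      <preformat>
```python
# Hypothetical category mapping from a source system to a destination system.
MAPPING = {
    "Paper": ["White Paper", "Recycled Paper"],  # one-to-many: ambiguous
    "Pens": ["Pens"],                            # one-to-one: safe
}


def map_category(source_category):
    """Return the destination category, or None when the mapping is ambiguous."""
    targets = MAPPING.get(source_category, [])
    if len(targets) == 1:
        return targets[0]
    return None  # zero or several targets: more information is needed


print(map_category("Pens"))
print(map_category("Paper"))
```
      </preformat>
      <p>A plain lookup succeeds only for one-to-one entries. For "Paper" it cannot decide between the finer destination categories, which is exactly where additional information about the product is required.</p>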
      <p>
        When looking at related problems, we can identify two related research areas:
(1) model transformation approaches, used to transform different models (cf. [
        <xref ref-type="bibr" rid="ref10">11</xref>
        ]), and
(2) typical classification approaches such as Bayes or vector-based classification
[
        <xref ref-type="bibr" rid="ref11">12</xref>
        ].
      </p>
      <p>
        Applying model transformation approaches to the reclassification is in most cases not
sufficient because these approaches consider only the model itself. In many cases,
the models of different classification systems are almost identical or at least very similar,
but their contents, e.g. the names of the categories, differ completely.
      </p>
      <p>
        Many typical (“traditional”) classification approaches fail in the area of (re-)classifying
product data since there are many different and similar classes (eCl@ss has over 24,000
different classes). Existing solutions try to analyze product descriptions to extract
keywords, which are used to assign a product to a class. An example is given in [
        <xref ref-type="bibr" rid="ref11">12</xref>
        ],
where Ding et al. report a precision of 78% using a Naïve Bayes
classifier to classify 40% of the products, with the other 60% used as training
data for the algorithm. Existing solutions are designed for the classification of product
data only. The authors argue that the reclassification of product data can take
additional information into account, which can significantly improve existing
approaches.
      </p>
      <p>For example, a product might be stored in an electronic catalog within a group “office
material” with the name and description “Writingstar 4000+, 80g”. Without looking at
existing classification information, it is hard to classify this product into e.g. the
eCl@ss system. This task is much easier if existing classification information is
interpreted. In the given example, the product might carry the UNSPSC code 14111511, which
stands for ‘writing paper’.</p>
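      <p>The code 14111511 also hints at why existing classification information is useful: UNSPSC codes are hierarchical. As a minimal sketch, assuming the usual UNSPSC layout of four two-digit levels (segment, family, class, commodity), the function below splits a code into its ancestors, each of which can serve as a progressively broader hint. The function name is illustrative only.</p>
      <preformat>
```python
def unspsc_levels(code: str) -> dict:
    """Split an 8-digit UNSPSC code into its four hierarchy levels."""
    if len(code) != 8 or not code.isdigit():
        raise ValueError("expected an 8-digit UNSPSC code")
    return {
        "segment": code[:2],    # broadest level
        "family": code[:4],
        "class": code[:6],
        "commodity": code,      # most specific level
    }


levels = unspsc_levels("14111511")
print(levels)
```
      </preformat>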
    </sec>
    <sec id="sec-3">
      <title>Conceptualization</title>
      <p>The authors propose a mediator-like system that re-classifies product data with respect to
existing classification information. This system is supposed to modify a product catalog
and re-classify all product data before the catalog is forwarded to the destination
enterprise system (e.g. to an e-procurement system).</p>
      <p>The reclassification process is performed in two steps.</p>
      <p>The first step analyzes the existing classification information and
builds a set of classes that could be chosen for the product in the new
classification system. Since most classification systems are hierarchically
ordered, it is in most cases easy to find such a set of matching classes.
For example, if the existing class is called “writing paper”
and has the parent class “paper”, then the new classification system is
searched for classes that contain “writing paper”. Those classes are added to
a set of possible results. If no class is found, then all classes containing
“paper” are added.</p>
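      <p>The first step can be sketched as follows. This is only an illustration under stated assumptions: the target taxonomy, its class ids and the function name are invented, matching is done by simple case-insensitive substring search, and the final fallback to all classes when no hint matches is an assumption that the text does not specify.</p>
      <preformat>
```python
def candidate_classes(target_classes, source_path):
    """Return target classes matching the most specific source class name.

    target_classes: dict mapping a class id to its name in the target system.
    source_path: source class names ordered from most specific to most
                 general, e.g. ["writing paper", "paper"].
    """
    for name in source_path:
        needle = name.lower()
        matches = {cid for cid, label in target_classes.items()
                   if needle in label.lower()}
        if matches:
            return matches  # stop at the most specific level that matches
    return set(target_classes)  # assumption: no hint matched, keep every class


# Toy target taxonomy; ids and labels are invented for illustration.
TARGET = {
    "T1": "Copy paper",
    "T2": "Recycled paper",
    "T3": "Ballpoint pens",
    "T4": "Paper clips",
}

print(sorted(candidate_classes(TARGET, ["writing paper", "paper"])))
```
      </preformat>
      <p>In this toy example no target class contains "writing paper", so the search falls back to the parent class "paper". Note that the naive substring match also admits "Paper clips", which is why a second, description-based step is still needed.</p>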
      <p>
        The second step narrows down the set of possible classes by analyzing the
product’s description. This is done in a way similar to existing
classification solutions, which means that the product description is analyzed
and keywords are extracted. Many approaches use machine learning
to enrich their data with new keywords once a product has been classified
correctly. A detailed description of such a classification is given in [13] or [
        <xref ref-type="bibr" rid="ref11">12</xref>
        ].
      </p>
      <p>The suggested approach dramatically cuts down the number of categories that have to be
examined in the analysis process. The figure below shows the suggested procedure
graphically.</p>
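      <p>The second step might look roughly like the sketch below. It is not the classifier used in the cited work: a plain keyword-overlap score stands in for a trained model, stemming is omitted, and the stopword list, class ids, keyword sets and function names are all invented for illustration.</p>
      <preformat>
```python
import re

STOPWORDS = {"a", "an", "and", "for", "of", "the", "with"}  # tiny illustrative list


def keywords(text):
    """Lowercase the text, split on non-alphanumerics, and drop stopwords."""
    tokens = re.findall(r"[a-z0-9]+", text.lower())
    return {t for t in tokens if t not in STOPWORDS}


def most_probable_class(description, candidates, class_keywords):
    """Pick the candidate class whose keyword set overlaps the description most."""
    words = keywords(description)
    # Candidates are sorted so that ties are resolved deterministically.
    return max(sorted(candidates),
               key=lambda cid: len(words.intersection(class_keywords[cid])))


def enrich(class_keywords, cid, description):
    """After a confirmed classification, add the description's keywords to the class."""
    class_keywords[cid].update(keywords(description))


# Hypothetical keyword sets for the candidate classes left after the first step.
CLASS_KEYWORDS = {
    "T1": {"copy", "paper", "80g", "a4"},
    "T2": {"recycled", "paper"},
    "T4": {"clips", "metal"},
}

best = most_probable_class("Writingstar 4000+, 80g", {"T1", "T2", "T4"}, CLASS_KEYWORDS)
print(best)
```
      </preformat>
      <p>Because only the candidates from the first step are scored, the description has to discriminate between a handful of classes rather than thousands. Calling <monospace>enrich</monospace> after a confirmed result mimics the keyword enrichment mentioned above.</p>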
      <p>Figure: Considering existing classification information. The steps shown are: import of product data
and classification specifications; creation of a set of possible categories by
analyzing existing classification information; cleaning of data (removal
of stopwords, stemming of words); analysis of the product
description to find the most probable class; and enrichment of the ontology used for the
analysis in the last step.</p>
      <p>The white ellipses show the steps necessary for classifying products, while the hatched ellipse
shows the reclassification step, which filters the set of classes that can be chosen to
classify the product. Without this step, the analysis process would have to select a class
from all possible classes, which is usually a large number of categories; eCl@ss, for example,
contains over 24,000. First tests have shown that the use of existing classification
information can filter out between 95% and 98% of this list in good scenarios.</p>
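      <p>To give a feel for these percentages, the short sketch below computes how many candidate classes would remain. It assumes a round total of 24,000 classes (the text only says "over 24,000"), and the function name is illustrative.</p>
      <preformat>
```python
def remaining_classes(total: int, filtered_percent: int) -> int:
    """Number of classes left after filtering out the given percentage."""
    return total * (100 - filtered_percent) // 100


TOTAL = 24000  # assumption: rounded from "over 24,000" eCl@ss classes

for pct in (95, 98):
    print(pct, remaining_classes(TOTAL, pct))
```
      </preformat>
      <p>Under this assumption, the description-based analysis would have to choose among roughly 480 to 1,200 classes instead of the full set.</p>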
    </sec>
  </body>
  <back>
    <ref-list>
      <ref id="ref1">
        <mixed-citation>
          1.
          <string-name>
            <given-names>A.</given-names>
            <surname>Patankar</surname>
          </string-name>
          ,
          <string-name>
            <given-names>A.</given-names>
            <surname>Segev</surname>
          </string-name>
          :
          <article-title>An Extensible Catalog Management Framework</article-title>
          . Fisher Center for Information Technology and Marketplace Transformation, Working Paper,
          <year>2002</year>
          .
        </mixed-citation>
      </ref>
      <ref id="ref2">
        <mixed-citation>
          3.
          <string-name>
            <given-names>T.</given-names>
            <surname>Renner</surname>
          </string-name>
          ,
          <string-name>
            <given-names>V.</given-names>
            <surname>Schmitz</surname>
          </string-name>
          :
          <article-title>Specification BMEcat, V 1.2</article-title>
          , www.bmecat.org, last access: 01/05,
          <year>2001</year>
          .
        </mixed-citation>
      </ref>
      <ref id="ref3">
        <mixed-citation>
          4.
          <string-name>
            <given-names>D.</given-names>
            <surname>Neef</surname>
          </string-name>
          :
          <source>e-Procurement: From Strategy to Implementation</source>
          . Financial Times Prentice Hall,
          <year>2001</year>
          .
        </mixed-citation>
      </ref>
      <ref id="ref4">
        <mixed-citation>
          5.
          <string-name>
            <given-names>J.</given-names>
            <surname>Hentrich</surname>
          </string-name>
          :
          <source>B2B-Katalog-Management (E-Procurement &amp; Sales im Collaborative Business)</source>
          . Galileo Business,
          <year>2001</year>
          .
        </mixed-citation>
      </ref>
      <ref id="ref5">
        <mixed-citation>
          6. xCBL:
          <article-title>Structure reference, Version 4.0</article-title>
          , http://www.xcbl.org/xcbl40, last access: 12/04,
          <year>2003</year>
          .
        </mixed-citation>
      </ref>
      <ref id="ref6">
        <mixed-citation>
          7.
          <string-name>
            <given-names>J.</given-names>
            <surname>Leukel</surname>
          </string-name>
          ,
          <string-name>
            <given-names>V.</given-names>
            <surname>Schmitz</surname>
          </string-name>
          ,
          <string-name>
            <given-names>F.-D.</given-names>
            <surname>Dorloff</surname>
          </string-name>
          :
          <article-title>Modeling and Exchange of Product Classification Systems using XML</article-title>
          . In:
          <source>Proceedings of the 4th IEEE International Workshop on Advanced Issues of E-Commerce and Web-based Information Systems</source>
          ,
          <year>2002</year>
          .
        </mixed-citation>
      </ref>
      <ref id="ref7">
        <mixed-citation>
          8. eCl@ss:
          <article-title>eCl@ss White Paper, V0.6</article-title>
          ,
          <year>2001</year>
          , http://www.eclass.de, last access: 17th Sept. 2004.
        </mixed-citation>
      </ref>
      <ref id="ref8">
        <mixed-citation>
          9. UNSPSC:
          <article-title>Why Coding and Classifying Products is Critical to Success in Electronic Commerce, Using the UNSPSC</article-title>
          . White Paper, Granada Research,
          <year>2001</year>
          .
        </mixed-citation>
      </ref>
      <ref id="ref9">
        <mixed-citation>
          10.
          <string-name>
            <given-names>E.</given-names>
            <surname>Schulten</surname>
          </string-name>
          et al.:
          <article-title>The E-Commerce Product Classification Challenge</article-title>
          .
          <source>IEEE Intelligent Systems Magazine, special issue on Intelligent E-business</source>
          , July/August
          <year>2001</year>
          .
        </mixed-citation>
      </ref>
      <ref id="ref10">
        <mixed-citation>
          11.
          <string-name>
            <given-names>Y.</given-names>
            <surname>Ding</surname>
          </string-name>
          et al.:
          <article-title>The role of ontologies in e-Commerce</article-title>
          . In: S. Staab &amp; R. Studer (eds.):
          <source>Handbook on Ontologies</source>
          , Springer,
          <year>2003</year>
          .
        </mixed-citation>
      </ref>
      <ref id="ref11">
        <mixed-citation>
          12.
          <string-name>
            <given-names>N.</given-names>
            <surname>Silva</surname>
          </string-name>
          ,
          <string-name>
            <given-names>J.</given-names>
            <surname>Rocha</surname>
          </string-name>
          :
          <article-title>E-Business Interoperability through ontology semantic Mapping</article-title>
          .
          <source>E-business and Virtual Enterprises</source>
          ,
          <year>2003</year>
          .
        </mixed-citation>
      </ref>
      <ref id="ref12">
        <mixed-citation>
          14.
          <string-name>
            <given-names>Y.</given-names>
            <surname>Ding</surname>
          </string-name>
          et al.:
          <article-title>GoldenBullet: Automated Classification of Product Data in E-commerce</article-title>
          . In:
          <source>Proceedings of Business Information Systems 2002 (BIS2002)</source>
          , Poznan, Poland,
          <year>2002</year>
          .
        </mixed-citation>
      </ref>
      <ref id="ref13">
        <mixed-citation>
          15.
          <string-name>
            <given-names>S.</given-names>
            <surname>Abels</surname>
          </string-name>
          ,
          <string-name>
            <given-names>A.</given-names>
            <surname>Hahn</surname>
          </string-name>
          :
          <article-title>Conception of a framework for the combination of heterogeneous methods for re-classifying product data in the e-business domain</article-title>
          . In:
          <source>Proceedings of the 26th McMaster World Congress</source>
          , Hamilton, Canada,
          <year>2005</year>
          .
        </mixed-citation>
      </ref>
    </ref-list>
  </back>
</article>