<!DOCTYPE article PUBLIC "-//NLM//DTD JATS (Z39.96) Journal Archiving and Interchange DTD v1.0 20120330//EN" "JATS-archivearticle1.dtd">
<article xmlns:xlink="http://www.w3.org/1999/xlink">
  <front>
    <journal-meta />
    <article-meta>
      <title-group>
        <article-title>ARC: appmosphere RDF Classes for PHP Developers</article-title>
      </title-group>
      <contrib-group>
        <aff id="aff0">
          <label>0</label>
          <institution>appmosphere web applications</institution>
          ,
          <addr-line>Kruppstr. 100, 45145 Essen</addr-line>
          ,
          <country country="DE">Germany</country>
        </aff>
      </contrib-group>
      <abstract>
        <p>ARC is an open source collection of lightweight PHP scripts optimized for RDF development on hosted Web servers. It currently consists of a non-validating RDF/XML parser, an N-Triples Serializer, and a "Simple Model" class providing common methods for working with resource descriptions. The three main classes are stand-alone, single-file scripts, thus facilitating the bundling with existing PHP-based applications. By partly using arrays instead of objects, ARC offers speed improvements compared to toolkits that follow approaches completely based on PHP objects.</p>
      </abstract>
    </article-meta>
  </front>
  <body>
    <sec id="sec-1">
      <title>Benjamin Nowack</title>
      <sec id="sec-1-1">
        <title>1 Motivation</title>
        <sec id="sec-1-1-1">
          <title>2.1 ARC RDF/XML Parser</title>
          <p>The RDF/XML Parser class can be used to parse a passed string, to load and parse
data from the local machine, or to load and parse data from the Web. The built-in
Web reader does not follow HTTP redirects yet, but it accepts several parameters, for
example to work with proxies, send custom HTTP headers, or limit the number of lines to
parse. Additionally, the class can be configured to retrieve content-negotiated
RDF/XML (via an HTTP Accept header), or to keep the raw RDF/XML data and/or
response headers of the remote server in memory. The code snippet below shows a
usage example.</p>
          <p><sup>1</sup> ARC stands for appmosphere RDF classes. The classes and documentation are available
online at http://www.appmosphere.com/en-arc</p>
          <p>/* class inclusion */
include_once("ARC_rdfxml_parser.php");
/* class configuration (optional) */
$args=array(
  "encoding"=&gt;"auto", /* auto-detect UTF-8 etc. */
  "proxy_host"=&gt;"192.168.27.1",
  "proxy_port"=&gt;8080,
  "user_agent"=&gt;"myParser v1.0"
);
/* instantiation */
$parser=new ARC_rdfxml_parser($args);
/* parsing a file from the Web */
$url="http://www.example.com/data.rdf";
$result=$parser-&gt;parse_web_file($url);
if(is_array($result)){
  echo count($result)." triples found";
}
else{
  echo "couldn't parse ".$url.": ".$result;
}</p>
          <p>The parser does not check the parsed code for complete RDF validity; only XML
errors and some very basic RDF/XML syntax [2] errors are detected. The result of the
parsing process is either an error message or an array of triples.</p>
        </sec>
        <sec id="sec-1-1-2">
          <title>2.2 ARC N-Triples Serializer</title>
          <p>The serializer class generates an N-Triples [3] string from an array of triples. The
structure of the passed triple array has to conform to the following ARC triple array
structure:</p>
          <p>Each entry is an associative array with keys s, p, and o:
- s (a subject node array)
- p (a predicate URI string)
- o (an object node array)</p>
        </sec>
      </sec>
    </sec>
    <sec id="sec-2">
      <p>A subject node array may have the following keys:
- type ("uri" or "bnode")
- uri (if type is "uri")
- bnode_id (if type is "bnode")</p>
      <p>An object node array may contain additional key-value pairs:
- type ("uri", "bnode", or "literal")
- uri (if type is "uri")
- bnode_id (if type is "bnode")
- val (if type is "literal")
- dt (datatype URI if available and type is "literal")
- lang (language code if available and type is "literal")</p>
      <p>The serializer provides only a small number of configuration options. It is possible to
set the linebreak and the space characters, as illustrated below.</p>
      <p>/* class inclusion */
include_once("ARC_ntriples_serializer.php");
/* class configuration (optional) */
$args=array("linebreak"=&gt;"\n", "spacer"=&gt;"\t");
/* instantiation */
$ser=new ARC_ntriples_serializer($args);
/* displaying the N-Triples code */
echo $ser-&gt;get_ntriples($triples);</p>
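      <p>To make the triple array structure described above concrete, the following minimal sketch builds two triples by hand and turns them into N-Triples lines without using ARC. The function names sketch_term and sketch_ntriples and the sample data are hypothetical and serve only to illustrate the array layout; ARC's own serializer may escape and format its output differently. The sketch assumes a current PHP version and does not escape non-ASCII characters.</p>
      <preformat><![CDATA[
```php
<?php
/* Hand-written sample data following the triple array structure described above. */
$triples = array(
  array(
    "s" => array("type" => "uri", "uri" => "http://www.example.com/#me"),
    "p" => "http://xmlns.com/foaf/0.1/name",
    "o" => array("type" => "literal", "val" => "Jane Doe", "lang" => "en")
  ),
  array(
    "s" => array("type" => "bnode", "bnode_id" => "b1"),
    "p" => "http://xmlns.com/foaf/0.1/knows",
    "o" => array("type" => "uri", "uri" => "http://www.example.com/#me")
  )
);

/* Hypothetical helper: convert one subject or object node array to an N-Triples term. */
function sketch_term($node) {
  if ($node["type"] === "uri") {
    return "<" . $node["uri"] . ">";
  }
  if ($node["type"] === "bnode") {
    return "_:" . $node["bnode_id"];
  }
  /* literal: escape backslash, quote, and common control characters (simplified) */
  $escapes = array("\\" => "\\\\", "\"" => "\\\"", "\n" => "\\n", "\r" => "\\r", "\t" => "\\t");
  $term = "\"" . strtr($node["val"], $escapes) . "\"";
  if (!empty($node["lang"])) {
    return $term . "@" . $node["lang"];
  }
  if (!empty($node["dt"])) {
    return $term . "^^<" . $node["dt"] . ">";
  }
  return $term;
}

/* Hypothetical helper: one statement per triple, joined with the given spacer and linebreak. */
function sketch_ntriples($triples, $spacer = " ", $linebreak = "\n") {
  $lines = array();
  foreach ($triples as $t) {
    $lines[] = sketch_term($t["s"]) . $spacer
      . "<" . $t["p"] . ">" . $spacer
      . sketch_term($t["o"]) . $spacer . ".";
  }
  return implode($linebreak, $lines) . $linebreak;
}

echo sketch_ntriples($triples);
```
]]></preformat>
      <p>The two optional parameters of the sketch play the same role as the "spacer" and "linebreak" options of the serializer: passing a tab character as the second argument separates the terms of each statement with tabs instead of spaces.</p>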
      <sec id="sec-2-1">
        <title>2.3 ARC Simple Model</title>
        <p>ARC's Simple Model class indexes a triple array and offers methods that, for example,
return a list of resources, find resources based on a provided identifier, parse RDF lists, or
extract property values. A detailed usage example is given in section 3.</p>
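        <p>The idea of indexing a triple array can be illustrated with a short sketch in plain PHP. It groups triples by subject and predicate so that property values can be looked up directly instead of scanning the whole array. The function names sketch_node_key, sketch_index, and sketch_prop_val, the index layout, and the sample data are assumptions made for illustration; they are not ARC's internal data structure or API.</p>
        <preformat><![CDATA[
```php
<?php
/* Hypothetical helper: a string key identifying a URI or blank node. */
function sketch_node_key($node) {
  return ($node["type"] === "bnode") ? "_:" . $node["bnode_id"] : $node["uri"];
}

/* Build an index of the form $index[subject][predicate] = list of object node arrays. */
function sketch_index($triples) {
  $index = array();
  foreach ($triples as $t) {
    $s = sketch_node_key($t["s"]);
    $p = $t["p"];
    $index[$s][$p][] = $t["o"];
  }
  return $index;
}

/* Return the first value of a property, or false if the resource has none. */
function sketch_prop_val($index, $subject, $predicate) {
  if (!isset($index[$subject][$predicate])) {
    return false;
  }
  $o = $index[$subject][$predicate][0];
  return ($o["type"] === "literal") ? $o["val"] : sketch_node_key($o);
}

/* Hand-written sample data in the triple array structure used by the parser. */
$foaf = "http://xmlns.com/foaf/0.1/";
$me = "http://www.example.com/#me";
$subject = array("type" => "uri", "uri" => $me);
$triples = array(
  array("s" => $subject, "p" => $foaf . "name",
        "o" => array("type" => "literal", "val" => "Jane Doe")),
  array("s" => $subject, "p" => $foaf . "mbox",
        "o" => array("type" => "uri", "uri" => "mailto:jane@example.com")),
  array("s" => $subject, "p" => $foaf . "mbox",
        "o" => array("type" => "uri", "uri" => "mailto:jd@example.org"))
);

$index = sketch_index($triples);
echo sketch_prop_val($index, $me, $foaf . "name"), "\n";
echo count($index[$me][$foaf . "mbox"]), " mbox values\n";
```
]]></preformat>
        <p>Building the index costs one pass over the triples; afterwards, grouped property values such as all mailboxes of a person are available with a single array lookup.</p>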
        <sec id="sec-2-1-1">
          <title>3 Use cases</title>
          <p>By combining the RDF/XML Parser (or an adjusted triple array queried from a basic
RDF triple store) with the Simple Model class, it is straightforward to generate HTML
views from RDF. Figure 1 shows a resource description summary which could be
created by passing two Simple Model instances (one with vocabulary information, and
one with data about individuals) to an HTML rendering script. The latter is not
included in ARC, but the PHP code snippets below Fig. 1 exemplify how data can be
pre-processed by ARC in order to retrieve grouped property values and label
information.</p>
          <p>/* abbreviations */
$rdfs="http://www.w3.org/2000/01/rdf-schema#";
$foaf="http://xmlns.com/foaf/0.1/";
/* vocabulary model */
$v_triples=$parser-&gt;parse_web_file($foaf);
$model_args=array(
  "triples"=&gt;$v_triples,
  "ns_abbrs"=&gt;array($foaf=&gt;"foaf", $rdfs=&gt;"rdfs")
);
$v_model=new ARC_simple_model($model_args);
/* data model */
$d_url="http://www.example.com/my_foaf.rdf";
$d_triples=$parser-&gt;parse_web_file($d_url);
$model_args["triples"]=$d_triples;
$d_model=new ARC_simple_model($model_args);</p>
          <p>PHP code to find a foaf:Person resource in a foaf:PersonalProfileDocument:</p>
          <p>/* find PPD in data model */
$ppd_qname="foaf:PersonalProfileDocument";
if(!$ppds=$d_model-&gt;get_resources($ppd_qname)){
  return false;
}
$ppd=$ppds[0];
/* get person identifier via primaryTopic (see footnote 2 for rpv) */
if(!$id=$d_model-&gt;rpv($ppd, "foaf:primaryTopic")){
  return false;
}
$person=$d_model-&gt;get_resource($id);</p>
          <p>PHP code to display labeled property values:</p>
          <p>/* name value */
$name_val=$d_model-&gt;rpv($person, "foaf:name");
if(!$name_val){
  /* try alternative naming properties */
}
/* mbox values */
$cur_vals=array();
if($cur_props=$person["props"]["foaf:mbox"]){
  foreach($cur_props as $cur_prop){
    $cur_vals[]=$cur_prop["val"];
  }
}
/* mbox label (use vocabulary model) */
$cur_label="";
if($terms=$v_model-&gt;get_resources("foaf:mbox")){
  $term=$terms[0];/* an rdf:Property instance */
  $cur_label=$v_model-&gt;rpv($term, "rdfs:label");
}
/* display */
echo '&lt;h2&gt;'.$name_val.'&lt;/h2&gt;';
echo ($cur_label) ? $cur_label : "mbox";
echo "- ".join("&lt;br /&gt;- ", $cur_vals);</p>
          <p><sup>2</sup> "rpv" is an alias for the method "get_resource_prop_val" which retrieves a single property
value for a given resource.</p>
          <p>Further use cases include reading, importing, and displaying RSS 1.0 feeds or any
other RDF/XML-compatible data. At the moment, ARC is best suited for rapid RDF
development or as a small addition to existing codebases, but the classes have also
been tested successfully in larger projects (e.g. the new SemanticWeb.org portal).</p>
        </sec>
        <sec id="sec-2-1-2">
          <title>4 Performance</title>
          <p>RAP is currently the only other toolkit written completely in PHP (which is a
prerequisite for deployment on inexpensive hosted Web server environments) that
conforms to the revised RDF specifications [4]. The performance tests done for this
paper are limited to comparing the RDF/XML parsers. A third option (with respect to
parsing RDF) is Morten Frederiksen's SimpleRdfParser [5], a wrapper class for RAP
that uses arrays instead of RAP's internal objects; it was tested as well. The Redland
Application Framework [6] also provides bindings for PHP [7]. However, the PHP
bindings can neither be installed on average hosted Web servers nor on the Windows
machine that was used for the performance tests<sup>3</sup>. Redland was therefore not included
in the benchmarks, although it would have been an interesting reference.</p>
          <p>The tests were run on a desktop PC (Pentium IV 1.8 GHz, Windows 2000), using
PHP 4.3.6 and the xdebug extension [8] for profiling. The long execution times in the
benchmark results are caused by having to disable the Zend Optimizer while using the
profiling extension. Switching back from xdebug to the Zend Optimizer accelerates
the parsers by a factor of 4.</p>
        </sec>
      </sec>
      <sec id="sec-2-2">
        <title>Parsing RDF/XML Files</title>
        <p>ARC (v0.2.0), SimpleRdfParser, and RAP (v0.9.1) were tested with three different files:
1. An RSS 1.0 news feed (338 triples)
2. The W3C RDF test cases manifest file<sup>4</sup> (2160 triples)
3. A large FOAF file (8601 triples)</p>
        <p>RDF files beyond this size are usually stored in a database; these tests cover common
in-memory parsing tasks only. As the SimpleRdfParser can't load files directly, each
test passes the RDF/XML code as a single string variable to the parser.</p>
        <p><sup>3</sup> Installing the bindings was not possible with the existing PHP engine. It would have been
necessary to re-compile the PHP sources.</p>
        <p><sup>4</sup> http://www.w3.org/2000/10/rdf-tests/rdfcore/Manifest.rdf</p>
        <p>The benchmark results<sup>6</sup> show that using the SimpleRdfParser is about two to four
times faster than building a complete in-memory model via RAP. ARC is about twice
as fast as the SimpleRdfParser. The difference between using PHP objects and arrays
becomes more obvious with a growing number of triples in a model. While the
SimpleRdfParser is 1.5 times faster than RAP when parsing the smallest RDF file, it
is 3.7 times faster at parsing the 2160 triples, and about 5 times faster when
processing the large file.</p>
        <p>These tests don't consider that RAP is not only parsing but also building indexes
and validating the parsed code. Nevertheless, it could still make sense to examine the xdebug
results in more detail to see whether parts of RAP can be made faster. ARC's speed
could, for example, be improved by working around PHP's automatic type conversion
mechanism (e.g. by using "===" instead of "==") and by applying simple string
functions such as strpos instead of regular expressions.</p>
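        <p>The two optimizations mentioned above can be illustrated with a small sketch. It is not taken from ARC's source; the function names has_colon_regex and has_colon_strpos, the sample strings, and the loop count are hypothetical. The sketch assumes a current PHP version, and the measured times will vary with the PHP version and the machine.</p>
        <preformat><![CDATA[
```php
<?php
/* Regular-expression check: does the string contain a colon? */
function has_colon_regex($name) {
  return preg_match('/:/', $name) === 1;
}

/* Same check with a plain string function. The strict comparison matters here:
   strpos returns 0 (not false) when the colon is the first character. */
function has_colon_strpos($name) {
  return strpos($name, ":") !== false;
}

/* Loose comparison converts types before comparing, strict comparison does not. */
var_dump(0 == false);   /* bool(true) */
var_dump(0 === false);  /* bool(false) */

/* Both functions agree on the sample strings. */
foreach (array("foaf:name", "name", ":local") as $name) {
  echo $name, ": ", var_export(has_colon_strpos($name), true), "\n";
}

/* Rough timing of both variants; absolute numbers depend on the environment. */
$runs = 100000;
$start = microtime(true);
for ($i = 0; $i < $runs; $i++) {
  has_colon_regex("foaf:name");
}
$t_regex = microtime(true) - $start;

$start = microtime(true);
for ($i = 0; $i < $runs; $i++) {
  has_colon_strpos("foaf:name");
}
$t_strpos = microtime(true) - $start;

printf("regex: %.4fs, strpos: %.4fs\n", $t_regex, $t_strpos);
```
]]></preformat>
        <p>The ":local" case shows why the strict operator is needed for correctness as well as speed: a loose comparison of the strpos result with false would treat a match at position 0 as no match.</p>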
        <sec id="sec-2-2-1">
          <title>5 Conclusion and Next Steps</title>
          <p>ARC is a lightweight alternative (or addition) to RAP's feature-rich API when only
limited functionality such as parsing or displaying is required. Being open source and
written entirely in PHP, the RDF classes can be bundled with PHP frameworks and
used in shared Web hosting environments. The advantage of using PHP arrays instead
of object structures increases with the size of parsed RDF/XML documents.</p>
          <p>The main idea of ARC is to provide lightweight RDF classes. However, several
options have already been added to ARC which might not be needed for certain use
cases. It is planned to build a simple parser generation service which will allow
developers to tailor the ARC scripts to their needs. It will be possible, for example, to exclude
the method for using an HTTP socket instead of PHP's one-line fopen function, or
to automatically remove comments. Together with other configuration options, the
parser's size could be reduced to 20KB or less.</p>
          <p>Other considerations include extending the parser with more validation features,
making the Web reader follow HTTP redirects, or adding an RDF store interface to
the ARC collection. A basic SPARQL [9] implementation is already available [10].</p>
          <p><sup>5</sup> xdebug had to be disabled for this test as it needed too much main memory to profile RAP.</p>
          <p><sup>6</sup> The complete benchmark details can be downloaded from
http://www.appmosphere.com/2005/php_parser_benchmarks_2005_04_15.pdf</p>
        </sec>
      </sec>
    </sec>
  </body>
  <back>
    <ref-list>
      <ref id="ref1">
        <label>1</label>
        <mixed-citation>
          <string-name><surname>Bizer</surname>, <given-names>C.</given-names></string-name>,
          <string-name><surname>Oldakowski</surname>, <given-names>R.</given-names></string-name>:
          <article-title>RAP: RDF API for PHP</article-title>. Berlin (<year>2004</year>).
          http://www.wiwiss.fu-berlin.de/suhl/radek/pub/RAP-oldakowski.pdf
        </mixed-citation>
      </ref>
      <ref id="ref2">
        <label>2</label>
        <mixed-citation>
          <string-name><surname>Beckett</surname>, <given-names>D.</given-names></string-name>:
          <article-title>RDF/XML Syntax Specification (Revised)</article-title>.
          <source>W3C</source> (<year>2004</year>).
          http://www.w3.org/TR/rdf-syntax-grammar/
        </mixed-citation>
      </ref>
      <ref id="ref3">
        <label>3</label>
        <mixed-citation>
          <string-name><surname>Grant</surname>, <given-names>J.</given-names></string-name>,
          <string-name><surname>Beckett</surname>, <given-names>D.</given-names></string-name>:
          <article-title>RDF Test Cases</article-title>.
          <source>W3C</source> (<year>2004</year>).
          http://www.w3.org/TR/rdf-testcases/
        </mixed-citation>
      </ref>
      <ref id="ref4">
        <label>4</label>
        <mixed-citation>
          <article-title>Resource Description Framework (RDF)</article-title>.
          <source>W3C</source> (<year>2004</year>).
          http://www.w3.org/RDF/
        </mixed-citation>
      </ref>
      <ref id="ref5">
        <label>5</label>
        <mixed-citation>
          <string-name><surname>Frederiksen</surname>, <given-names>M.</given-names></string-name>:
          <article-title>Easy RDF-parsing with PHP</article-title>.
          http://www.wasab.dk/morten/blog/archives/2004/05/31/easy-rdf-parsing-with-php
        </mixed-citation>
      </ref>
      <ref id="ref6">
        <label>6</label>
        <mixed-citation>
          <string-name><surname>Beckett</surname>, <given-names>D.</given-names></string-name>:
          <article-title>Redland RDF Application Framework</article-title>.
        </mixed-citation>
      </ref>
      <ref id="ref7">
        <label>7</label>
        <mixed-citation>
          http://librdf.org/docs/php.html
        </mixed-citation>
      </ref>
      <ref id="ref8">
        <label>8</label>
        <mixed-citation>
          xdebug. http://xdebug.org/
        </mixed-citation>
      </ref>
      <ref id="ref9">
        <label>9</label>
        <mixed-citation>
          <string-name><surname>Prud'hommeaux</surname>, <given-names>E.</given-names></string-name>,
          <string-name><surname>Seaborne</surname>, <given-names>A.</given-names></string-name>:
          <article-title>SPARQL Query Language for RDF</article-title>.
          <source>W3C</source> (<year>2005</year>).
          http://www.w3.org/TR/rdf-sparql-query/
        </mixed-citation>
      </ref>
      <ref id="ref10">
        <label>10</label>
        <mixed-citation>
          <string-name><surname>Nowack</surname>, <given-names>B.</given-names></string-name>:
          <article-title>ARC SPARQL Parser</article-title>.
        </mixed-citation>
      </ref>
    </ref-list>
  </back>
</article>