<!DOCTYPE article PUBLIC "-//NLM//DTD JATS (Z39.96) Journal Archiving and Interchange DTD v1.0 20120330//EN" "JATS-archivearticle1.dtd">
<article xmlns:xlink="http://www.w3.org/1999/xlink">
  <front>
    <journal-meta />
    <article-meta>
      <title-group>
        <article-title>Web Forms and XML Processing: Some Quality Factors of Process and Product. Mário Amado Alves, Faculdade de Ciências e Tecnologia da Universidade Nova de Lisboa, maa@di.fct.unl.pt</article-title>
      </title-group>
      <contrib-group>
        <aff id="aff0">
          <label>0</label>
          <institution>Centre for Artificial Intelligence, Universidade Nova de Lisboa</institution>
        </aff>
      </contrib-group>
      <pub-date>
        <year>1999</year>
      </pub-date>
      <volume>25</volume>
      <issue>3</issue>
      <fpage>11</fpage>
      <lpage>16</lpage>
      <abstract>
        <p>"The web is bad; really bad." observed Jakob Nielsen three years ago about The Web Usage Paradox [1]. The true paradox today is that science and technology institutions have bad sites! Programmes devoted to the information society itself have bad sites!! Hopefully this international conference will make a difference, at least show Nielsen and myself are not just "fools on the hill" and, hopefully, help management make wiser decisions regarding web development strategies and, correlatively, better technical staff selection. The current paper contributes to this goal by exposing a method for the development of complex web services which embodies a number of quality assurance items and has passed the test of real web service deployment, featuring user authentication, multiple forms, recorded data, and automated page creation. The method and the case are described with incursions into selected technical details. Quality factors are explicitly or implicitly associated with each described item.</p>
      </abstract>
    </article-meta>
  </front>
  <body>
    <sec id="sec-1">
      <title>1. Web Quality Manifesto</title>
      <p>The web is bad; really bad. [1]</p>
      <p>And it is getting worse. Already three years have passed since web usage specialist Jakob Nielsen's article [1] appeared, and still his observations are right on the mark: "90% of all commercial websites are overly difficult to use due to bloated page design that takes forever to download, internally focused design that hypes products without giving real info, obscure site structures, lack of navigation support, narrative writing style optimised for print, not for the way users read online, etc." ([1] abridged, original emphasis maintained).</p>
      <p>In the current paper I add a couple of items to this list.</p>
      <p>Why is it even worse, today? Well, for one, institutional sites are bad. In fact, the true paradox today is that science and technology institutions have bad sites! Research programmes devoted to the information society itself have bad sites!! The 2000 Olympics site was bad. Oracle's site is bad. I submit a Law of Inverse Quality: the greater the institution, the worse the site.</p>
      <p>I hope this international conference, in particular panel PN1.2 entitled Qualidade nos Sistemas de Informação da Administração Pública: o Início duma Cruzada (Quality in Public Administration Information Systems: the Start of a Quest), in regard to institutional sites, will make a difference. At least it will show Nielsen and myself are not simply "fools on the hill" anymore. And, hopefully, it will help educated management make wiser decisions regarding web development strategies and, correlatively, better technical staff selection.</p>
      <p>The current paper contributes to this honourable goal by exposing a method for the development of complex web services which embodies a number of quality assurance items and has passed the test of real service case deployment.</p>
      <p>Is the Web really worse today than three years ago? Yes, definitely. Here is a recent (2000) observation from the same author of [1]: "If you are going to go and buy something on a new website, you will fail. If you go to a new website, you will not be able to use it." (http://www.wired.com/news/business/0,1367,40155,00.html)</p>
      <p>Web quality is a twofold problem: technical and
social.</p>
      <p>Technical. A veritable plethora of techniques and methods exists today to develop web services. Judging from the results, most of them are bad. The current paper presents a method that emphasises some quality factors of process and product. These factors are explicitly or implicitly associated with each described technical item. Bottom line: it is a good method. It is proven. The rest of this paper will deal with the technical aspect only.</p>
      <p>Social. The social problem is to convince people to use good methods and techniques. To be quality-aware. People like Jakob Nielsen [1] have been trying to pass the message for some years now. The message is simple: User: demand web quality. Web service provider: provide quality (or else die). But seemingly the word is not getting through. Users are not demanding. Perhaps they simply do not know the Web could be much better. Perhaps they simply do not want to: one way for a site to be better is to be simpler; perhaps most users prefer complicated, slow sites. This social aspect is not addressed further in the current paper.</p>
      <p>2. Quality Factors Overview</p>
      <p>HTTP, CGI are the GOTOs of the 1990s.</p>
      <p>[2]</p>
      <p>We present a method for the development of complex web services. The method was tested with a service case featuring:
. user authentication
. multiple forms
. recorded data
. automated page creation</p>
      <p>The method emphasises quality at two stages: development and execution (meaning runtime execution of the service). It does this by scoring high on quality factors of process and product respectively; mostly of product, but high scores here are justified by process factors implicit in the method, as illustrated in Table 1.</p>
      <p>Table 1. Quality factors and their justification.
Factor | Justification
Correctness | High traceability: rich messages. Operationalised completeness checks.
Reliability | All errors handled. Standard technology. Simple page design.
Maintainability | Good choice of programming language (Ada). Separation of HTML code and service logic.</p>
      <p>The product also scores high on efficiency, usability, portability, and interoperability. It scores less on testability (test data must be prepared for each case, and it is not operationalised), and integrity (no access control tool). These scores and their justification are further supported by the items detailed in the rest of the paper.</p>
      <p>The method comprises selected and created "open source" software tools and components: package CGI by David Wheeler (modified version included in [3]), package XML_Parser by the author [3], and GNAT by GNU, NYU and ACT (vd. adapower.com).</p>
      <p>We use the word safety as a synonym of reliability, and we use the words method and safety in a wide sense, viz. with method ranging from architecture to coding, and safety including effectiveness and efficiency both in development (cost safety) and execution.</p>
      <p>In this paper the method is presented with examples from the real development case, and with incursions into the detail of selected aspects.</p>
      <p>The method is continually evolving, due to both external technological change and internal planned increments. Some of these planned increments are also exposed in this paper, as a means of obtaining feedback from the software engineering community. This trait in particular puts the method on the top level of the CMM (Capability Maturity Model, vd. http://www.sei.cmu.edu). Other well known software process references associated with the current method are vanilla frameworks, extreme programming, futurist programming (vd. Internet). The precise form of these associations is left implicit in the paper.</p>
      <p>3. The Case</p>
      <p>The most recent application of the method was in the implementation of an official inquiry to schools via Internet. This was in Portugal, in the year 2000. The purpose of the inquiry was to evaluate a recent reform in school administration. The inquirer, and my client, was CEESCOLA3, a state-funded research centre in education, henceforth simply the Centre.</p>
      <p>The inquirees were 350 secondary schools and
school groups randomly chosen out of a nation-wide
universe of 1472 such entities.</p>
      <p>The service was required to be accessible only by the selected schools, so these were previously given, via surface mail, private access elements (identifier and password). A time window for answering the inquiry was fixed, and the system was required to make the answers available to the Centre as soon as they were submitted.</p>
      <p>The inquiry itself took the form of a number of long and complex questionnaires: each questionnaire had hundreds of questions, and the answer to certain questions determines the existence of other questions or their domain of possible answers.</p>
      <p>Note that this case is very similar to electronic commerce services in complexity and safety issues.</p>
      <p>4. The Method</p>
      <p>The top-level features of the method are:
. HTML
. CGI
. separation of HTML code (documents) and service logic (program)
. HTML extended internally
. documents prepared through XML transformations
. both the service logic and the transformations written in Ada
. session state maintained in the served pages
. a single meta-HTML unit
. a single service procedure</p>
      <p>3 Centro de Estudos da Escola = Centre for School Studies, Faculty of Psychology and Education Sciences of the University of Lisbon.</p>
      <p>The separation of HTML code and service logic is a crucial design premise. Our rationale for this converges for the most part with that described in [2]. In order to attain separation, the stored pages are written in a slightly extended HTML, call it "meta-HTML", which is transformed by the service, upon each request, into the served pages in standard HTML.</p>
      <p>Also, minimalized HTML was preferred as a basis for meta-HTML, because minimalized HTML is more readable by humans than its non-minimalized counterpart or XHTML, and the ultimate reviewers of the meta-document are human.</p>
      <p>Now, HTML, minimalized HTML, XHTML, and the designed meta-HTML are all subsumed by a slightly relaxed XML, notably one not requiring pairing end tags. This may seem nonsensical to XML formalist eyes and sound of heresy to XML purist ears, but in practice such a "dirty" version of XML is very convenient. With a robust XML processor one can easily control that one dirty aspect of not requiring pairing end tags. Package XML_Parser has such a robustness feature. The gains include:
. a single processing component for all "dirty" XML instances (HTML, minimalized HTML, meta-HTML, XHTML)
. increased readability of the input units
. an easy path to proper XML representations (not taken, but the current trend from HTML towards XML in the Web was a concern)</p>
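      <p>For illustration only (the paper's tools are Ada packages; this sketch is in Python, and all names in it are invented), a "dirty"-XML scanner in the above sense can be as small as a tokenizer that reports start tags, end tags, and text without demanding that they pair up:</p>

```python
import re

# Minimal tokenizer for "dirty" XML: it never requires a start tag to have a
# matching end tag, so one component can read HTML, minimalized HTML,
# meta-HTML, and XHTML alike.
TAG = re.compile(r"<(/?)([A-Za-z_][\w:.-]*)((?:[^>'\"]|'[^']*'|\"[^\"]*\")*)>")

def tokenize(source):
    """Yield ('text', s) and ('start'|'end', name, attrs) tokens."""
    pos = 0
    for m in TAG.finditer(source):
        if m.start() > pos:
            yield ("text", source[pos:m.start()])
        kind = "end" if m.group(1) else "start"
        yield (kind, m.group(2).lower(), m.group(3).strip())
        pos = m.end()
    if pos < len(source):
        yield ("text", source[pos:])

# the <p> and <br> below are unclosed, as minimalized HTML allows
tokens = list(tokenize("<p>unclosed paragraph<br><b>bold</b>"))
```

      <p>A consumer that does care about pairing (e.g. for proper XML output) can enforce it on top of this token stream; the tokenizer itself stays permissive.</p>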
      <p>So, in this paper, we take the liberty of calling all of that simply XML, hence the pervasive use of the term and the inclusion of XML tools in the method.</p>
      <p>XML processing happens at two stages: data preparation and service execution.</p>
      <p>Data preparation. The questionnaires are created by client staff using WYSIWYG editors like Microsoft FrontPage and Word. Then these items are transformed into the final, static meta-HTML items. The major part of this transformation is automated, by means of Ada procedures utilising package XML_Parser. The transformation consists of:
. rectify the messy HTML emitted by Microsoft tools
. rectify and insert control elements and attributes
. structure the items into identified groups</p>
      <p>Because the necessary ad hoc transformation programs are small (c. 1k lines), and the compiler is fast and easily installable on any site, Ada can also be used here, instead of the usual unsafe scripting languages.</p>
      <p>Service execution. The pages are not served directly: they have a number of markers that must be replaced by the definitive values. This is done at runtime by the main service procedure, again utilising XML_Parser. The rest of this section focuses on this.</p>
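      <p>As a sketch of this replacement step (Python for illustration; the real procedure is Ada, and the %name% marker syntax here is an assumption, not the paper's actual notation):</p>

```python
import re

def instantiate(meta_page, values):
    """Replace every %name% marker with its definitive value; an unknown
    marker fails loudly rather than serving a half-finished page."""
    def definitive(m):
        name = m.group(1)
        if name not in values:
            raise KeyError("no value for marker: " + name)
        return values[name]
    return re.sub(r"%(\w+)%", definitive, meta_page)

page = instantiate("<p>School: %school%, session %sid%</p>",
                   {"school": "Escola X", "sid": "42"})
```

      <p>Failing loudly on an unknown marker is in the spirit of the method's traceability goal: a bad page is reported, never silently served.</p>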
      <p>Input values from one form are relayed onto the
next as hidden input elements. This provides for:
. data communication between session points, or
forms-this implements sessions
. general tests on input values to be run on any
session point-this increases safety</p>
      <p>All input values are relayed, so careful naming of
input elements is required (in order to avoid collision).
The localisation of all forms in a single meta-unit
promotes this.</p>
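      <p>A minimal sketch of the relay (Python for illustration; the field names are invented): every received value is re-emitted as a hidden input, so the session state travels inside the served page itself, with no server-side session store:</p>

```python
import html

def hidden_inputs(form_values):
    """Re-emit every input value as a hidden input for the next form."""
    return "\n".join(
        '<input type=hidden name="%s" value="%s">'
        % (html.escape(k, quote=True), html.escape(v, quote=True))
        for k, v in sorted(form_values.items()))

relayed = hidden_inputs({"school_id:I": "1047", "q17:a": "yes"})
```

      <p>Careful naming matters exactly because these relayed names share one flat namespace with the next form's own inputs.</p>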
      <p>The method evidently relies on the usual external pieces of web technology: an HTTP/CGI server and web browsers. The service was deployed with the Apache server running on a Linux system. Some problems were felt here, notably an access security hole: the service internal database files, in order to be accessible by the main procedure, had to be configured in such a way that they were also accessible by all local Linux system users! This problem is perhaps corrigible with the proper Apache settings; but this server's documentation is hardly comprehensible.</p>
      <p>4.1 The service procedure</p>
      <p>The service procedure is a non-reactive program, i.e. it terminates, as usual CGI procedures are. It is designed as the sequence of blocks sketched in Figure 1.</p>
      <p>The computation is data-driven by form input values, meta-HTML markers, and system database files (users, passwords, etc.). The form input values and the files are totally case-dependent, so we focus on the meta-HTML markers, and dedicate the next section to them.</p>
      <p>The exception handling is crucial. All errors are captured in a report page served to the user with a wealth of useful information, including instructions for error recovery, illustrated in Figure 2.4 This happens even during development and testing, facilitating these tasks greatly.</p>
      <p>4 The original data in Portuguese are shown in the figures because they have formal identifiers in Portuguese (sometimes in English, e.g. when they emanate from the compiler), and we wanted to ensure referential consistency between all data items shown in this paper and at its presentations.</p>
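      <p>The block structure can be pictured as follows (a Python stand-in for the Ada procedure; the authentication check and page text are invented stubs):</p>

```python
import io
import traceback

def serve(form, out):
    # stand-in for the real blocks: authenticate, validate, serve next page
    if form.get("_id") != "school42":
        raise ValueError("authentication failed for " + repr(form.get("_id")))
    out.write("Content-Type: text/html\n\n<p>ok</p>")

def main(form, out):
    """Catch-all wrapper: every error becomes a served report page."""
    try:
        serve(form, out)
    except Exception:
        out.write("Content-Type: text/html\n\n<h1>Error report</h1><pre>"
                  + traceback.format_exc() + "</pre>")

buf = io.StringIO()
main({"_id": "nobody"}, buf)   # wrong credentials: report page, no crash
report = buf.getvalue()
```

      <p>Because the same wrapper runs during development and testing, a failed test case documents itself in the served report.</p>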
      <p>5. Meta-HTML</p>
      <p>This section describes the meta-HTML used in the example case. Other HTML extensions are possible. In fact this possibility is a major plus of the method: it provides applicability to a wide range of possible web services, through case-by-case adaptation of the meta documentary language. It can even go beyond XML eventually, but that is another story.</p>
      <p>5.1 Input field types</p>
      <p>The names of the form/input fields are extended with a type suffix of the form :t, where t is a single letter as described in Table 2.</p>
      <p>Table 2. Input field types.
t | Description
i | integer
a | alphanumeric
e | subject to verification (special)</p>
      <p>The upper case versions of t (I, A, E) additionally require a non-null value. The set is easily extended with more basic types, e.g. float and date. Type e (from the Portuguese word especial) requires a case-by-case treatment in the main procedure. The relevant section in the procedure is structured as a case construct: any e type value falling back to the others case raises a System_Error (or something similar). This together with the proper test data set increases safety in the development stage.</p>
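      <p>A sketch of the suffix-driven checks (Python for illustration; the real checks are Ada code, and the exact per-letter rules shown here are assumptions):</p>

```python
import re

CHECKS = {
    "i": lambda v: v == "" or v.lstrip("-").isdigit(),  # integer
    "a": lambda v: True,                                # alphanumeric, free
    "e": lambda v: True,                                # special: case-by-case
}

def check_field(name, value):
    """Validate one form value against the type suffix in its name.
    Upper case letters (I, A, E) additionally require a non-null value."""
    m = re.search(r":([iaeIAE])$", name)
    if not m:
        raise ValueError("untyped field: " + name)
    t = m.group(1)
    if t.isupper() and value == "":
        return False
    return CHECKS[t.lower()](value)
```

      <p>Because the general test runs on every session point, a required value missed in one form is still caught when its relayed copy reaches the next.</p>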
      <p>5.2 Conditional inclusion</p>
      <p>Meta-element if provides conditional selection of parts of the meta-document to be included in the served page. The selected part is the enclosed content of this element. This is similar to the C preprocessor directive #if. The condition is expressed in the element attributes</p>
      <p>Name1="Value11|...|Value1m"
...
NameN="ValueN1|...|ValueNm"</p>
      <p>which contain references to form/input element names and values. The set of attributes is a conjunction of disjunctions. The (positive) Boolean value of the set determines inclusion of the element content. Figure 3 shows an excerpt of the example meta-document with heavy use of conditional inclusion, and Figure 4 shows the corresponding HTML result for a particular session.</p>
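      <p>The condition semantics can be stated in a few lines (illustrative Python; the attribute and form names are invented):</p>

```python
def include(attrs, form):
    """Conjunction of disjunctions: every attribute must match, and an
    attribute matches when the form value equals any one of its
    |-separated alternatives."""
    return all(form.get(name) in values.split("|")
               for name, values in attrs.items())

form = {"regime": "TEIP", "nivel": "secundario"}
```

      <p>An attribute whose name is absent from the form simply evaluates false, so the enclosed content is left out.</p>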
      <p>Note the pervasive use of E suffixes in the meta-text: this was very helpful in assuring completeness of treatment of all cases, and therefore the correctness of the service.</p>
      <p>5.3 Session control</p>
      <p>A special hidden input element named _Seguinte:e (Portuguese for next) specifies the next meta-HTML unit to be processed. This is nontrivial at the start of the session, when moving from an authentication form to the main set.</p>
      <p>Also, the absence of this element may be used to
signal to the main procedure that the session is in its
final step, usually submission of the combined data of
all forms.</p>
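      <p>The dispatch on this element is then essentially (Python sketch; the .mhtml unit-file extension is an invented detail):</p>

```python
def next_unit(form):
    """The hidden _Seguinte:e field names the next meta-HTML unit to
    process; its absence signals the final step, i.e. submission of the
    combined data of all forms."""
    name = form.get("_Seguinte:e")
    if name is None:
        return ("final", None)
    return ("serve", name + ".mhtml")
```
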
      <p>A small number (circa five) of other special elements were found necessary to control very specific aspects of the service. It was technically easy to implement them in the same vein, notably with the CGI and XML processing resources already available.</p>
      <p>6. The tools and components</p>
      <p>To see the next transactional "transfer" happen, ignore the XML (and SOAP) hype and watch for actual XML implementations. (Mike Radow, in [3])</p>
      <p>A modified version of package CGI by David Wheeler served well as the CGI component. The modifications, done by myself, included:
. elimination of auxiliary overloading which caused ambiguity problems to the GNAT compiler. I suspect GNAT's complaints were legitimate, language-wise; perhaps Wheeler used another, non-validated, compiler; or the problem was not detected until my use of the package.
. redesign of the output format of procedure Put_Variables.</p>
      <p>The modified version is now in [3]. Further modifications are planned and described there.</p>
      <p>Package XML_Parser by myself, also in [3], was used to transform the HTML emitted by the nontechnical staff into extended HTML and then into the served HTML pages. Although XML_Parser served well as the (extended) HTML component of the current project case, it has severe limitations with respect to XML proper, noticeable in its documentation; it has also some design drawbacks, viz. the finite state device is entangled with the rest of the code.</p>
      <p>To overcome these limitations, I have already developed a new XML processing package, XML_Automaton. This package properly encapsulates the finite state device. A new XML parser package, XML_Parser_2, will use XML_Automaton as its engine, in order to produce a more localised interpretation of the XML input. XML_Parser_2 is designed after XML_Parser with respect to the (internal) treatment of XML element containment, and I am trying to make the expression of this containment generic, probably with an array of packages drawing on XML_Parser_2, each dedicated to a certain expression: an Ada linked list, Prolog facts, a DOM structure (Document Object Model, vd. w3.org), etc.</p>
      <p>A rather specific but interesting point is the character-by-character vs. chunking way of processing XML input. XML elements may span over more than one text line. In chunk-based parsers, the chunk is normally the line. These parsers, especially if also based on character string pattern matching libraries, have a real problem here. XML_Automaton does not.</p>
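      <p>The point is easy to demonstrate (Python sketch of a two-state character automaton; this is not the actual XML_Automaton interface):</p>

```python
def tag_names(chars):
    """Collect tag names by reading one character at a time; a tag that
    spans several text lines is handled exactly like any other."""
    state, buf, names = "text", [], []
    for c in chars:
        if state == "text" and c == "<":
            state, buf = "tag", []
        elif state == "tag" and c == ">":
            names.append("".join(buf).split()[0])
            state = "text"
        elif state == "tag":
            buf.append(c)
    return names

# this input element spans two lines; a line-chunked pattern matcher would
# need special handling for it, the character automaton does not
doc = '<input name="q1:I"\n value="3">'
```

      <p>The newline inside the tag is just one more character to the automaton; to a line-based matcher it splits the tag in two.</p>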
      <p>XML_Parser_2's design includes an unbounded array of stacks. Currently I am choosing between two bases for the implementation of this structure: GNAT.Table or Unbounded_Array. I am inclined to the latter because it is compiler-independent.</p>
      <p>7. Evaluation and some remarks</p>
      <p>The software metrics available for the example case are…</p>
      <p>Note the cost. We are missing precise comparison data with other experiments, but our experience and intuition tells us that it is a very good number, given the degree of correctness attained in the final service; notably, no fatal defects were found. I have also worked recently with a team developing a service similar to the example in intrinsic complexity but with much less form data, implemented with inter-calling PERL (www.perl.com) scripts (essentially a Great Ball of Mud, vd. slashdot.org/articles/00/04/29/0926241.shtml); it required much more work and delivered much less correctness. The service is still plagued with detected bugs that no one rectifies anymore.</p>
      <p>Why not use PHP (www.php.net)? Our reasons include:
. our method offers more control over the design and processing of the meta-language
. PHP documentation is incomprehensible</p>
      <p>Why not use MawL [2]?
. it is not extensible
. it is not maintained
. it seems to be very hard to achieve a working installation</p>
      <p>I am particularly fond of the inevitable conclusion that Ada is a good choice for programming in the small. So, there is a real small software engineering after all, and it is not confined to the unadjusted Personal Software Process [4] we read about, but never practice.</p>
      <p>Acknowledgements</p>
      <p>I wish to thank my research advisor at CENTRIA5, Doctor Gabriel Pereira Lopes. His correct envisionment of research in informatics as a rich network of diversified competencies and interests has made possible the degree of reusability seen here, notably of the XML tools which were firstly developed for our research projects in information retrieval and natural language processing.6 I am also indebted to Professor João Barroso of CEESCOLA for providing such an interesting case of Internet usage as the one described here. Thanks to my colleagues Pablo Otero and Alexandre Agustini, and to the QUATIC'2001 reviewers, for their good comments. Thanks to my family, for letting our home be also a software house. And to Our Lord, for everything.</p>
      <p>References</p>
      <p>[1] The Web Usage Paradox [webpage]: Why Do People Use Something This Bad? / Jakob Nielsen. Alertbox for August 9, 1998. (http://www.useit.com/alertbox/980809.html)</p>
      <p>[3] AdaLIB: the software process and programming library [web site] / by Mário Amado Alves. (http://lexis.di.fct.unl.pt/ADaLIB)</p>
      <p>[4] Results of applying the personal software process / P. Ferguson; W. S. Humphrey; S. Khajenoori; S. Macke; A. Matvya. pp. 24-32. In: IEEE Computer, 30(5), 1997. (description apud [5])</p>
      <p>6 Projects Corpora de Português Medieval, PGR, }GM, and, in great part, my post-graduation scholarship PRAXIS XXI/BM/20800/99, granted by the Fundação para a Ciência e Tecnologia of Portugal.</p>
    </sec>
  </body>
  <back>
    <ref-list />
  </back>
</article>