=Paper= {{Paper |id=Vol-1284/paper2 |storemode=property |title=Web Forms and XML Processing: Some Quality Factors of Process and Product |pdfUrl=https://ceur-ws.org/Vol-1284/paper2.pdf |volume=Vol-1284 |dblpUrl=https://dblp.org/rec/conf/quatic/Alves01 }} ==Web Forms and XML Processing: Some Quality Factors of Process and Product== https://ceur-ws.org/Vol-1284/paper2.pdf
     Web Forms and XML Processing: Some Quality Factors of
                    Process and Product
                                           Mo Amado Alves²
                   Faculdade de Ciências e Tecnologia da Universidade Nova de Lisboa
                                            maa@di.fct.unl.pt




                         Abstract
    "The web is bad; really bad." So observed Jakob Nielsen three years ago about The Web Usage Paradox [1]. The true paradox today is that science and technology institutions have bad sites! Programmes devoted to the information society itself have bad sites!! Hopefully this international conference will make a difference, or at least show that Nielsen and I are not just "fools on the hill", and, hopefully, help management make wiser decisions regarding web development strategies and, correlatively, better technical staff selection. The current paper contributes to this goal by exposing a method for the development of complex web services which embodies a number of quality assurance items and has passed the test of real web service deployment, featuring user authentication, multiple forms, recorded data, and automated page creation. The method and the case are described with incursions into selected technical details. Quality factors are explicitly or implicitly associated with each described item.

1. Web Quality Manifesto
          The web is bad; really bad. [1]

    And it is getting worse. Three years have already passed since web usage specialist Jakob Nielsen's article [1] appeared, and still his observations are right on the mark: "90% of all commercial websites are overly difficult to use due to bloated page design that takes forever to download, internally focused design that hypes products without giving real info, obscure site structures, lack of navigation support, narrative writing style optimised for print, not for the way users read online, etc." ([1] abridged, original emphasis maintained).
    In the current paper I add a couple of items to this list.
    Why is it even worse, today? Well, for one, institutional sites are bad. In fact, the true paradox today is that science and technology institutions have bad sites! Research programmes devoted to the information society itself have bad sites!! The 2000 Olympics site was bad. Oracle's site is bad. I submit a Law of Inverse Quality: the greater the institution, the worse the site.
    I hope this international conference (in particular panel PNI.2, entitled Qualidade nos Sistemas de Informação da Administração Pública: o Início duma Cruzada (Quality in Public Administration Information Systems: the Start of a Quest), in regard to institutional sites) will make a difference. At least it will show that Nielsen and I are not simply "fools on the hill", not anymore. And, hopefully, it will help educated management make wiser decisions regarding web development strategies and, correlatively, better technical staff selection.
    The current paper contributes to this honourable goal by exposing a method for the development of complex web services which embodies a number of quality assurance items and has passed the test of real service case deployment.
    Is the Web really worse today than three years ago? Yes, definitely. Here is a recent (2000) observation from the author of [1]: "If you are going to go and buy something on a new website, you will fail. If you go to a new website, you will not be able to use it." (http://www.wired.com/news/business/0,1367,40155,00.html)
    Web quality is a twofold problem: technical and social.
    Technical. A veritable plethora of techniques and methods exists today to develop web services. Judging from the results, most of them are bad. The current paper presents a method that emphasises some quality factors of process and product. These factors are explicitly or implicitly associated with each described technical item. Bottom line: it is a good method. It is proven. The rest of this paper will deal with the technical aspect only.
    Social. The social problem is to convince people to use good methods and techniques, that is, to be quality-aware. People like Jakob Nielsen [1] have been trying to pass the message for some years now. The message is simple: User, demand web quality. Web service provider,

provide quality (or else die). But seemingly the word is not getting through. Users are not demanding. Perhaps they simply do not know the Web could be much better. Perhaps they simply do not want to: one way for a site to be better is to be simpler; perhaps most users prefer complicated, slow sites. This social aspect is not addressed further in the current paper.

   ² My research is supported by the Fundação para a Ciência e Tecnologia, vd. Acknowledgements.

2. Quality Factors Overview
              HTTP, CGI are the GOTOs of the 1990s. [2]

    We present a method for the development of complex web services. The method was tested with a service case featuring:
         •  user authentication
         •  multiple forms
         •  recorded data
         •  automated page creation
    The method emphasises quality at two stages: development and execution (meaning runtime execution of the service). It does this by scoring high on quality factors of process and product respectively; mostly of product, but high scores here are justified by process factors implicit in the method, as illustrated in Table 1.

    Table 1
      Factor            Justification
      Correctness       High traceability: rich messages. Operationalised completeness checks.
      Reliability       All errors handled. Standard technology. Simple page design.
      Maintainability   Good choice of programming language (Ada). Separation of HTML code and service logic.

    The product also scores high on efficiency, usability, portability, and interoperability. It scores lower on testability (test data must be prepared for each case, and the tests are not operationalised) and on integrity (no access control tool). These scores and their justification are further supported by the items detailed in the rest of the paper.
    The method comprises selected and created "open source" software tools and components: package CGI by David Wheeler (modified version included in [3]), package XML_Parser by the author [3], and GNAT by GNU, NYU and ACT (vd. adapower.com).
    We use the word safety as a synonym of reliability, and we use the words method and safety in a wide sense, viz. with method ranging from architecture to coding, and safety including effectiveness and efficiency both in development (cost safety) and in execution.
    In this paper the method is presented with examples from the real development case, and with incursions into the detail of selected aspects.
    The method is continually evolving, due to both external technological change and internal planned increments. Some of these planned increments are also exposed in this paper, as a means of obtaining feedback from the software engineering community. This trait in particular puts the method on the top level of the CMM (Capability Maturity Model, vd. http://www.sei.cmu.edu).
    Other well known software process references associated with the current method are vanilla frameworks, extreme programming, and futurist programming (vd. Internet). The precise form of these associations is left implicit in the paper.

1. The Case
    The most recent application of the method was in the implementation of an official inquiry to schools via Internet. This was in Portugal, in the year 2000. The purpose of the inquiry was to evaluate a recent reform in school administration. The inquirer, and my client, was CEESCOLA³, a state-funded research centre in education, henceforth simply the Centre.
    The inquirees were 350 secondary schools and school groups randomly chosen out of a nation-wide universe of 1472 such entities.
    The service was required to be accessible only by the selected schools, so these were previously given private access elements (identifier and password) via surface mail. A time window for answering the inquiry was fixed, and the system was required to make the answers available to the Centre as soon as they were submitted.
    The inquiry itself took the form of a number of long and complex questionnaires: each questionnaire had hundreds of questions, and the answer to certain questions determined the existence of other questions or their domain of possible answers.
    Note that this case is very similar to electronic commerce services in complexity and safety issues.

   ³ Centro de Estudos da Escola = Center for School Studies, Faculty of Psychology and Education Sciences of the University of Lisbon.

2. The Method
    The top-level features of the method are:
         •  HTML
         •  CGI
         •  separation of HTML code (documents) and service logic (program)
         •  HTML extended internally
         •  documents prepared through XML transformations
         •  both the service logic and the transformations written in Ada
         •  session state maintained in the served pages
         •  a single meta-HTML unit
         •  a single service procedure
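The feature "session state maintained in the served pages" means that every value received from one form is written back into the next page as a hidden input element, as described below for service execution. The following minimal Python sketch illustrates that relay. It is not the author's Ada service procedure. The field names `school_id` and `q1` are hypothetical. A name arriving twice in one request is treated as the collision that careful input naming must prevent.

```python
from html import escape
from urllib.parse import parse_qs


def parse_form(query: str) -> dict[str, str]:
    """Decode an application/x-www-form-urlencoded string into one value per name.

    Since every value is relayed, the same name arriving twice means a hidden
    (relayed) field collided with a field of the current form.
    """
    fields = parse_qs(query, keep_blank_values=True)
    collisions = sorted(name for name, values in fields.items() if len(values) > 1)
    if collisions:
        raise ValueError("input name collision: " + ", ".join(collisions))
    return {name: values[0] for name, values in fields.items()}


def relay_as_hidden(form: dict[str, str]) -> str:
    """Render every received value as a hidden input element for the next page."""
    return "\n".join(
        f'<input type="hidden" name="{escape(name)}" value="{escape(value)}">'
        for name, value in sorted(form.items())
    )


if __name__ == "__main__":
    # Hypothetical request carrying a relayed identifier and one new answer.
    state = parse_form("school_id=1234&q1=5")
    print(relay_as_hidden(state))
```

Running the module prints one hidden element per received name: `q1` first, then `school_id`, sorted by name. Values are HTML-escaped, so quotes or ampersands typed by a user cannot break the served page.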



   12 / QuaTIC2001
    The separation of HTML code and service logic is a crucial design premise. Our rationale for this converges for the most part with that described in [2]. In order to attain separation, the stored pages are written in a slightly extended HTML, call it meta-HTML, which is transformed by the service, upon each request, into the served pages in standard HTML.
    Also, minimalized HTML was preferred as a basis for meta-HTML, because minimalized HTML (e.g. <input type=checkbox checked>) is more readable by humans than its non-minimalized counterpart or XHTML (<input type="checkbox" checked="checked" />), and the ultimate reviewers of the meta-document are human.
    Now, HTML, minimalized HTML, XHTML, and the designed meta-HTML are all subsumed by a slightly relaxed XML, notably one not requiring pairing end tags. This may seem nonsensical to XML formalist eyes and sound like heresy to XML purist ears, but in practice such a "dirty" version of XML is very convenient. With a robust XML processor one can easily control that one dirty aspect of not requiring pairing end tags. Package XML_Parser has such a robustness feature. The gains include:
    •  a single processing component for all "dirty" XML instances (HTML, minimalized HTML, meta-HTML, XHTML)
    •  increased readability of the input units
    •  an easy path to proper XML representations (not taken, but the current trend from HTML towards XML in the Web was a concern)
    So, in this paper, we take the liberty of calling all that simply XML; hence the pervasive use of the term and the inclusion of XML tools in the method.
    XML processing happens at two stages: data preparation and service execution.
    Data preparation. The questionnaires are created by client staff using WYSIWYG editors like Microsoft FrontPage and Word. These items are then transformed into the final, static meta-HTML items. The major part of this transformation is automated, by means of Ada procedures utilising package XML_Parser. The transformation consists of:
    •  rectifying the messy HTML emitted by Microsoft tools
    •  rectifying and inserting control elements and attributes
    •  structuring the items into identified groups
    Because the necessary ad hoc transformation programs are small (c. 1k lines), and the compiler is fast and easily installable on any site, Ada can also be used here, instead of the usual unsafe scripting languages.
    Service execution. The pages are not served directly: they have a number of markers that must be replaced by the definitive values. This is done at run-time by the main service procedure, again utilising XML_Parser. The rest of this section focuses on this.
    Input values from one form are relayed onto the next as hidden input elements. This provides for:
    •  data communication between session points, or forms (this implements sessions)
    •  general tests on input values to be run at any session point (this increases safety)
    All input values are relayed, so careful naming of input elements is required in order to avoid collisions. The localisation of all forms in a single meta-unit promotes this.
    The method evidently relies on the usual external pieces of web technology: an HTTP/CGI server and web browsers. The service was deployed with the Apache server running on a Linux system. Some problems were felt here, notably an access security hole: the service's internal database files, in order to be accessible by the main procedure, had to be configured in such a way that they were also accessible by all local Linux system users! This problem is perhaps corrigible with the proper Apache settings; but this server's documentation is hardly comprehensible.

4.1 The service procedure
    The service procedure is a non-reactive program, i.e. it terminates, as usual CGI procedures do. It is designed as the sequence of blocks sketched in Figure 1.
    The computation is data-driven by form input values, meta-HTML markers, and system database files (users, passwords, etc.). The form input values and the files are totally case-dependent, so we focus on the meta-HTML markers, and dedicate the next section to them.
    The exception handling is crucial. All errors are captured in a report page served to the user with a wealth of useful information, including instructions for error recovery, as illustrated in Figure 2.⁴

   ⁴ The original data in Portuguese are shown in the figures because they have formal identifiers in Portuguese (sometimes in English, e.g. when they emanate from the compiler), and we wanted to ensure referential consistency between all data items shown in this paper and at its presentations.

    This happens




even during development and testing, facilitating these tasks greatly.

3. Meta-HTML
    This section describes the meta-HTML used in the example case. Other HTML extensions are possible. In fact this possibility is a major plus of the method: it provides applicability to a wide range of possible web services, through case-by-case adaptation of the meta-documentary language. It can even go beyond XML eventually, but that is another story.

3.1 Input field types
    The names of the form/input fields are extended with a type suffix of the form :t, where t is a single letter as described in Table 2.

    Table 2
        t    description
        ?    (illegible in the source)
        i    integer
        a    alphanumeric
        e    subject to special verification

    The upper case versions of t (I, A, E) additionally require a non-null value. The set is easily extended with more basic types, e.g. float and date. Type e (from the Portuguese word especial) requires a case-by-case treatment in the main procedure. The relevant section in the procedure is structured as a case construct: any e type value falling back to the others case raises a System_Error (or something similar). This, together with the proper test data set, increases safety in the development stage.

3.2 Conditional inclusion
    Meta-element if provides conditional selection of parts of the meta-document to be included in the served page. The selected part is the enclosed content of this element. This is similar to the C preprocessor directive #if. The condition is expressed in the element attributes

            Name1="|Value11|...|Value1n|"
            ...
            Namem="|Valuem1|...|Valuemk|"

which contain references to form/input element names and values. The set of attributes is a conjunction of disjunctions: the element content is included only if, for every attribute, the current value of the named input is one of the listed values. Figure 3 shows an excerpt of the example meta-document with heavy use of conditional inclusion, and Figure 4 shows the corresponding HTML result for a particular session.
    Note the pervasive use of E suffixes in the meta-text: this was very helpful in assuring completeness of treatment of all cases, and therefore the correctness of the service.

3.3 Session control
    A special hidden input element named _Seguinte:e (Portuguese for "next") specifies the next meta-HTML unit to be processed. This is non-trivial at the start of the session, when moving from an authentication form to the main set.
    Also, the absence of this element may be used to signal to the main procedure that the session is in its final step, usually submission of the combined data of all forms.
    A small number (circa five) of other special elements were found necessary to control very specific aspects of the service. It was technically easy to implement them in the same vein, notably with the CGI and XML processing resources already available.

4. The tools and components
          To see the next transactional "transfer" happen, ignore the XML (and SOAP) hype and watch for actual XML implementations.
          (Mike Radow, in [3])




    A modified version of package CGI by David Wheeler served well as the CGI component. The modifications, done by myself, included:
    •    Elimination of auxiliary overloading which caused ambiguity problems for the GNAT compiler. I suspect GNAT's complaints were legitimate, language-wise; perhaps Wheeler used another, non-validated, compiler; or the problem was not detected until my use of the package.
    •    Redesign of the output format of procedure Put_Variables.
    The modified version is now in [3]. Further modifications are planned and described there.
    Package XML_Parser by myself, also in [3], was used to transform the HTML emitted by the non-technical staff into extended HTML and then into the served HTML pages. Although XML_Parser served well as the (extended) HTML component of the current project case, it has severe limitations with respect to XML proper, noticeable in its documentation; it also has some design drawbacks, viz. the finite state device is entangled with the rest of the code.
    To overcome these limitations, I have already developed a new XML processing package, XML_Automaton. This package properly encapsulates the finite state device. A new XML parser package, XML_Parser_2, will use XML_Automaton as its engine, in order to produce a more localised interpretation of the XML input. XML_Parser_2 is designed after XML_Parser with respect to the (internal) treatment of XML element containment, and I am trying to make the expression of this containment generic, probably with an array of packages drawing on XML_Parser_2, each dedicated to a certain expression: an Ada linked list, Prolog facts, a DOM structure (Document Object Model, vd. w3.org), etc.
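The generic expression of containment planned for XML_Parser_2 is described only in outline. One common way to get that effect, sketched here in Python as an assumption rather than the author's design, is event-driven building in the style of SAX: a single parser emits start/end events, and one builder per expression consumes them. The two builders below express containment as Prolog-style `contains/2` facts and as nested lists. The ElementTree object itself would play the DOM role. The builder classes and the fact format are hypothetical.

```python
import xml.etree.ElementTree as ET
from typing import Protocol


class ContainmentBuilder(Protocol):
    def start(self, tag: str) -> None: ...
    def end(self, tag: str) -> None: ...


class PrologFacts:
    """Expresses containment as contains(ParentId, ChildId) facts."""

    def __init__(self) -> None:
        self.facts: list[str] = []
        self._open: list[str] = []  # ids of the currently open elements
        self._count = 0

    def start(self, tag: str) -> None:
        self._count += 1
        node = f"{tag.lower()}{self._count}"  # lower case keeps it a Prolog atom
        if self._open:
            self.facts.append(f"contains({self._open[-1]}, {node}).")
        self._open.append(node)

    def end(self, tag: str) -> None:
        self._open.pop()


class NestedLists:
    """Expresses containment as nested [tag, child, child, ...] lists."""

    def __init__(self) -> None:
        self.root: list | None = None
        self._open: list[list] = []

    def start(self, tag: str) -> None:
        node = [tag]
        if self._open:
            self._open[-1].append(node)
        else:
            self.root = node
        self._open.append(node)

    def end(self, tag: str) -> None:
        self._open.pop()


def drive(xml_text: str, builders: list[ContainmentBuilder]) -> None:
    """Parse once and replay every start/end event to every builder."""
    parser = ET.XMLPullParser(events=("start", "end"))
    parser.feed(xml_text)
    events = list(parser.read_events())
    parser.close()
    events += parser.read_events()
    for event, elem in events:
        for builder in builders:
            getattr(builder, event)(elem.tag)
```

Adding a further expression means adding one builder class. The parser is untouched, which matches the separation of the parsing engine from the containment expression aimed at above.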
    A rather specific but interesting point is the character-by-character vs. chunking way of processing XML input. XML elements may span more than one text line. In chunk-based parsers, the chunk is normally the line. These parsers, especially if also based on character string pattern matching libraries, have a real problem here. XML_Automaton does not.
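The advantage of character-by-character processing can be shown with a small finite-state tokenizer. This is a Python sketch of the general technique, not the interface of XML_Automaton. Its state (text, markup, or quoted attribute value) survives between `feed` calls, so splitting the input by line, or at any other point, yields exactly the same tokens. For brevity it does not handle comments, CDATA sections or processing instructions.

```python
TEXT, MARKUP, QUOTED = "text", "markup", "quoted"


class CharTokenizer:
    """Character-by-character finite-state splitter of XML-like input into
    markup and text tokens. State survives between feed() calls, so the
    input may be cut anywhere, e.g. line by line, without losing tags."""

    def __init__(self) -> None:
        self.state = TEXT
        self.quote = ""                          # quote char that opened the value
        self.pending: list[str] = []             # characters of the current token
        self.tokens: list[tuple[str, str]] = []

    def _emit(self, kind: str) -> None:
        if self.pending:
            self.tokens.append((kind, "".join(self.pending)))
            self.pending = []

    def feed(self, chunk: str) -> None:
        for ch in chunk:
            if self.state == TEXT:
                if ch == "<":                    # markup starts: flush text so far
                    self._emit(TEXT)
                    self.state = MARKUP
                self.pending.append(ch)
            elif self.state == QUOTED:
                self.pending.append(ch)
                if ch == self.quote:             # attribute value closed
                    self.state = MARKUP
            else:                                # MARKUP
                self.pending.append(ch)
                if ch in "\"'":                  # '>' inside quotes is not an end
                    self.state, self.quote = QUOTED, ch
                elif ch == ">":
                    self._emit(MARKUP)
                    self.state = TEXT

    def close(self) -> list[tuple[str, str]]:
        if self.state != TEXT:
            raise ValueError("input ended inside markup")
        self._emit(TEXT)
        return self.tokens


if __name__ == "__main__":
    src = '<p class="a>b">Hello\nworld</p>\n<input\n name="q1">'
    tok = CharTokenizer()
    for line in src.splitlines(keepends=True):  # chunk = line
        tok.feed(line)
    for token in tok.close():
        print(token)
```

The `<input ...>` tag in the demo spans two lines, and the first tag has a `>` inside a quoted attribute value. A line-oriented pattern matcher would trip on both, while the automaton just carries its state across chunk boundaries.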
    The XML_Parser_2 design includes an unbounded array of stacks. Currently I am choosing between two bases for the implementation of this structure: GNAT.Table or Unbounded_Array. I am inclined to the latter because it is compiler-independent.




5. Evaluation and some remarks
    The software metrics available for the example case are:

    [Table: software metrics for the example case; contents not recoverable from the source.]

    Note the cost. We are missing precise comparison data with other experiments, but our experience and intuition tell us that it is a very good number, given the degree of correctness attained in the final service; notably, no fatal defects were found. I have also worked recently with a team developing a service similar to the example in intrinsic complexity but with much less form data, implemented with inter-calling Perl (www.perl.com/pub) scripts (essentially a Great Ball of Mud, vd. slashdot.org/articles/00/04/29/0926241.shtml). It required much more work and delivered much less correctness. The service is still plagued with detected bugs that no one rectifies anymore.
    Why not use PHP (www.php.net)? Our reasons include:
     •  our method offers more control over the design and processing of the meta-language
     •  PHP documentation is incomprehensible
    Why not use Mawl [2]?
     •  it is not extensible
     •  it is not maintained
     •  it seems to be very hard to achieve a working installation
    I am particularly fond of the inevitable conclusion that Ada is a good choice for programming in the small. So, there is a real small software engineering after all, and it is not confined to the unadjusted Personal Software Process [4] we read about but never practice.

                     Acknowledgements
    I wish to thank my research advisor at CENTRIA⁵, Doctor Gabriel Pereira Lopes. His correct envisionment of research in informatics as a rich network of diversified competencies and interests has made possible the degree of reusability seen here, notably of the XML tools which were first developed for our research projects in information retrieval and natural language processing.⁶ I am also indebted to Professor João Barroso of CEESCOLA for providing such an interesting case of Internet usage as the one described here. Thanks to my colleagues Pablo Otero and Alexandre Agustini, and to the QUATIC'2001 reviewers, for their good comments. Thanks to my family, for letting our home be also a software house. And to Our Lord, for everything.

                     References
[1] The Web Usage Paradox [webpage]: Why Do People Use Something This Bad? / Jakob Nielsen. - Alertbox for August 9, 1998. - (http://www.useit.com/alertbox/980809.html)
[2] Mawl: A Domain-Specific Language for Form-Based Services / David L. Atkins; Thomas Ball; Glenn Bruns; Kenneth Cox. - pp. 334-346 - In: IEEE Transactions on Software Engineering, vol. 25, no. 3, May/June 1999
[3] Ad-lib: the software process and programming library [web site] / by Mo Amado Alves. - (http://lexis.di.fct.unl.pt/ADaLIB)
[4] Results of applying the personal software process / P. Ferguson; W. S. Humphrey; S. Khajenoori; S. Macke; A. Matvya. - pp. 24-32 - In: IEEE Computer, 30(5), 1997. - (description apud [5])
[5] Software Engineering: An Engineering Approach / James F. Peters; Witold Pedrycz. - John Wiley & Sons, Inc.: New York, 2000. - xviii, 702 p.

   ⁵ Centre for Artificial Intelligence, Universidade Nova de Lisboa.
   ⁶ Projects Corpora de Português Medieval, PGR, IGM, and, in great part, my post-graduation scholarship PRAXIS XXI/BM/20800/99, granted by the Fundação para a Ciência e Tecnologia of Portugal.


