=Paper=
{{Paper
|id=Vol-1284/paper2
|storemode=property
|title=Web Forms and XML Processing: Some Quality Factors of Process and Product
|pdfUrl=https://ceur-ws.org/Vol-1284/paper2.pdf
|volume=Vol-1284
|dblpUrl=https://dblp.org/rec/conf/quatic/Alves01
}}
==Web Forms and XML Processing: Some Quality Factors of Process and Product==
Web Forms and XML Processing: Some Quality Factors of Process and Product

Mário Amado Alves
Faculdade de Ciências e Tecnologia da Universidade Nova de Lisboa
maa@di.fct.unl.pt
(My research is supported by the Fundação para a Ciência e Tecnologia; vd. Acknowledgements.)

Abstract. "The web is bad; really bad.", observed Jakob Nielsen three years ago in The Web Usage Paradox [1]. The true paradox today is that science and technology institutions have bad sites! Programmes devoted to the information society itself have bad sites!! Hopefully this international conference will make a difference. At least it will show that Nielsen and myself are not just "fools on the hill" and, hopefully, it will help management make wiser decisions regarding web development strategies and, correlatively, better technical staff selection. The current paper contributes to this goal by exposing a method for the development of complex web services which embodies a number of quality assurance items and has passed the test of real web service deployment, featuring user authentication, multiple forms, recorded data, and automated page creation. The method and the case are described with incursions into selected technical details. Quality factors are explicitly or implicitly associated with each described item.

1. Web Quality Manifesto

"The web is bad; really bad." [1]

And it is getting worse. Already three years have passed since the article [1] by web usage specialist Jakob Nielsen appeared, and still his observations are right on the mark: "90% of all commercial websites are overly difficult to use due to bloated page design that takes forever to download, internally focused design that hypes products without giving real info, obscure site structures, lack of navigation support, narrative writing style optimised for print, not for the way users read online, etc." ([1] abridged, original emphasis maintained).

In the current paper I add a couple of items to this list. Why is it even worse today? Well, for one, institutional sites are bad. In fact, the true paradox today is that science and technology institutions have bad sites! Research programmes devoted to the information society itself have bad sites!! The 2000 Olympics site was bad. Oracle's site is bad. I submit a Law of Inverse Quality: the greater the institution, the worse the site.

I hope this international conference, in particular panel PNI.2, entitled Qualidade nos Sistemas de Informação da Administração Pública: o Início duma Cruzada (Quality in Public Administration Information Systems: the Start of a Quest), in regard to institutional sites, will make a difference. At least it will show that Nielsen and myself are not simply "fools on the hill", not anymore. And, hopefully, it will help educated management make wiser decisions regarding web development strategies and, correlatively, better technical staff selection. The current paper contributes to this honourable goal by exposing a method for the development of complex web services which embodies a number of quality assurance items and has passed the test of real service case deployment.

Is the Web really worse today than three years ago? Yes, definitely. Here is a recent (2000) observation from the same author of [1]: "If you are going to go and buy something on a new website, you will fail. If you go to a new website, you will not be able to use it." (http://www.wired.com/news/business/0,1367,40155,00.html)

Web quality is a twofold problem: technical and social.

Technical. A veritable plethora of techniques and methods exists today to develop web services. Judging from the results, most of them are bad. The current paper presents a method that emphasises some quality factors of process and product. These factors are explicitly or implicitly associated with each described technical item. Bottom line: it is a good method. It is proven. The rest of this paper deals with the technical aspect only.
Social. The social problem is to convince people to use good methods and techniques: to be quality-aware. People like Jakob Nielsen [1] have been trying to pass the message for some years now. The message is simple: user, demand web quality; web service provider, provide quality (or else die). But seemingly the word is not getting through. Users are not demanding. Perhaps they simply do not know the Web could be much better. Perhaps they simply do not want it: one way for a site to be better is to be simpler, and perhaps most users prefer complicated, slow sites. This social aspect is not addressed further in the current paper.

2. Quality Factors Overview

"HTTP, CGI are the GOTOs of the 1990s." [2]

We present a method for the development of complex web services. The method was tested with a service case featuring:
- user authentication
- multiple forms
- recorded data
- automated page creation

The method emphasises quality at two stages: development and execution (meaning runtime execution of the service). It does this by scoring high on quality factors of process and product respectively; mostly of product, but high scores here are justified by process factors implicit in the method, as illustrated in Table 1.

Table 1. Quality factors and their justification.
- Correctness: high traceability (rich messages); operationalised completeness checks.
- Reliability: all errors handled; standard technology; simple page design.
- Maintainability: good choice of programming language (Ada); separation of HTML code and service logic.

The product also scores high on efficiency, usability, portability, and interoperability. It scores less on testability (test data must be prepared for each case, and this is not operationalised) and integrity (no access control tool). These scores and their justification are further supported by the items detailed in the rest of the paper.

The method comprises selected and created "open source" software tools and components: package CGI by David Wheeler (a modified version is included in [3]), package XML_Parser by the author [3], and GNAT by GNU, NYU and ACT (vd. adapower.com).

We use the word safety as a synonym of reliability, and we use the words method and safety in a wide sense, viz. with method ranging from architecture to coding, and safety including effectiveness and efficiency both in development (cost safety) and execution.

In this paper the method is presented with examples from the real development case, and with incursions into the detail of selected aspects.

The method is continually evolving, due to both external technological change and internal planned increments. Some of these planned increments are also exposed in this paper, as a means of obtaining feedback from the software engineering community. This trait in particular puts the method on the top level of the CMM (Capability Maturity Model, vd. http://www.sei.cmu.edu). Other well known software process references associated with the current method are vanilla frameworks, extreme programming, and futurist programming (vd. Internet). The precise form of these associations is left implicit in the paper.

3. The Case

The most recent application of the method was the implementation of an official inquiry to schools via the Internet. This was in Portugal, in the year 2000. The purpose of the inquiry was to evaluate a recent reform in school administration. The inquirer, and my client, was CEESCOLA (Centro de Estudos da Escola, Centre for School Studies, Faculty of Psychology and Education Sciences of the University of Lisbon), a state-funded research centre in education, henceforth simply the Centre.

The inquirees were 350 secondary schools and school groups randomly chosen out of a nation-wide universe of 1472 such entities.

The service was required to be accessible only by the selected schools, so these were previously given, via surface mail, private access elements (identifier and password). A time window for answering the inquiry was fixed, and the system was required to make the answers available to the Centre as soon as they were submitted.

The inquiry itself took the form of a number of long and complex questionnaires: each questionnaire had hundreds of questions, and the answer to certain questions determined the existence of other questions or their domain of possible answers.

Note that this case is very similar to electronic commerce services in complexity and safety issues.
4. The Method

The top-level features of the method are:
- HTML
- CGI
- separation of HTML code (documents) and service logic (program)
- HTML extended internally
- documents prepared through XML transformations
- both the service logic and the transformations written in Ada
- session state maintained in the served pages
- a single meta-HTML unit
- a single service procedure

The separation of HTML code and service logic is a crucial design premise. Our rationale for this converges for the most part with that described in [2]. In order to attain separation, the stored pages are written in a slightly extended HTML, call it "meta" HTML, which is transformed by the service, upon each request, into the served pages in standard HTML. Also, minimalized HTML was preferred as a basis for meta-HTML, because minimalized HTML is more readable by humans than its non-minimalized counterpart or XHTML, and the ultimate reviewers of the meta-document are human.

Now, HTML, minimalized HTML, XHTML, and the designed meta-HTML are all subsumed by a slightly relaxed XML, notably one not requiring pairing end tags. This may seem nonsensical to XML formalist eyes and sound of heresy to XML purist ears, but in practice such a "dirty" version of XML is very convenient. With a robust XML processor one can easily control that one dirty aspect of not requiring pairing end tags; package XML_Parser has such a robustness feature. The gains include:
- a single processing component for all "dirty" XML instances (HTML, minimalized HTML, meta-HTML, XHTML)
- increased readability of the input units
- an easy path to proper XML representations (not taken, but the current trend from HTML towards XML on the Web was a concern)

So, in this paper, we take the liberty of calling all of that simply XML, hence the pervasive use of the term and the inclusion of XML tools in the method.

XML processing happens at two stages: data preparation and service execution.

Data preparation. The questionnaires are created by client staff using WYSIWYG editors like Microsoft FrontPage and Word. Then these items are transformed into the final, static meta-HTML items. The major part of this transformation is automated, by means of Ada procedures utilising package XML_Parser. The transformation consists of:
- rectifying the messy HTML emitted by the Microsoft tools
- rectifying and inserting control elements and attributes
- structuring the items into identified groups
Because the necessary ad hoc transformation programs are small (circa 1k lines), and the compiler is fast and easily installable on any site, Ada can also be used here, instead of the usual unsafe scripting languages.

Service execution. The pages are not served directly: they have a number of markers that must be replaced by the definitive values. This is done at run time by the main service procedure, again utilising XML_Parser. The rest of this section focuses on this.

Input values from one form are relayed onto the next as hidden input elements. This provides for:
- data communication between session points, or forms: this implements sessions
- general tests on input values that can be run at any session point: this increases safety
All input values are relayed, so careful naming of input elements is required (in order to avoid collisions). The localisation of all forms in a single meta-unit promotes this.
The method evidently relies on the usual external pieces of web technology: an HTTP/CGI server and web browsers. The service was deployed with the Apache server running on a Linux system. Some problems were felt here, notably an access security hole: the service internal database files, in order to be accessible by the main procedure, had to be configured in such a way that they were also accessible by all local Linux system users! This problem is perhaps corrigible with the proper Apache settings, but this server's documentation is hardly comprehensible.

4.1 The service procedure

The service procedure is a non-reactive program, i.e. it terminates, as usual CGI procedures do. It is designed as the sequence of blocks sketched in Figure 1.

The computation is data-driven by form input values, meta-HTML markers, and system database files (users, passwords, etc.). The form input values and the files are totally case-dependent, so we focus on the meta-HTML markers and dedicate the next section to them.

The exception handling is crucial. All errors are captured in a report page served to the user with a wealth of useful information, including instructions for error recovery, as illustrated in Figure 2. (The original data in Portuguese are shown in the figures because they have formal identifiers in Portuguese, sometimes in English, e.g. when they emanate from the compiler, and we wanted to ensure referential consistency between all data items shown in this paper and at its presentations.) This happens even during development and testing, facilitating these tasks greatly.
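The overall shape of the procedure and of its catch-all handler can be pictured with a minimal Ada sketch. This is not the deployed code: the block names are hypothetical stand-ins for the blocks of Figure 1, and output goes through Ada.Text_IO only, to keep the sketch self-contained (the real procedure uses the CGI and XML_Parser packages of [3]).

  with Ada.Text_IO;    use Ada.Text_IO;
  with Ada.Exceptions; use Ada.Exceptions;

  procedure Service is
     --  Non-reactive CGI program: runs once per request, then terminates.
     --  Each stub below stands for one block of the real procedure.
     procedure Read_Form_Input    is begin null; end;
     procedure Authenticate_User  is begin null; end;
     procedure Check_Input_Values is begin null; end;
     procedure Record_Answers     is begin null; end;
     procedure Serve_Next_Page    is begin null; end;
  begin
     Put_Line ("Content-type: text/html");
     New_Line;
     Read_Form_Input;
     Authenticate_User;
     Check_Input_Values;
     Record_Answers;
     Serve_Next_Page;
  exception
     when E : others =>
        --  Every error ends up here: serve a report page with rich
        --  diagnostic messages and recovery instructions (cf. Figure 2).
        Put_Line ("<html><body><h1>Error report</h1><pre>");
        Put_Line (Exception_Information (E));
        Put_Line ("</pre></body></html>");
  end Service;

Because the handler also runs during development and testing, every defect immediately surfaces as a readable report rather than a silent failure.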
5. Meta-HTML

This section describes the meta-HTML used in the example case. Other HTML extensions are possible. In fact this possibility is a major plus of the method: it provides applicability to a wide range of possible web services, through case-by-case adaptation of the meta documentary language. It can even go beyond XML eventually, but that is another story.

5.1 Input field types

The names of the form/input fields are extended with a type suffix of the form :t:, where t is a single letter as described in Table 2.

Table 2. Input field type letters.
- i: integer
- a: alphanumeric
- e: subject to special verification

The upper case versions of t (I, A, E) additionally require a non-null value. The set is easily extended with more basic types, e.g. float and date. Type e (from the Portuguese word especial) requires a case-by-case treatment in the main procedure: the relevant section in the procedure is structured as a case construct, and any e type value falling into the others branch raises a System_Error (or something similar). This, together with the proper test data set, increases safety in the development stage.

5.2 Conditional inclusion

The meta-element if provides conditional selection of parts of the meta-document to be included in the served page. The selected part is the enclosed content of this element. This is similar to the C preprocessor directive #if. The condition is expressed in the element attributes

  Name1="Value11 ... Value1m"
  ...
  NameN="ValueN1 ... ValueNm"

which contain references to form/input element names and values. The set of attributes is a conjunction of disjunctions: each attribute holds if the named field has one of the listed values, and the content is included only if all attributes hold. The (positive) Boolean value of the set determines inclusion of the element content. Figure 3 shows an excerpt of the example meta-document with heavy use of conditional inclusion, and Figure 4 shows the corresponding HTML result for a particular session.

Note the pervasive use of E suffixes in the meta-text: this was very helpful in assuring completeness of treatment of all cases, and therefore the correctness of the service.

5.3 Session control

A special hidden input element named _Seguinte (Portuguese for "next") specifies the next meta-HTML unit to be processed. This is non-trivial at the start of the session, when moving from an authentication form to the main set. Also, the absence of this element may be used to signal to the main procedure that the session is in its final step, usually the submission of the combined data of all forms.

A small number (circa five) of other special elements were found necessary to control very specific aspects of the service. It was technically easy to implement them in the same vein, notably with the CGI and XML processing resources already available.
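An invented excerpt in the spirit of Figures 3 and 4 may help to fix the notation just described (typed field names, the if element, and the _Seguinte element); the element and attribute spellings below are assumptions of this sketch, not literal fragments of the real meta-document:

  <form method=post action="/cgi-bin/inquiry">
    <!-- typed names: I = required integer, a = optional alphanumeric -->
    Pupils enrolled: <input name="pupils:I:">
    Remarks:         <input name="remarks:a:">

    <!-- conditional inclusion: the content is kept only when the field
         "regime" holds one of the listed values (the attributes form a
         conjunction of disjunctions over form/input names and values) -->
    <if regime="day evening">
      Timetable of the regime: <input name="timetable:A:">
    </if>

    <!-- session control: the next meta-HTML unit to be processed -->
    <input type=hidden name=_Seguinte value=questionnaire_2>
  </form>

At service execution time the if element is resolved against the relayed input values, and only the surviving content reaches the served standard HTML page.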
6. The tools and components

"To see the next transactional 'transfer' happen, ignore the XML (and SOAP) hype and watch for actual XML implementations." (Mike Radow, in [3])

A modified version of package CGI by David Wheeler served well as the CGI component. The modifications, done by myself, included:
- elimination of auxiliary overloadings which caused ambiguity problems to the GNAT compiler (I suspect GNAT's complaints were legitimate, language-wise; perhaps Wheeler used another, non-validated compiler, or the problem was not detected until my use of the package);
- redesign of the output format of procedure Put_Variables.
The modified version is now in [3]. Further modifications are planned and described there.

Package XML_Parser, by myself, also in [3], was used to transform the HTML emitted by the non-technical staff into extended HTML and then into the served HTML pages. Although XML_Parser served well as the (extended) HTML component of the current project case, it has severe limitations with respect to XML proper, noticeable in its documentation; it also has some design drawbacks, viz. the finite state device is entangled with the rest of the code.

To overcome these limitations, I have already developed a new XML processing package, XML_Automaton, which properly encapsulates the finite state device. A new XML parser package, XML_Parser_2, will use XML_Automaton as its engine, in order to produce a more localised interpretation of the XML input. XML_Parser_2 is designed after XML_Parser with respect to the (internal) treatment of XML element containment, and I am trying to make the expression of this containment generic, probably with an array of packages drawing on XML_Parser_2, each dedicated to a certain expression: an Ada linked list, Prolog facts, a DOM structure (Document Object Model, vd. w3.org), etc.

A rather specific but interesting point is the character-by-character versus chunking way of processing XML input. XML elements may span more than one text line. In chunk-based parsers the chunk is normally the line, and these parsers, especially if also based on character string pattern matching libraries, have a real problem here. XML_Automaton does not.

The design of XML_Parser_2 includes an unbounded array of stacks. Currently I am choosing between two bases for the implementation of this structure: GNAT.Table or Unbounded_Array. I am inclined to the latter because it is compiler-independent.
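The interface of XML_Automaton is not detailed here, so the toy automaton below, with an invented Feed procedure, only illustrates the point about character-by-character processing: because the scanning state survives from one character to the next, a tag split across text lines is recognised exactly as if it arrived in a single chunk.

  with Ada.Text_IO;            use Ada.Text_IO;
  with Ada.Characters.Latin_1;

  procedure Scan_Demo is
     --  Toy character-driven scanner: recognises markup between '<' and
     --  '>' one character at a time; character data outside tags is
     --  ignored in this sketch.
     type State_Type is (In_Text, In_Tag);
     State : State_Type := In_Text;

     --  An element deliberately split over two lines of input text.
     Sample : constant String :=
       "<input " & Ada.Characters.Latin_1.LF & "name=pupils:I:> enrolled";

     procedure Feed (C : Character) is
     begin
        case State is
           when In_Text =>
              if C = '<' then
                 State := In_Tag;
                 Put ("tag: ");
              end if;
           when In_Tag =>
              if C = '>' then
                 State := In_Text;
                 New_Line;
              elsif C /= Ada.Characters.Latin_1.LF then
                 Put (C);  --  echo the tag text; the line break is immaterial
              end if;
        end case;
     end Feed;
  begin
     for I in Sample'Range loop
        Feed (Sample (I));
     end loop;
  end Scan_Demo;

A line-chunked parser would see two fragments here, neither matching a tag pattern on its own; a character-driven automaton never notices the break.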
7. Evaluation and some remarks

The software metrics available for the example case deserve a remark. Note the cost: we are missing precise comparison data from other experiments, but our experience and intuition tell us that it is a very good number, given the degree of correctness attained in the final service; notably, no fatal defects were found. I have also worked recently with a team developing a service similar to the example in intrinsic complexity but with much less form data, implemented with inter-calling Perl scripts (www.perl.com), essentially a Great Ball of Mud (vd. slashdot.org/articles/00/04/29/0926241.shtml): it required much more work and delivered much less correctness. That service is still plagued with detected bugs that no one rectifies anymore.

Why not use PHP (www.php.net)? Our reasons include:
- our method offers more control over the design and processing of the meta-language;
- the PHP documentation is incomprehensible.

Why not use Mawl [2]?
- it is not extensible;
- it is not maintained;
- it seems to be very hard to achieve a working installation.

I am particularly fond of the inevitable conclusion that Ada is a good choice for programming in the small. So, there is a real small software engineering after all, and it is not confined to the unadjusted Personal Software Process [4] we read about but never practice.

Acknowledgements

I wish to thank my research advisor at CENTRIA (Centre for Artificial Intelligence, Universidade Nova de Lisboa), Doctor Gabriel Pereira Lopes. His correct envisionment of research in informatics as a rich network of diversified competencies and interests has made possible the degree of reusability seen here, notably of the XML tools, which were firstly developed for our research projects in information retrieval and natural language processing (projects Corpora de Português Medieval and PGR, and, in great part, my post-graduation scholarship PRAXIS XXI/BM/20800/99, granted by the Fundação para a Ciência e Tecnologia of Portugal). I am also indebted to Professor João Barroso of CEESCOLA for providing such an interesting case of Internet usage as the one described here. Thanks to my colleagues Pablo Otero and Alexandre Agustini, and to the QUATIC'2001 reviewers, for their good comments. Thanks to my family, for letting our home be also a software house. And to Our Lord, for everything.

References

[1] The Web Usage Paradox [webpage]: Why Do People Use Something This Bad? / Jakob Nielsen. Alertbox for August 9, 1998. (http://www.useit.com/alertbox/980809.html)

[2] Mawl: A Domain-Specific Language for Form-Based Services / David L. Atkins; Thomas Ball; Glenn Bruns; Kenneth Cox. pp. 334-346. In: IEEE Transactions on Software Engineering, vol. 25, no. 3, May/June 1999.

[3] AdaLIB: the software process and programming library [web site] / by Mário Amado Alves. (http://lexis.di.fct.unl.pt/ADaLIB)

[4] Results of applying the personal software process / P. Ferguson; W. S. Humphrey; S. Khajenoori; S. Macke; A. Matvya. pp. 24-32. In: IEEE Computer, 30(5), 1997. (Description apud [5].)

[5] Software Engineering: An Engineering Approach / James F. Peters; Witold Pedrycz. John Wiley & Sons, Inc.: New York, 2000. xviii, 702 p.