=Paper=
{{Paper
|id=Vol-1178/CLEF2012wn-CLEFeHealth-SuominenEt2012d
|storemode=property
|title=Towards Ease of Building Legos in Assessing eHealth Language Technologies A RESTful Laboratory for Data and Software
|pdfUrl=https://ceur-ws.org/Vol-1178/CLEF2012wn-CLEFeHealth-SuominenEt2012d.pdf
|volume=Vol-1178
}}
==Towards Ease of Building Legos in Assessing eHealth Language Technologies A RESTful Laboratory for Data and Software==
Hanna Suominen<sup>1,2</sup>, Karl Kreiner<sup>3</sup>, Mike Wu<sup>1</sup>, Leif Hanlen<sup>1,2</sup>

<sup>1</sup> NICTA, National ICT Australia, Locked Bag 9013, 1435 Alexandria, NSW, Australia

<sup>2</sup> The Australian National University, 0200 Canberra, ACT, Australia

<sup>3</sup> AIT, Austrian Institute of Technology GmbH, Reininghausstraße 13/1, 8020 Graz, Austria

hanna.suominen@nicta.com.au, karl.kreiner@ait.ac.at, mike.wu@nicta.com.au, leif.hanlen@nicta.com.au

===Abstract===
More and more scientific literature, care guidelines, health records, social media, and other textual eHealth information is electronically available. Language technologies provide a way to analyse these documents for the benefit of both individuals and populations. In order to catalyse the development of eHealth language technologies, we propose a virtual laboratory with a standardised platform for easy building and assessment of systems from the “lego” bricks of shared data, resources, and software.

Our aim is to address specific needs in eHealth: governance and sharing of private data; provenance and sharing of resources and software; systematic benchmarking and quality control of systems and their components; and collaboration of eHealth language technology developers and users across healthcare services, academia, industry, and government.

The Epicure virtual laboratory is intended to be used for software and resource evaluation and development as well as for data analysis if data subjects’ privacy is ensured. Epicure is a meta-framework in the sense of abstracting over existing frameworks. Its five roles for clients are data or resource provider, application assembler, application user, software developer, and system administrator.

We have implemented Epicure based on publicly available software.
Its control layer is a Glassfish JavaEE server, providing a RESTful (REpresentational State Transfer) application programming interface; a web interface for accessing and installing third-party platforms; and easy operation via standard web commands. After proper user authentication and authorisation of incoming requests, it builds applications, analyses data, and assesses outcomes by orchestrating the storage and execution layers. The storage layer of Epicure uses a CouchDB-based repository for centralised storage of data, resources, and software. It enables controlling access at the level of individual documents; tracking all changes; recording these revisions; storing all analysis outcomes; and associating the outcomes with the data, resources, and software used in their generation. The execution layer of Epicure provides a runtime environment for executing data analysis tasks and installing third-party platforms. It invokes tools as simple commands. A tool must specify its input format, output formats, parameters, and their possible values in a file and be executable on a command line. Tools do not need to be installed within Epicure itself; they can instead be accessed via a network interface and a wrapper, which provides access from Epicure to the remote service.

Keywords: Evaluation; Health Information Technology; Natural Language Processing; Software Design

===1 Introduction===
More and more scientific literature, care guidelines, health records, social media, and other textual eHealth information is electronically available [1]. Language technologies (LTs) provide a way to analyse these documents for the benefit of both individuals and populations.
In order to catalyse the development of eHealth LTs, we propose a virtual laboratory with a standardised platform for easy building and assessment of systems from the “lego” bricks of shared data (e.g., clinical text and annotations), resources (e.g., medical dictionaries and data standards), and software (e.g., processing, evaluation, and visualisation algorithms).

===2 Materials and Methods===
The Epicure virtual laboratory is a meta-framework in the sense of abstracting over existing frameworks (e.g., SNOMED CT, Systematized Nomenclature of Medicine Clinical Terms, for healthcare terminologies; HL7, Health Level Seven International, for interoperability of health information technology; UIMA, Unstructured Information Management Architecture, for LTs; and WEKA, Waikato Environment for Knowledge Analysis). It is named after an ancient Greek philosophy, which aims to attain a happy, tranquil, and self-sufficient life surrounded by friends; similarly, our aim is collaborative building and systematic assessment of eHealth LTs through easy connectivity of data provision, resource and software development, and end use.

Epicure has five roles for clients: data or resource provider, application assembler (i.e., one who builds applications from software, resource, and data bricks), application user, software developer, and system administrator. It is intended to be used for software and resource evaluation and development as well as for data analysis if data subjects’ privacy is ensured either by limiting data access or by conducting appropriate de-identification procedures. Processing examples include:

1. choosing a textual dataset and its annotation with respect to a given classification task from the repository, applying a given medical dictionary to reduce data sparseness by synonym and hypernym mappings, training a given classification algorithm to perform the task automatically, and evaluating this classifier by selecting an evaluation method and measures, and

2.
submitting a new classification algorithm; choosing the evaluation methods and measures; selecting a dataset to be used in the evaluation and the classification algorithms to be compared against; and evaluating the quality of the submitted algorithm via these comparisons.

Fig. 1. Epicure meta-framework

Epicure is implemented based on publicly available software. REpresentational State Transfer (REST) was chosen because the same design principles have enabled the success of the World Wide Web [2]. Epicure’s main components are (Figure 1):

Control layer: The main communication hub for client interaction. It is a Glassfish JavaEE server, providing a RESTful API (Application Programming Interface); a web interface for accessing and installing third-party platforms; and easy operation via the Hypertext Transfer Protocol (HTTP) commands GET, POST, PUT, and DELETE for retrieving, creating, replacing, and removing content, respectively. After proper user authentication and authorisation of incoming requests, it builds applications, analyses data, and assesses outcomes by orchestrating the storage and execution layers. Because many analysis and assessment requests take a long time to complete, the control layer must support processing requests asynchronously.

Storage layer: A repository, based on CouchDB, is used for centralised storage of data, resources, and software. It treats data, resources, and software equally as documents, which enables controlling access at the level of individual documents; tracking all changes; recording these revisions; storing all analysis outcomes; and associating the outcomes with the data, resources, and software used in their generation. All data to be analysed on Epicure needs to be stored in this repository.

Execution layer: A runtime environment is provided for executing data analysis tasks and installing third-party platforms for machine learning, natural language processing (NLP), data formatting, and information visualisation.
It invokes tools as simple commands. A tool must specify its meta information (i.e., input format, output formats, parameters, and their possible values) as an XML (Extensible Markup Language) file and be executable on a command line (and not invoke a graphical user interface). Tools do not need to be installed within Epicure itself; they can instead be accessed via a network interface and a wrapper, which provides access from Epicure to the remote service. Our plan is to improve this layer with a Hadoop-based implementation, which provides a capability to perform parallel computing and integrate software bricks in all programming languages.

===3 Results and Discussion===
We have designed and implemented the Epicure virtual laboratory for integrating and sharing data, resources, and software related to building and assessing eHealth LTs. It addresses specific needs in eHealth: governance and sharing of private data; provenance and sharing of resources and software; systematic benchmarking and quality control of systems and their components; and collaboration of eHealth LT developers and users across healthcare services, academia, industry, and government. We have showcased Epicure by building an application for in-hospital surveillance of fungal infections based on patient record texts, expert annotations, and authentic outcomes of healthcare [3].

Probably the most similar approach is the iDASH NLP Ecosystem (nlp-ecosystem.sdsc.edu), established in 2011. This is a virtual machine with a suite of installed eHealth solutions and the capability to download the suite. Epicure differs from this work in its use of REST, without which the catalyst effect of community collaboration and the ease of building systems from standardised bricks of data, resources, and software may be harder to achieve.

===4 Conclusion===
Epicure promotes reproducibility and comparability of results; availability of data, resources, software, and applications; and renewal of science and healthcare practice.
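As an illustration of the execution-layer tool contract described in Section 2 (a tool declares its input format, output formats, parameters, and their possible values in an XML file, and is run as a plain command), the following minimal Python sketch may help. It is not Epicure's code or schema, which are not given here: the element and attribute names (`tool`, `input`, `output`, `parameter`, `value`), the `token-counter` tool, and the helper functions are all hypothetical, and the "tool" is a stand-in one-line script started through the local Python interpreter.

```python
import subprocess
import sys
import xml.etree.ElementTree as ET

# Hypothetical tool descriptor. Element and attribute names are assumptions
# made for this sketch; they are not Epicure's actual XML schema.
DESCRIPTOR_XML = """
<tool name="token-counter">
  <input format="text/plain"/>
  <output format="text/plain"/>
  <parameter name="case">
    <value>keep</value>
    <value>lower</value>
  </parameter>
</tool>
"""

# Stand-in for a command-line tool (hypothetical): reads text on stdin and
# prints the number of distinct whitespace-separated tokens.
TOOL_SOURCE = (
    "import sys\n"
    "text = sys.stdin.read()\n"
    "text = text.lower() if sys.argv[1] == 'lower' else text\n"
    "print(len(set(text.split())))\n"
)


def load_descriptor(xml_text):
    """Parse the descriptor into a plain dictionary."""
    root = ET.fromstring(xml_text)
    return {
        "name": root.get("name"),
        "input_format": root.find("input").get("format"),
        "output_formats": [o.get("format") for o in root.findall("output")],
        "parameters": {
            p.get("name"): [v.text for v in p.findall("value")]
            for p in root.findall("parameter")
        },
    }


def validate_parameters(descriptor, chosen):
    """Reject parameter names or values that the descriptor does not declare."""
    for name, value in chosen.items():
        allowed = descriptor["parameters"].get(name)
        if allowed is None:
            raise ValueError(f"unknown parameter: {name}")
        if value not in allowed:
            raise ValueError(f"invalid value {value!r} for parameter {name}")


def run_tool(descriptor, chosen, input_text):
    """Validate the request, then invoke the tool as a simple command."""
    validate_parameters(descriptor, chosen)
    result = subprocess.run(
        [sys.executable, "-c", TOOL_SOURCE, chosen.get("case", "keep")],
        input=input_text,
        capture_output=True,
        text=True,
        check=True,
    )
    return result.stdout.strip()


if __name__ == "__main__":
    descriptor = load_descriptor(DESCRIPTOR_XML)
    print(run_tool(descriptor, {"case": "lower"}, "Fever fever cough"))
```

Checking the requested parameters against the descriptor before starting the process keeps the tool itself simple: it only needs to read its input and arguments and write its output, which is the property that lets arbitrary command-line programs act as software bricks.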
===Acknowledgements===
NICTA is funded by the Australian Government as represented by the Department of Broadband, Communications and the Digital Economy and the Australian Research Council through the ICT Centre of Excellence program. This work took place during Karl Kreiner’s research visit at NICTA.

===References===
1. Suominen H and Salakoski T. Supporting communication and decision making in Finnish intensive care with language technology. Journal of Healthcare Engineering 2010; 1(4): 595–614.

2. Fielding RT. Architectural Styles and the Design of Network-based Software Architectures. 2000; bit.ly/3jFBnu.

3. Martinez D, Suominen H, Ananda-Rajah M, and Cavedon L. Biosurveillance for invasive fungal infections via text mining. Proceedings of the CLEF 2012 Workshop on Cross-Language Evaluation of Methods, Applications, and Resources for eHealth Document Analysis (CLEFeHealth2012), Rome, Italy, 17–20 September 2012.