Fourth International Workshop on Search and Exchange of e-le@rning Materials (SE@M’10)

Workshop Proceedings
27-28 September 2010, Barcelona, Spain

Copyright © 2010 for the individual papers by the papers' authors. Copying permitted only for private and academic purposes. This volume is published and copyrighted by its editors.

Editors: David Massart and Elena Shulman, European Schoolnet

http://seam.eun.org
http://www.aspect-project.org

SE@M'10 is partly supported by the European Commission through the eContentplus project ASPECT.

The work presented in this document is partially supported by the European Community eContentplus programme, project ASPECT: Adopting Standards and Specifications for Educational Content (Grant agreement number ECP-2007-EDU-417008). The interpretations and opinions contained in it are solely those of the authors. It does not represent the opinion of the European Community and the Community is not responsible for any use that might be made of information contained herein.

Table of Contents

Foreword
Elena Shulman

Repository Services for Outcome-based Learning
Michael Totschnig, Michael Derntl, Israel Gutiérrez, Jad Najjar, Roland Klemke, Joris Klerkx, Erik Duval and Franz Müller

The Ariadne Registry of LORs
José Luis Santos, Joris Klerkx and Erik Duval

Where is the user? Filtering Bots from the Edurep Query Logs
Wim Muskee

Handling Multiple Metadata Streams Regarding Digital Learning Material
Jasper Roes, Jeroen van Vuuren, Nico Verbeij and Henk Nijstad

Managing Broken URLs in Federated Metadata
Tien-Dung Le and Elena Shulman

Preliminary Discussion on a Digital Curation Framework for Learning Repositories
Nikos Palavitsinis, Nikos Manouselis and Salvador Sanchez-Alonso

Automatic Keywords Extraction – a Basis for Content Recommendation
Ivana Bosnic, Katrien Verbert and Erik Duval

Conversion of the YDP Learning Content to Common Cartridge Package
Jaroslaw Dziedzic

Authoring for Re-use in Outcome-oriented Learning Scenarios
Roland Klemke and Birgit Schmitz

Exploring Quality Issues in the Use of LOs: To Tag or Not to Tag?
Rune Hjelsvold, Jingjing Fan, Yngve Nordkvelle and Kjell Are Refsvik

Educational Resources Packaging Standards SCORM and IMS Common Cartridge – The Users Point of View
Kati Clements, Àgueda Gras-Velázquez and Jan M. Pawlowski

Fourth International Workshop on Search and Exchange of e-le@rning Materials (SE@M’10)

Elena Shulman
European Schoolnet (EUN), Brussels, Belgium

Context and Objectives

The SE@M’10 workshop was held on September 27-28, 2010 in conjunction with the Fifth European Conference on Technology Enhanced Learning (EC-TEL’10) in Barcelona, Spain.

Over the last fifteen years, considerable effort has been spent on the development of standards and specifications in order to improve the interoperability of e-learning systems, repositories, and content. These efforts have led to significant improvements in the arena of technical interoperability enabling the emergence and expansion of successful federations and alliances of learning object repositories such as the LRE, GLOBE, etc.
Building blocks for this success have been the creation, evolution and adoption of standards such as the IEEE Learning Object Metadata (LOM), the Open Archives Initiative Protocol for Metadata Harvesting (OAI-PMH), and the Simple Query Interface (SQI). More recent developments in this field include the definition of a Simple Publishing Interface (SPI) and the ongoing work within the IMS Global Learning Consortium on the Learning Object Discovery & Exchange (LODE) specification to facilitate the discovery and retrieval of learning objects stored across more than one collection. The development of best practices (for example by projects like ASPECT and ICOPER), enabling more efficient use of these standards and specifications, has also proven instrumental in the successes of current learning content retrieval infrastructures.

To further developments in this field, the main goal of this international workshop was to offer a forum where researchers and practitioners discussed theoretical aspects, open issues, and innovative approaches and shared the latest advances in the state of the art and practices for exchanging and describing learning content.

Program

Among the 12 papers submitted to the workshop, 11 (6 full papers and 5 short papers) were accepted by the SE@M’10 program committee. The papers were presented in the course of the two-day workshop. Two keynote speakers opened and closed the workshop. Papers explored innovative approaches and technical solutions for the management of learning object repositories, data models for efficiently organizing multiple metadata standards, content packaging, digital curation, keyword extraction, user generated metadata and a series of issues encountered in federated metadata management.
Organizers

David Massart and Elena Shulman, European Schoolnet (Belgium)

Program Committee

Vladimir Batagelj, University of Ljubljana, Slovenia
Jean-Noel Colin, University of Namur, Belgium
Ingo Dahn, University of Koblenz, Germany
Erik Duval, Katholieke Universiteit Leuven, Belgium
Joris Klerkx, Katholieke Universiteit Leuven, Belgium
Manuel Kolp, Universite catholique de Louvain, Belgium
Robert Kristoefl, BMUKK, Austria
Eugenijus Kurilovas, ITC, Centre of Information Technologies of Education, Lithuania
Tien-Dung Le, European Schoolnet (EUN), Belgium
Nikos Manouselis, Greek Research & Technology Network, Greece
Jon Mason, Queensland University of Technology, Australia
Nick Nicholas, Australian National Data Service (ANDS), Australia
Tomasz Orzechowski, AGH University of Science and Technology, Poland
Fredrik Paulsson, Umea University, Sweden
Jan Pawlowski, University of Jyväskylä, Finland
Alain Pirotte, Universite catholique de Louvain, Belgium
Daniel Rehak, ADL, USA
Griff Richards, Athabasca University, Canada
Bernd Simon, WU-Wien, Austria
Stefaan Ternier, Open Universiteit, The Netherlands
Frans Van Assche, Ariadne Foundation, Belgium
Riina Vuorikari, European Schoolnet, Belgium
Nigel Ward, University of Queensland, Australia
Tsuneo Yamada, Open University, Japan

Repository Services for Outcome-based Learning

Michael Totschnig¹, Michael Derntl², Israel Gutiérrez³, Jad Najjar¹, Roland Klemke⁴, Joris Klerkx⁵, Erik Duval⁵, Franz Müller¹

¹ Vienna University of Economics and Business
² University of Vienna, Faculty of Computer Science
³ Universidad Carlos III de Madrid
⁴ Open University of the Netherlands
⁵ Katholieke Universiteit Leuven

michael.totschnig@wu.ac.at

Abstract. Despite the existence of numerous standards and specifications in technology-enhanced learning, there is a lack of interoperability of artifacts and services throughout the whole lifecycle of outcome-based education.
In this paper we present the concept and prototypical implementation of an open architecture that aims to remedy these issues by providing a unified metadata and service layer for making key educational resources sharable, storable, findable, and interoperable. The reference model and its supporting technology architecture are tested by a family of prototypes implemented as extensions to or adaptations of existing mainstream systems like Moodle, .LRN, Elgg and Facebook.

Keywords: outcome-oriented education, learning object repository, middle layer

1 Introduction

Conceiving services that make learning resources usable for design and delivery of outcome-based learning is a challenge that current learning object repositories are not yet able to meet. On the one hand, learning designs are dissociated from the learning opportunities (course offerings) where they have been or will be put into context, and thus it is difficult for the users (both the learner and the teacher) of those repositories to understand how they can benefit from these resources in the most fruitful way. On the other hand, learning outcomes are not yet defined and linked in a systematic way to learning designs, and even if they were, information about the learning or teaching history of the user would be needed in a standardized format, if the system were supposed to make meaningful suggestions.

The ICOPER Reference Model (IRM) aims at providing a framework through which innovative learning processes that exploit rich linkages between teaching methods, learning designs, learner assessments, learning resources, learning outcome definitions, user profiles for achieved learning outcomes and for learning needs, and learning opportunities can be stored, shared and delivered through standardized services and data formats. The Open ICOPER Content Space (OICS) is conceived as the testbed for implementing the IRM.
Figure 1 displays the key processes in designing and delivering outcome-based education. From a pedagogical perspective, the alignment of learning outcomes with assessment methods and teaching methods is one of the core foundations of the Bologna Process [6]. Implementing this foundation, the IRM as a reference model and the OICS as its reference implementation represent a visionary yet realistic approach to fully (i.e. conceptually and technically) supporting key pedagogical processes. These include the definition of intended learning outcomes; the reuse and creation of teaching methods and learning designs; the selection of content; and the offering and delivery of concrete learning opportunities. Going beyond current pedagogical and institutional practice, this approach even enables management of learner assessments, including verification and certification of learners’ achievement of intended learning outcomes; publishing of learning outcome profiles; and institutional exchange of achievement data (between and among universities and companies).

Fig. 1. Key processes in outcome-based education.

In this paper, we present the architecture of the services that the OICS offers to client applications. These services draw upon existing specifications for content aggregation, federated search and publication that have been validated in numerous projects and organizations such as MACE, MELT, GLOBE, EducaNext and ARIADNE. All of those have been integrated into a middle layer API that provides easy-to-use services for the support of outcome-based learning and teaching scenarios.

After describing the conceptual and data models defined in the IRM, we explain how these models are made available through the OICS services, and how these services are consumed by the prototypical tools developed in the ICOPER project.
2 OICS Architecture

The OICS infrastructure is built as a composition of services developed and maintained at Vienna University of Economics and Business (WU) and K.U. Leuven (KUL). At both sites a learning object repository is used for storing metadata harvested from the ICOPER content providers. Metadata is synchronized between them through the Simple Publishing Interface (SPI) protocol. Content is fed into the OICS either through the OAI-PMH protocol or a publication service based on the SPI protocol¹.

The ARIADNE harvester [2] enables the management of OAI-PMH targets and is integrated with a set of services improving the management of the repositories and the quality of the metadata:

• The registry service provides a catalogue of up-to-date information about learning object repositories (LORs) and allows the harvester instance used for the OICS to retrieve information about the OAI-PMH endpoint.
• The validation service ensures that only metadata records that comply with the ICOPER LOM Application Profile (see Section 3.1), which is based on IEEE Learning Object Metadata (LOM), are stored.
• The transformation service applies mappings from foreign vocabularies to ICOPER-specific ones.
• The identifier service generates unique and persistent identifiers that are added to metadata records upon ingestion into the OICS. Resolution to multiple views of the resource is provided through a simple web service.

These services are explained in more detail in [2].

2.1 OICS Middle Layer

The OICS middle layer bundles the requirements of the IRM in a coherent API that is accessible from a variety of systems and tools (see Figure 2). Its key focus is the integration of concepts and data related to the key processes in outcome-based education.
The OICS middle layer provides services for search and retrieval of learning resources, for publication, for the management of users and groups and for the management of learning outcome profiles within these key processes.

Search and Retrieval. The search and retrieval service gives access to the OICS resources by providing specific access methods for the different types of objects (learning outcome definitions, teaching methods, learning designs and learner assessments). Three example bindings for this service have been implemented:

• The ATOM binding exposes all resources as ATOM feeds that can be filtered based on values in the LOM metadata.
• The JSON binding uses a REST interface and provides the results to the client tools in the JavaScript Object Notation data format. JSON is a lightweight data format heavily used by web developers due to its simplicity (e.g. native evaluation of results in JavaScript) compared to the traditional XML data format approaches, which often require cumbersome DOM-based processing.
• A PHP search script forwards PLQL [3] expressions to the SQI SOAP end point provided by the KUL repository.

Publication. The OICS implements the SWORD/AtomPub binding of the SPI specification: learning objects and metadata records can be published to collections. Extending the SPI specification, the OICS implementation allows updating and retrieving parts of a metadata record, which makes it easy to query and manipulate individual metadata fields relevant in specific use cases.

¹ For metadata editing and demonstration purposes the OICS also provides a web UI.

[Figure: the Open ICOPER Content Space holds learning outcome definitions, learning outcome profiles, teaching methods, learning designs / units of learning, and assessment resources supplied by content providers. A service layer (search / index, export, recommend) exposes them to desktop applications, learning management systems, widgets and 3rd party tools.]

Fig. 2. OICS middle layer

Learning Outcome Profiles. Users of the OICS can manage their learning outcome profile according to the Personal Achieved Learning Outcomes (PALO) data model (see Section 3.3) both through a web UI and the same publication service as used for publishing learning resources.

3 Conceptual and Data Models of the ICOPER Reference Model

The OICS as a repository managing shareable educational resources implements the following data models: (1) the ICOPER LOM Application Profile (AP) as a unified metadata layer above learning designs, teaching methods, assessment methods, learner assessments and learning content, (2) an IEEE RCD-based specification for representing learning outcome definitions (LOD) and (3) the Personal Achieved Learning Outcomes (PALO) specification [1] for learning outcome profiles.

3.1 ICOPER LOM Application Profile

A metadata schema is needed to describe and store resources in a referatory. As one of the most widely used, supported, and implemented standards, IEEE LOM was profiled to enable the description of learning design resources (teaching methods, learning designs, assessment methods, and learner assessments), but care was taken to make the same profile also applicable to other types of learning resources stored in the OICS. The resulting ICOPER LOM AP ensures that resources described using different standards and specifications like IMS Learning Design (LD), IMS Question and Test Interoperability (QTI), and so forth, become semantically interoperable.

As one central feature, the AP allows defining intended learning outcomes for all resources in the OICS; to enable this, the Educational category of LOM was extended with an element containing a link to a learning outcome definition. This simple extension enables several added-value use cases and scenarios, as indicated and prototypically demonstrated in Section 6.
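To make the effect of this extension concrete, the following is a minimal sketch, not the actual ICOPER LOM AP binding. It assumes a simplified in-memory record in which the Educational category carries a list of links to learning outcome definitions. The class names, the field name `learning_outcome_links`, the `urn:example:` identifiers and the helper `resources_for_outcome` are all hypothetical and chosen only for illustration.

```python
from dataclasses import dataclass, field
from typing import List


@dataclass
class Educational:
    """Simplified stand-in for the LOM Educational category."""
    learning_resource_type: str
    # Hypothetical name for the extension element: links (URIs) pointing
    # to learning outcome definitions stored elsewhere.
    learning_outcome_links: List[str] = field(default_factory=list)


@dataclass
class LomRecord:
    """Simplified metadata record; a real LOM record has many more fields."""
    identifier: str
    title: str
    educational: Educational


def resources_for_outcome(records: List[LomRecord], outcome_uri: str) -> List[LomRecord]:
    """Return the records whose Educational category links to outcome_uri."""
    return [r for r in records
            if outcome_uri in r.educational.learning_outcome_links]


# Illustrative data only: three resources of different types.
records = [
    LomRecord("ld-1", "Peer review learning design",
              Educational("learning design", ["urn:example:lod:42"])),
    LomRecord("la-7", "Quiz on sorting algorithms",
              Educational("learner assessment",
                          ["urn:example:lod:42", "urn:example:lod:43"])),
    LomRecord("tm-3", "Jigsaw teaching method",
              Educational("teaching method")),
]

matches = resources_for_outcome(records, "urn:example:lod:42")
print([r.identifier for r in matches])  # prints ['ld-1', 'la-7']
```

Because every resource type shares the same link element, one lookup by learning outcome returns learning designs and learner assessments alike, which is the kind of cross-type retrieval the profile is meant to enable.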
3.2 Learning Outcome Definitions (LOD)

The Learning Outcome Definitions (LOD) data model defines a conceptual base schema for describing and sharing learning outcome definitions in the context of online and technology enhanced learning. The data model provides a way to capture the key characteristics of a learning outcome, independently of its use in any particular context or target group. This model should enable the storage and retrieval of learning outcomes across learning systems that deal with learning outcomes data.

This specification is based on, and is an application profile of, the IEEE Reusable Competency Definitions (RCD) standard. IEEE RCD is the only widely accepted standard for describing competencies. It is a continuation (and replacement) of the early efforts on the development of IMS Reusable Definition of Competency or Educational Objective (RDCEO). This specification profiles IEEE RCD with one metadata element defining the type of the learning outcome and its associated value domain to capture whether a learning outcome refers to knowledge, skill or competence following the definitions of the European Qualification Framework [4]. Instances that conform to the LOD specification also conform to the IEEE RCD standard, which ensures interoperability between the OICS and other systems.

3.3 Personal Achieved Learning Outcomes (PALO)

The Personal Achieved Learning Outcomes (PALO) data model [1] is a simple schema proposed to capture information on knowledge, skills and competences achieved by a learner and relations between those outcomes. Information on the context where the learning outcomes are obtained or applied, evidence records and levels (e.g. proficiency level) associated with the outcomes are also part of this schema.

One of the main challenges of communities and systems that deal with learning outcome information is interoperability.
Different communities and systems may use different data models to represent information on skills, knowledge or competence obtained by a person that is required for a job or a task. The PALO specification is a step towards a common model supporting the exchange of such data, to enhance interoperability of personal learning outcome information between, for example, learning management systems, e-portfolios, social applications and recruitment systems. This data model enables describing relations between learning outcomes of learners, in addition to contextual and evidence related information. The PALO schema should enable capturing the following:

• Relations between achieved learning outcomes, regardless of the taxonomies or ontologies they belong to;
• Contextual information on where the achieved learning outcome is obtained or applied;
• Information about all types of evidence and assessment that prove the achievement of a learning outcome;
• Information about levels and ranking of an achieved learning outcome, like proficiency level.

The PALO model has been proposed as an EU specification for capturing data of personal achieved learning outcomes at the CEN Workshop on Learning Technologies².

² See http://www.cen.eu/cen/Sectors/Sectors/ISSS/Activity/Pages/WSLT.aspx

4 Linking Learning Outcomes, Teaching Methods, Learning Opportunities and Learning Designs

The following hypothetical scenario illustrates how university administration, faculty and learners can benefit from making the systems that manage teaching and learning processes interoperable through the adoption of the services described in the IRM and provided by the OICS.

1. At University X, a new curriculum for a course program is developed. Each course description is linked to learning outcome definitions (LOD) and suggested teaching methods (TM), both stored in the OICS.
2. Best practices for TMs have been elaborated by a consortium of universities, and for each of them a template has been elaborated that can be used by instructional designers.
3. An instructional designer creates a new learning design (LD) for the program. He retrieves the LODs mentioned in the curriculum from the LOD repository and searches the TM repository for a suitable template including assessment methods. He imports it into his authoring environment, adds resources and learner assessments retrieved from the OICS. Since learner assessments are linked to learning outcomes and assessment methods, he is able to retrieve the most relevant resources. The LD is made available for feedback in a restricted collection.
4. Additional links to LODs and TMs can be added by program management.
5. Once the LD has been approved by the program management, the LD is published.
6. It is imported into the institutional LMS, and automatically a learning opportunity is pushed to a registry service for learning opportunities.
7. Upon each completion of the learning opportunity, feedback from teachers and learners is collected and the metadata is enriched.
8. Learners completing the learning opportunity, including the successful finishing of learner assessment (LA), have their learning outcome profile augmented with entries for each achieved learning outcome. The achieved learning outcomes are evidenced by assessment records (AR), results of the assessment process.
9. Prospective learners that use the OICS LOD repository for identifying learning outcomes that correspond to their learning needs will be able to retrieve other learners that already have achieved these outcomes, and since their profile also links to units of learning and learning opportunities, relevant recommendations can be presented to the learner.

The different parts of this high level scenario have been implemented by partners of the ICOPER consortium in various prototypes that make use of the OICS repository services.
In the following, we describe for each of the processes in Figure 1 how they are realized in some of the prototypes.

4.1 Learning Outcomes - Definitions

Learning outcomes can be defined for a single course, taught by several teachers, or be standardized across universities or whole domains. Instead of describing learning outcomes from scratch every time a new course is created, instructional designers should be provided with a list of relevant learning outcome definitions that they can link to their courses. For example, the IEEE and ACM Computer Science task force has defined the list of learning outcomes for all computer science courses. These learning outcomes have been imported into the OICS and can be reused across different courses and universities. In this way, different universities use common learning outcome terms for describing what students would achieve by finishing a specific course.

The OICS also provides a simple-to-use web interface for creating and editing them. A browsing interface is currently implemented for the ICOPER public website. New definitions can also be created from within several authoring environments and LMS and are submitted to the OICS through the publication service.

4.2 Teaching Methods and Learning Design: OpenGLM

OpenGLM (short for Open Graphical Learning Modeler) is an open-source learning design editor, developed at Universität Wien. It is able to manipulate learning designs compliant with the IMS Learning Design specification. It was built to visually support the creation and reuse of teaching methods and learning designs. OpenGLM uses the search and retrieval services and the publishing service of the OICS Middle Layer; it uses those parts of the services that are related to retrieving, searching, editing, enriching and publishing learning designs (packaged as IMS LD units of learning), teaching methods and learning outcomes.
OpenGLM is the key prototype supporting use cases in the “Teaching Methods and Learning Design” process in Figure 1.

4.3 Learning Content: OICS Roundtrip Authoring Re-use

Open University of the Netherlands and Humance AG have implemented an OICS roundtrip authoring re-use prototype that brings together an asset management software (MediaLibrary), an authoring environment (author42), and an LMS (OLAT). This prototype addresses the authoring and creation of units of learning with strong support for re-use of existing materials. It focuses on three main processes:

1. The collaborative collection and organization of media assets.
2. The collaborative creation of learning units based on these media assets.
3. The preparation for re-use of media assets and learning units.

The prototype has several connections to the OICS: through OAI-PMH targets, individual media assets (MediaLibrary) as well as units of learning (author42, OLAT) are made available to the OICS infrastructure together with their LOM-based metadata. The OICS search services are integrated in author42 to enable seamless integration of existing media in the authoring process. More details on this prototype can be found in an accompanying paper [5].

4.4 Learning Opportunities

Knowledge Markets Consulting Ges.m.b.H. is implementing a prototype on top of its 2know2 platform that allows announcing learning opportunities for courses that are linked to learning outcomes and teaching methods. The learning outcomes and teaching methods are directly stored in and retrieved from the OICS via the search, retrieval and publishing services of the OICS Middle Layer. New learning opportunities can be announced at the 2know2 platform with a news article and an RSS feed and can also be published in the repository for learning opportunities at the OICS.
4.5 Learner Assessment

University Carlos III of Madrid (UC3M) has developed a prototype integrated into the .LRN platform that demonstrates various use cases related to the learner assessment process:

(1) Outcome-based search of assessment resources. Once the assessment method and learning outcomes of a course have been defined, the instructional designer should find appropriate assessment resources aligned to them, and this process is enabled by the .LRN module. The designer is able to search for assessment resources in the OICS by keyword, but he is also able to filter the results by the intended learning outcomes of the course (available in the learning outcome definitions repository) and by the assessment method he decided to use to orchestrate the resources.

The described use case makes use of the search service of the middle layer API of the OICS in order to search assessment resources from several repositories. This service also provides means for filtering by learning outcomes and/or assessment method.

(2) Sharing annotations about assessment resources. Once an activity involving a published assessment resource has finished, teachers can make use of the annotation system in order to provide information about the students’ performance. Teachers could also collect students’ feedback and annotate the assessment resource with it. This information enriches the published resource for potential instructional designers willing to re-use it.

This annotation sharing system uses some services of the OICS via the middle layer API. Firstly, the publication service is used to publish the resources and their LOM metadata in the OICS. Secondly, the service that allows updating the metadata record of a resource is used to update it with annotation information.

4.6 Learning Outcomes – Achievements

UC3M’s prototype also takes care of the publication of the achieved learning outcomes into a learner’s learning outcome profile on the OICS.
After the completion of the course, and therefore the assessment activities, it is time for teachers to officially close the course. The implemented application provides the teacher with an interface to facilitate this task. It shows a list of the students of the course and the assessment result (grade) of each assessment activity carried out in the course. There is also a final grade automatically calculated as the arithmetic mean of all the results, but the teacher can modify it taking into account other factors. The teacher can also provide some textual feedback for any student’s final grade.

Once the teacher has finished this task, he can officially close the course. The result of this action is that the profiles of all the students passing the course (in this case a final grade of 5 or more, because the prototype uses the 0 to 10 grading scale of the Spanish system) will be updated with the achievements of the course, that is, the learning outcomes achieved. These achievements are also evidenced by an assessment record, which is an official record corresponding to the final grade of the course and has the University as the assessing body. This process uses the service of the OICS middle layer that allows updating the learner’s profile with assessment records and achievements.

Other tools and environments have been integrated with OICS services, and are able to publish achievements into a learner’s profile or display them:

• Umeå University, Sweden, has developed a Moodle block that imports learning designs into a course and exports learning outcomes for students enrolled in it.
• IMC, Germany, has extended its LMS Clix with functionality for the execution of outcome-based learning designs as well as the management of learners' PALO.
• eXact learning solutions (formerly Giunti Labs), Italy, has implemented access to OICS’ learning outcome related features into its learn eXact enterprise LCMS.
• AGH – University of Science and Technology, Poland, has integrated the display of PALO profiles into Facebook.
• Tallinn University, Estonia, has integrated the e-portfolio environment, Elgg, with OICS services for searching learning resources and for publication of PALO data.

5 Conclusions and Future Directions

With the services we have designed and implemented for the OICS, we want to make learning resources available in the contexts where outcome related education takes place: LMS, personal learning environments, social networks. By providing richer linkages between learning needs, learning designs, teaching methods, learning outcomes and learning opportunities, teachers and learners will be able to make more innovative use of available learning objects. The OICS is also designed to store information about users’ experiences as annotations, and thus to create still more opportunities to discover relevant resources.

In order to provide a good practice of enabling the design and delivery of outcome-based learning, prototypes implemented in the ICOPER project extended the functionality of existing systems that are already used by teachers and learners.

Currently, the implemented prototypes are being evaluated by target end users like learners, teachers and instructional designers. The goal of the evaluation is to determine:

• Ease of use, usefulness and completeness of implemented outcome-based functionalities;
• Added value and innovation of introduced functionalities to target users;
• The extent to which such applications are being adopted by universities;
• Interoperability level of data exchanged using the specifications adopted and implemented in this project.

Acknowledgements. The work presented in this paper was supported by the European Commission in the eContentplus project ICOPER (ECP-2007-EDU-417007).

References

1. Najjar, J., Derntl, M., Klobučar, T., Simon, B., Totschnig, M., Grant, S., Pawlowski, J.: A Data Model for Describing and Exchanging Personal Achieved Learning Outcomes (PALO), International Journal of IT Standards and Standardization Research (IJITSR), vol. 8, issue 2 (2010)
2. Klerkx, J., Vandeputte, B., Parra, G., Santos, J.L., Van Assche, F., Duval, E.: How to Share and Reuse Learning Resources: the ARIADNE Experience, European Conference on Technology Enhanced Learning (ECTEL) (2010)
3. Ternier, S., Massart, D., Campi, A., Guinea, S., Ceri, S., Duval, E.: Interoperability for Searching Learning Object Repositories. The ProLearn Query Language. In: D-Lib Magazine. vol. 14, nr. 1/2 (2008). http://www.dlib.org/dlib/january08/ceri/01ceri.html
4. EQF. The EQF for lifelong learning, Office for the publication of the EC, ISBN 978-92-79-0847-4 (2000)
5. Klemke, R., Schmitz, B.: Authoring for re-use in outcome-oriented learning scenarios. Se@m 2010, the 4th International Workshop on Search and Exchange of e-le@rning Materials, Barcelona, Spain (2010)
6. European Commission: The Bologna Declaration (2000). http://ec.europa.eu/education/policies/educ/bologna/bologna.pdf

The Ariadne Registry of LORs

Jose Luis Santos, Joris Klerkx, and Erik Duval

{joseluis.santos, joris.klerkx, erik.duval}@cs.kuleuven.be
Dept. Computerwetenschappen, K.U.Leuven
Celestijnenlaan 200A, B-3000 Leuven, Belgium

Abstract. The ARIADNE registry is one of the core components in an architecture that promotes interoperability of networks of repositories, facilitating access to learning content and encouraging the sharing and reuse of digital content. This paper explains the development of the ARIADNE registry of learning object repositories (LORs) and the role played by standards and specifications.

1 Introduction

One of the problems of Technology Enhanced Learning (TEL) is that creating Learning Objects (LOs) is expensive and time-consuming [1].
Sharing is one way to address this problem. When content creators search for materials to reuse, they typically do not care where the resource is located, but want to find the best-quality materials that satisfy their needs [2].

LOs are typically stored in Learning Object Repositories (LORs). In ARIADNE, considerable effort has been spent on the development of standards and specifications for LORs [3], including IEEE LOM [4], SQI [5], SPI [6], OAI-PMH [7] and PLQL [8]. These allow effective sharing and reuse of LOs between different LORs and networks of LORs.

One of the problems in managing all these repositories within a network is scalability. Currently, every repository is added manually to the harvester or to the federated search layer service. This time-consuming process requires one person to configure the targets and their parameters, such as the requirements (e.g. query languages for querying or metadata formats for harvesting). In addition, some extra information about the content of the repositories can be useful for managing this architecture. For instance, if we know that a repository contains LOs focused on mathematics, and we are interested in them, we can select this target to be harvested. To enable the sharing and exchange of this information with other networks of repositories, the information needs to be structured and managed.

This paper focuses on how ARIADNE implements the ARIADNE Registry, which has been integrated in the GLOBE architecture [9], and how ARIADNE addresses the problems described above by:
1. using a reference implementation developed for the ASPECT project [10] based on the concept of content collection. A content collection record contains relevant information about the content of the collection and technical information for accessing it;
2. creating a registry for managing this information;
3. using standards and specifications for increasing the interoperability within networks of repositories;
4. creating a network of registries in order to exchange the information between them and to be able to access the LOs of other networks. In this way, all modifications made in one network can be propagated through all the networks of registries.

The paper is organized as follows: Section 2 shows a possible use case of the approach. Section 3 introduces the Registry Data Model co-developed in the ASPECT project. Section 4 explains the Registry Architecture. Section 5 discusses related work, and some statistics and data are presented in section 6.

2 Use Case. ARIADNE Foundation integrates the ARIADNE registry in its infrastructure

This section focuses on a use case for the ARIADNE registry and discusses: (i) the integration of the ARIADNE Registry in existing architectures and (ii) increasing the collaboration between different institutions for exposing their LOs. We also introduce some technical details about the implementation. The final goal is to explain the use of the registry in an informal way.

The ARIADNE Foundation has several repositories where the LOs are described by Learning Object Metadata. They expose the metadata using SQI and OAI-PMH. However, they have noticed that the number of repositories is increasing and that it is difficult to manage all the information from other content providers. These content providers describe their LOs using different specifications such as LOM and Dublin Core (DC), and expose them using OAI-PMH and/or SQI. Looking for a good solution, they decide to integrate the ARIADNE Registry to manage this information because:
1. The ARIADNE Registry allows the definition of LORs using IMS LODE, which does not restrict the specifications that can be used.
2. The ARIADNE Registry exposes the information using SQI, RSS and OAI-PMH. SQI allows querying the registry. RSS alerts subscribers when a new target is added. OAI-PMH allows harvesting all information from the ARIADNE Registry. These three specifications make it possible to integrate the ARIADNE Federated Search Layer, which queries different repositories to obtain LOs, and the ARIADNE harvester, which harvests metadata from different repositories, and to build a federation of registries that enables collaboration with different institutions.

The benefits that they obtain from the integration of the ARIADNE Registry are:
1. They have centralized all the information about the repositories they harvest from. This saves time on administration tasks.
2. They can implement other services on top of the registry to check the availability of the different targets. Other services can consult this availability information before trying to access the targets, which improves their performance.
3. The integration of the registry allows newly added targets to be propagated to the different networks of LORs that are federated with the ARIADNE network.

3 ARIADNE Registry Data model

The ASPECT project [10] has co-developed an application profile of IMS LODE [11]. It is based on the IMS Dublin Core Collections Application Profile specification and complemented with ISO 2146 and IEEE LOM [12]. This specification uses the concept of content collections. A content collection is defined as a group of digital content which is exposed to the world through some protocols based on standards or specifications. ARIADNE has chosen this specification because it is not restrictive in terms of the specifications used, and it increases the possibilities of interoperability between architectures. In addition, the model does not restrict how the content collections are created. This is an important issue, because the content providers can choose how they create them and can offer metadata information about the collections that they are interested in.
This model is represented by a schema that contains three main elements: Content Collection, Metadata Collection and Protocol.
1. Content Collection contains information about access rights, authoring, title, description, keywords, etc. This is information about the content itself.
2. Metadata Collection contains information about how the metadata is exposed. Here, the content provider has to define which specifications are used to expose metadata. This part of the specification has an element called Protocol Implementation Description, which is used for defining extra information such as the query language supported by an SQI interface or the sets supported by an OAI-PMH interface.
3. Protocol contains specific information about the specification used, such as the URL of the schema, the namespace or the binding location.

Several examples can be found at the ARIADNE Registry site [13].

4 ARIADNE Registry Architecture

The implementation of the registry enables ARIADNE to build a federation of registries, to provide access to collection information using SQI and to publish new content collections using SPI.

Fig. 1. ARIADNE architecture approach. (Figure not reproduced: a federated search service and a harvester connect to all the networks by querying one registry; Registries 1, 2 and 3 are synchronized and hold content collections (CC) and metadata collections (MC) exposed through SQI, OAI-PMH, SPI, SRU/W and RSS.)

A registry can contain metadata collections (MC in Figure 1) or content collections (CC). The latter contain metadata collections (MC).

The final goal of the registry is to create a network of networks of repositories, similar to current DNS functionality on the internet. All the registries are synchronized, so if a harvester or a federated search service queries a registry, it can access all the content collections or metadata collections in the different networks.
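The three schema elements of section 3 and the synchronization between registries can be illustrated with a minimal Python sketch. All class, field and function names below are hypothetical simplifications chosen for illustration; they are not the IMS LODE element names, and the in-memory copy stands in for the OAI-PMH or RSS based synchronization that the registry actually uses.

```python
from dataclasses import dataclass, field


@dataclass(frozen=True)
class Protocol:
    """Protocol details (hypothetical, simplified fields)."""
    identifier: str   # e.g. "oai-pmh-v2" or "sqi"
    location: str     # binding location (endpoint URL) of the target


@dataclass(frozen=True)
class MetadataCollection:
    """How the metadata of a collection is exposed."""
    identifier: str
    protocol: Protocol
    # Stand-in for the Protocol Implementation Description,
    # e.g. OAI-PMH sets or SQI query languages.
    implementation_description: tuple = ()


@dataclass
class ContentCollection:
    """Descriptive information about the content itself."""
    identifier: str
    title: str
    keywords: list = field(default_factory=list)
    metadata_collections: list = field(default_factory=list)


class Registry:
    """An in-memory registry of content collections."""

    def __init__(self, name):
        self.name = name
        self.collections = {}

    def publish(self, collection):
        self.collections[collection.identifier] = collection

    def synchronize_from(self, other):
        """Copy entries this registry does not know yet; return how many were added."""
        added = 0
        for identifier, collection in other.collections.items():
            if identifier not in self.collections:
                self.collections[identifier] = collection
                added += 1
        return added


def synchronize_all(registries):
    """Repeat pairwise synchronization until no registry learns anything new."""
    changed = True
    while changed:
        changed = False
        for target in registries:
            for source in registries:
                if target is not source and target.synchronize_from(source):
                    changed = True
```

After `synchronize_all` has run, a client that queries any single registry sees the collections published in every federated registry, which is the property the architecture relies on.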
This approach is important: it allows automated discovery, decreases the time spent managing repository information, and allows updates to be propagated automatically.

This architecture shows how the registry can be synchronized with other registries. The registry exposes its contents using OAI-PMH so that other registries can harvest its content. In addition, the registry exposes its updates using the RSS 2.0 specification, so that registries can also be synchronized with RSS feed readers. This RSS feed can be used as a notification system, as it contains all the targets published.

The registry uses SQI for querying because it is neutral in terms of query language and results format. Consequently, all SQI clients can query the registry. The registry supports different query languages such as PLQL, VSQL and the Lucene Query Language.

Finally, the registry implements an SPI interface for publishing content collections. For instance, when other content collections are harvested, they are inserted using SPI to publish content by reference, because the metadata in this case defines a content collection already published elsewhere.

One of the main advantages of this implementation is that the ARIADNE Registry is based on the ARIADNE Repository. This software is a flexible implementation that allows different metadata models. For instance, it supports LOM, ILOX and Dublin Core, and this paper explains how it supports the IMS LODE Registry specification. The idea behind this software is that we can index all kinds of XML documents. The final goal is that we can build combined queries, for instance using PLQL, based on the content of different tags.

To validate this architecture, we have integrated the ARIADNE harvester with the registry. The sequence diagram in Figure 2 shows how a system administrator can configure the harvester using the current implementation: the system administrator can query the registry and add the targets that he or she is interested in.
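The target selection step of the harvester configuration can be sketched as follows. This is an illustrative Python sketch under stated assumptions, not the ARIADNE implementation: registry records are represented as plain dictionaries rather than IMS LODE XML, the keys `metadataCollections`, `protocolIdentifier` and `location` are hypothetical, and the example URLs are placeholders. Only the protocol identifier value `"oai-pmh-v2"` is taken from the interaction diagram.

```python
def select_targets(records, protocol_identifier="oai-pmh-v2"):
    """Return the locations of all metadata collections that use the given protocol."""
    selected = []
    for record in records:
        for metadata_collection in record.get("metadataCollections", []):
            if metadata_collection.get("protocolIdentifier") == protocol_identifier:
                selected.append(metadata_collection["location"])
    return selected


# Hypothetical registry content: one collection exposed twice, one exposed once.
registry_records = [
    {
        "title": "Mathematics collection",
        "metadataCollections": [
            {"protocolIdentifier": "oai-pmh-v2", "location": "http://example.org/math/oai"},
            {"protocolIdentifier": "sqi", "location": "http://example.org/math/sqi"},
        ],
    },
    {
        "title": "Architecture collection",
        "metadataCollections": [
            {"protocolIdentifier": "sqi", "location": "http://example.org/arch/sqi"},
        ],
    },
]

harvest_targets = select_targets(registry_records)
```

A harvester configured this way only receives targets it can actually harvest, while the same function with a different identifier could serve a federated search layer.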
5 Related Work

One of the main goals of ARIADNE is to encourage sharing and reusing LOs. For this purpose, we have implemented a registry using the concept of content collection, integrating some technical information in the data model and allowing the federation of registries. Table 1 shows the similarities between existing approaches and the ARIADNE approach.

The Domain Name System (DNS) [14] has conceptual similarities with the approach explained in this paper. While DNS translates a name identifier to a number identifier, the registry translates the abstract information contained in the content collection to the technical information contained in the metadata collection. In addition, federated DNS servers propagate new sites among themselves. The ARIADNE Registry implements a similar approach with the federation of the registries (new content collections should be propagated among them).

Fig. 2. Interaction diagram - Configuring the harvester. (Figure not reproduced: the participants are the ARIADNE user interface, the ARIADNE Registry and the Harvester. The user interface sends keywords, and the query is completed with (1) metadataCollection.target.targetdescription.protocolIdentifier.entry = "oai-pmh-v2". Results are returned, followed by forms for adding, updating and deleting targets.)

Table 1. Similarities with existing approaches (Non-existing, Weak and Strong)

Approach       Content collection   Extra technical information   Federation
DNS            Non-existing         Non-existing                  Strong
MWSDI          Weak                 Strong                        Weak
IESR/OCKHAM    Strong               Weak                          Weak
CORDRA         Strong               Weak                          Weak

METEOR's Web Service Discovery Infrastructure (MWSDI) uses the UDDI specification for defining the interfaces (web services), and it defines an ontology using OWL to support the infrastructure. MWSDI is based on the need to decentralize an architecture for several reasons such as geographical location, nature of registered services, business functionality, technical specifications and so on [15][16].
For this reason, it contains an ontology with details of the domain, the registries, the ontologies, the registry federations and the network of relationships among them. The main difference is that MWSDI doesn't contain a content collection approach and the ARIADNE registry doesn't implement a hierarchical architecture of registries.

The JISC Information Environment Service Registry (IESR) [17] and the OCKHAM NSF/NSDL Registry [18] are developed under Global initiatives Registries [18]. They introduce the content collection approach. IESR uses the Research Support Libraries Programme Collection Description schema (RSLPCD) and Dublin Core, which allows the definition of some technical details of the services that provide access to the collections [17]; OCKHAM NSF/NSDL uses Dublin Core. They define how the repositories expose the metadata, considering protocols such as Z39.50, Web Service SOAP, SRW and OAI-PMH. However, their implementations don't provide extra information such as the sets supported by OAI-PMH or the query languages. The registries expose their metadata through an SRU client [19, 16].

The CORDRA / ADL registry (ADL-R) merges the concepts of MWSDI and IESR/OCKHAM, and introduces a hierarchical structure of registries and the concept of content collection based on repositories which accept LOM as the metadata of LOs [20].

In summary, we can conclude that the ARIADNE Registry incorporates the best characteristics of the other approaches:
1. Conceptually, we implement an approach similar to the propagation functionality of DNS.
2. The IMS LODE Registry allows the definition of services, as the UDDI specification does.
3. The IMS LODE Registry allows the content collection approach, as the approaches mentioned above do.
4. The ARIADNE Registry allows the federation of registries.

6 Statistics

In this section, we include some information about the metadata collections inserted and how they expose the metadata.
It includes the number of records harvested using the OAI-PMH protocol. However, SQI and SRU/W don't provide a way to get all possible results, so the SQI and SRU/W targets contain no information in the date and records harvested fields (Table 2).

Table 2. Records harvested and protocols

id                            date            records         SQI   SRU   OAI   SPI
dum rvp cz                    09 Apr 2010     2323            No    No    Yes   No
atlas fri uni-lj s            12 Apr 2010     4400            No    No    Yes   No
openlearn open ac uk          12 Apr 2010     535             No    No    Yes   No
sodis de                      12 Apr 2010     2277            No    No    Yes   No
sdt sulinet hu                12 Apr 2010     18864           No    No    Yes   No
ait opetaja ee                12 Apr 2010     4050            No    No    Yes   No
193 43 17 27                  12 Apr 2010     19309           No    No    Yes   No
193 93 132 187                12 Apr 2010     3541            No    No    Yes   No
www yteach com                12 Apr 2010     10              No    No    Yes   No
aspect                        12 Apr 2010     124933          Yes   No    Yes   Yes
aspect cup cam ac uk          12 Apr 2010     8839            No    No    Yes   No
melt contento se              12 Apr 2010     3014            No    No    Yes   No
spindeln iml umu se           12 Apr 2010     6699            No    No    Yes   No
lom emokykla lt u 1           12 Apr 2010     1101            No    No    Yes   No
195 82 131 106                12 Apr 2010     196             No    No    Yes   No
aplitic xtec cat              12 Apr 2010     9876            No    No    Yes   No
www klascement net            12 Apr 2010     17739           No    No    Yes   No
www portaldasescolas pt       12 Apr 2010     1554            No    No    Yes   No
fenix isftic mepsyd es        12 Apr 2010     26666           No    No    Yes   No
Mace 0                        13 Apr 2010     180461          No    No    Yes   No
Globe                         13 Apr 2010     385739          Yes   No    Yes   No
caad asro kuleuven be         3 May 2010      450             No    No    Yes   No
caad asro kuleuven be 0       3 May 2010      65              No    No    Yes   No
caad asro kuleuven be 1       3 May 2010      171             No    No    Yes   No
caad asro kuleuven be 2       3 May 2010      890             No    No    Yes   No
caad asro kuleuven be 3       3 May 2010      2773            No    No    Yes   No
caad asro kuleuven be 4       3 May 2010      50              No    No    Yes   No
caad asro kuleuven be 5       3 May 2010      2106            No    No    Yes   No
cumincad scix net             3 May 2010      9636            No    No    Yes   No
ariadne members               3 May 2010      3290            Yes   No    Yes   No
ariadne partners              3 May 2010      71600           Yes   No    Yes   No
icoper                        3 May 2010      16454           Yes   No    Yes   No
eun                           3 May 2010      187722          Yes   No    Yes   No
ourj code                     3 May 2010      297             Yes   No    Yes   No
iskme                         3 May 2010      30907           No    No    Yes   No
oai rails it com              3 May 2010      211             No    No    Yes   No
oai rails it com 0            3 May 2010      2223            No    No    Yes   No
oai rails it com 1            3 May 2010      29              No    No    Yes   No
kocw                          Not applicable  Not applicable  Yes   No    No    No
merlot                        Not applicable  Not applicable  Yes   No    No    No
educanext                     Not applicable  Not applicable  Yes   No    No    No
lornet                        Not applicable  Not applicable  Yes   No    No    No
espol                         Not applicable  Not applicable  Yes   No    No    No
agrega                        Not applicable  Not applicable  Yes   No    No    No
nime                          Not applicable  Not applicable  Yes   No    No    No
fedora                        Not applicable  Not applicable  No    Yes   No    No
Total                                         1151000         14    1     38    1

Table 2 shows that the registry contains fourteen repositories which support SQI, one repository which supports SRU/W, thirty-eight which support OAI-PMH and one which supports SPI. As the ARIADNE harvester relies on the registry, we know how many objects are exposed through OAI-PMH. This information is shown in the table.

The result is that we expose 1151000 metadata instances through the ARIADNE registry so far, although some of them are duplicated, because some targets harvest other targets. It nevertheless means that we can expose a large number of metadata instances while specifying the protocol needed to access them. Any harvester that implements OAI-PMH can easily connect to the registry, using an SQI client to find the targets and then harvesting all the metadata from all the repositories described in the registry.

7 Conclusions and future work

We have developed a LOR registry, which enables us to set up a federation of registries. Different federations of repositories can easily obtain information from other federations through the registries. In this context, we will deploy the ARIADNE Registry in the ASPECT project, and we will evaluate the pros and cons of the federation.

In the scope of this work, we have considered some of the main specifications for querying, harvesting and publishing, such as SQI, OAI-PMH and SPI. We have spent some effort on defining extra information for SRU/W.
However, we have to validate this specification. Also, we would like to consider more specifications such as OKI OSID.

Now that we have integrated the ARIADNE harvester with the registry, we have to work on the Federated Search Layer to allow querying based on the content of the registries.

Acknowledgment. The ARIADNE Registry has been funded partially by the ASPECT project: Adopting Standards and Specifications for Educational Content (Grant agreement number ECP-2007-EDU-417008) and by the CEN/ISSS workshop on Learning Technologies (WS-LT). Thanks also to our colleagues Ivana Bosnic and Katrien Verbert for helping us in this work.

References
1. Ochoa, X., Duval, E.: Measuring learning object reuse. In: EC-TEL '08: Proceedings of the 3rd European conference on Technology Enhanced Learning, Berlin, Heidelberg, Springer-Verlag (2008) 322–325
2. Ternier, S., Bosman, B., Duval, E., Metzger, L., Halm, M., Thorne, S., Kahn, J.: Connecting oki and sqi: One small piece of code, a giant leap for reusing learning objects. Proceedings of World Conference on Educational Multimedia, Hypermedia and Telecommunications (2006) 825–831
3. Duval, E., Forte, E., Cardinaels, K., Verhoeven, B., Van Durm, R., Hendrikx, K., Forte, M.W., Ebel, N., Macowicz, M., Warkentyne, K., Haenni, F.: The ariadne knowledge pool system. Commun. ACM 44(5) (2001) 72–78
4. Neven, F., Duval, E.: Reusable learning objects: a survey of lom-based repositories. In: MULTIMEDIA '02: Proceedings of the tenth ACM international conference on Multimedia, New York, NY, USA, ACM (2002) 291–294
5. Bernd Simon, Daniel Olmedilla, N.S.e., ed.: A Simple Query Interface for Interoperable Learning Repositories. Volume 11-18., Chiba, Japan (2005)
6. Ternier, S., Massart, D., Van Assche, F., Smith, N., Simon, B., Duval, E.: A simple publishing interface for learning object repositories. Proc. World Conf. Educational Multimedia, Hypermedia, and Telecommunications, Assoc. for the Advancement of Computing in Education (2008) 1840–1845
7. Lagoze, C., Van de Sompel, H.: The open archives initiative: Building a low-barrier interoperability framework. In: Proceedings of the ACM/IEEE Joint Conference on Digital Libraries (2001) 54–62
8. Ternier, S., Massart, D., Campi, A., Guinea, S., Ceri, S., Duval, E.: Interoperability for searching learning object repositories: The prolearn query language. D-Lib Magazine 14(1-2) (2008)
9. Klerkx, J., Vandeputte, B., Parra, G., Van Assche, F., Duval, E.: How to share and reuse learning resources: the ariadne experience. In: Proceedings ECTEL Conference 2010. Accepted for publication. (2010)
10. project, A.: http://www.aspect-project.org/ (2010)
11. LODE, I.: http://www.imsglobal.org/lode/spec/imslodev1p0bd.html (2010)
12. Massart, D., Smith, N., Tice, R.: Design of data model and architecture for a registry of learning object repositories and application profiles. deliverable 2.2 from aspect project. ASPECT PROJECT (2009)
13. Example Registry Instance, I.: http://ariadne.cs.kuleuven.be/ariadne-registry/search/showmetadata.jsp?query=metadatacollection.identifier.entry
14. Mockapetris, P., Dunlap, K.J.: Development of the domain name system. SIGCOMM Comput. Commun. Rev. 18(4) (1988) 123–133
15. Sivashanmugam, K., Verma, K., Sheth, A.: Discovery of web services in a federated registry environment. In: ICWS '04: Proceedings of the IEEE International Conference on Web Services, Washington, DC, USA, IEEE Computer Society (2004) 270
16. Xiang, X., Morgan, E.: Exploiting 'light-weight' protocols and open source tools to implement digital library collections and services. D-Lib Magazine 12(10) (2005)
17. Apps, A.: A registry of collections and their services: from metadata to implementation. In: DCMI '04: Proceedings of the 2004 international conference on Dublin Core and metadata applications, Dublin Core Metadata Initiative (2004) 1–10
18. Registries, G.I.: http://www.globalregistries.org/ (2010)
19. Morgan, E., Frumkin, J., Fox, E.: The ockham initiative - building component-based digital library services and collections. D-Lib Magazine 10(11) (2004) 51–59
20. Jerez, H., Manepalli, M., Blanchi, C., Lannom, L.W.: Adl-r: The first instance of a cordra registry. D-Lib Magazine 12(2) (2006)

Where is the user? Filtering Bots from the Edurep Query Logs

Wim Muskee
Kennisnet Foundation, Paletsingel 32, 2718 NT Zoetermeer, NL
w.muskee@kennisnet.nl

Abstract. Edurep indexes learning object metadata from several repositories, offering a webservice interface on which portals can build their own search implementation. At the level of the Edurep query logs, no obvious distinction can be made between human users and webcrawlers visiting these portal sites. This makes it impossible to gather any meaningful data on user search behaviour. Four query types, distinguished from the websites of the six largest portals, were related to one month of query logs. For two query types a distinction between human and automatically generated traffic could be found. However, these results can only be used to advise connected portals on their interface implementations. More research is needed to actually perform any reliable filtering.

Keywords: webservice, crawler detection, log analysis

1 Introduction

Edurep is a Dutch learning object search engine, indexing harvested learning object metadata from more than 50 different repositories. Search portal developers can interface with the search engine using the Edurep webservice, available through the SRU/SRW protocol (Figure 1).

Fig. 1. A simplified diagram of Edurep in its context.

Although Edurep has been operational for some years [11], the operators gained access to the search query logs only recently (December 2009). Through analysis of these logs and the webserver logs of one portal, the operators discovered that a significant number of queries came from various search engine bots (see footnote 1). Among several harmful aspects, Edurep is affected by two in particular. First, and obviously, webcrawlers generate extra traffic, possibly limiting performance for human users. Secondly, webcrawlers generate automated traffic, making it harder for the operators to infer meaningful human interaction results from the Edurep query logs.

Most of these search engine bots can be identified at search portal level based on their HTTP request User-agent string or IP address [12,9]. However, this information is no longer available when the request reaches Edurep. This problem is not specific to Edurep, but applies to any webservice which allows connections from a third-party search interface. Examples of these in the learning object context include the LRE [3], MACE [14] and the Spider project [4], all of them available through the SQI protocol [7].

With Edurep as context, this paper aims to explore methods to distinguish between automated and human queries in webservice query logs. To this end, four query types were distinguished from several search portal web interfaces. The SRU representations for each query were used to filter the logs for a specific query type and analyze it more closely. The paper ends with a discussion of the results.

1 A type of webcrawler; a program which gathers information from the internet by recursively following found hyperlinks.

2 Modeling Automated Queries

Because webcrawlers only follow hyperlinks, automated searches are caused by the presence of hyperlinks that trigger an Edurep search query. An analysis of the portals' search interfaces is necessary to relate hyperlinks to logged SRU queries.

2.1 Portal Search Interfaces

Looking at the search interfaces of the six largest portals (accounting for 97% of all queries), four types of hyperlinks were distinguished.
– search links: Issuing a search to retrieve a first page resultset.
– pagination links: Issuing a search to retrieve another resultset page.
– result links: Issuing a search to retrieve a specific record.
– facet links: Issuing a search to retrieve the number of records for that facet.

Typically, the portals retrieved either 5 or 10 results after a search query. The number of navigation links ranged from 5 to 20, always including a next and/or previous link and sometimes including links to the first and/or last page. A few included result and facet links. Only one portal (C) performed a search on page arrival. The resulting page included all link types.

All the portals' queries were represented as a URL in the browser navigation bar, meaning they can easily be pasted on other webpages for others to click on, including bots. When searching for the portals' URL query prefixes on Google, some results were indeed found. Corresponding queries were also discovered in the query logs.

2.2 SRU/SRW

Edurep can be queried using the searchRetrieve operation of the SRU/SRW protocol [13]. Among several supported request parameters [10], the startRecord parameter determines which record of the resultset is displayed first. When omitted, it defaults to 1. The maximumRecords parameter sets the number of records each resultset contains. Edurep's default is 10.

A search query typically has no startRecord value at all or a value of 1. Also, to present a reasonable number of results, the maximumRecords value is set to 5 or higher, or left out to return 10.

Pagination queries have a startRecord value higher than 1.

In a result query, the startRecord value is omitted or 1. Since only one result is expected, the value for maximumRecords does not need to be 1. However, because a specific record is requested, part of the query value is characteristic. In Edurep, a specific record can be requested by filtering on lom.general.identifier or lom.general.catalogentry, the LOM identifier, or meta.upload.id, Edurep's internal unique identifier.
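The rules above for search, pagination and result queries can be sketched as a small Python classifier. This is a minimal sketch under assumptions, not the scripts used in the study (which were written in R): it assumes the log contains complete SRU request URLs, the host `example.org` is a placeholder, and the way an identifier filter appears inside the `query` value is a guess. Facet queries are left out of this sketch.

```python
from urllib.parse import urlparse, parse_qs

# Fields that indicate a request for one specific record (from the text).
IDENTIFIER_FIELDS = (
    "lom.general.identifier",
    "lom.general.catalogentry",
    "meta.upload.id",
)


def classify(url):
    """Classify an SRU searchRetrieve URL as 'search', 'pagination' or 'result'."""
    params = parse_qs(urlparse(url).query)
    query = params.get("query", [""])[0]
    # startRecord defaults to 1 when omitted.
    start_record = int(params.get("startRecord", ["1"])[0])

    if start_record > 1:
        return "pagination"
    if any(field in query for field in IDENTIFIER_FIELDS):
        return "result"
    return "search"


BASE = "http://example.org/sru?operation=searchRetrieve&version=1.1"
first_page = classify(BASE + "&query=math")
second_page = classify(BASE + "&query=math&startRecord=11&maximumRecords=10")
single_record = classify(BASE + "&query=meta.upload.id%3D%22abc%22")
```

Checking `startRecord` first keeps the three classes disjoint, since both search and result queries omit the parameter or set it to 1.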
Facet queries can be performed inside a search query by adding Edurep's x-term-drilldown parameter to the SRU query. In addition to the search results, a count drilldown for each facet of the requested field is retrieved. Because this function is not supported for all LOM fields, separate facet queries can also be executed. These have a startRecord value of 1 or none at all. Also, the value for maximumRecords is 0 or 1 (see footnote 2).

3 Dataset

The logs of January 2010 were used as dataset and the analysis was done in R [5]. Each log entry consisted of the portal's IP address, the timestamp when a search query entered the system (UTC), the size of the response data in kilobytes, the processing time in seconds, the entrypoint of a query on the server indicating the protocol used (SRU or SRW), and the SRU search query.

Five variables from each query were used. The IP address, startRecord and maximumRecords values were used unprocessed. The query argument was used as a whole, assuming each portal constructed its queries in the same way; query uniqueness was not compared across portals. An identifier boolean was set to 1 if a result link was detected.

4 Results

Concerning search queries, the distinction between human and automatically induced queries can be made based on how often the queries occur. Automatically induced queries will appear more often than human generated ones. While Portal C's startup page query appeared more than 6 times as often as any of its other queries, a good threshold could not be determined.

Assuming most users will never click past the second page of search results [1], pagination queries with a startRecord value over 200 will probably be auto-generated (PAG1). A more elegant method for determining automatic pagination queries is to scan the logs for pagination ranges. A range was crudely defined as a set of SRU queries (min. 10) with equal query values, a startRecord difference of maximumRecords and a maximum startRecord value higher than 200 (PAG2).
Based on the occurrence of result queries, no clear evidence for automatic querying was found in the logs. This was attributed to the dynamic nature of Edurep's content: with changing resultsets, different results will be queried.

After plotting the unique facet queries of Portal C (Figure 2), the small layer of queries below the top coincided with the facet queries executed on entering the search page. Observing that 10 of the 12 sub-top queries were executed about 2330 times, it was assumed they were caused by automatic querying. From the queries of these types, that amount could be subtracted, leaving their human induced occurrences (FACET). Following from this assumption, at least the same number of automatic hits was generated by Portal C's startup search query, and could thus also be subtracted.

2 Technically, by setting this value to 0, the same total can be retrieved, but since the usage of 1 had been observed, it was included.

Fig. 2. Unique facet link queries plotted against occurrences. (Plot not reproduced: x-axis unique facet links, 0 to 140000; y-axis count, 0 to 3000.)

The subtractions from each filtering method are displayed next to each portal's total number of queries in Table 1.

Table 1. Subtractions of filter method implementations.

            total      PAG1       PAG2       FACET
Portal A    41690      -15237     -13355
Portal B    126340     -105026    -89710
Portal C    1293902    -15255     -15654     -30290
Portal D    48841      -47        -62
Portal E    232341     -1778      -1815
Portal F    82527      -406       -205
total       1825641    -137749    -120801    -30290

5 Discussion

Considerable automatically induced querying was observed. In terms of bandwidth, the ranges found by PAG2 alone caused 13.3 Gb of traffic, 26.5% of the total A-F amount. In terms of query counts, PAG2 and FACET accounted for 8.4% of the total A-F number of queries.

However, assumptions were made and the filter methods used are still rudimentary and incomplete. In using PAG2, for instance, tails or heads of the ranges may lie outside the dataset used.
Also, the dataset probably contains heads or tails of ranges from other months. This is even more true when considering that pagination queries don't need to appear on the timeline in the same order as they appear on the page [2]. Secondly, first- and last-page pagination queries were not considered in PAG2.

The immediate findings of this study make it possible to tailor our advice for portals. One aspect of this is related to blocking crawlers at the portal by implementing the Robots Exclusion Standard [6]. Use of this standard could also be enforced through Edurep's user level agreement. As an unintended side effect, automated usage amplified some examples of inefficient querying on Edurep. Another aspect of the advice should include information on how to interface with Edurep better.

Use of various scripts to parse and filter the log files proved very useful during the conduct of this study. Automating these scripts will allow the administrators to detect undesirable behaviour at an earlier stage and act on it sooner, leaving Edurep free to be used by actual users.

Future research should improve on several aspects. First of all, more months of logging need to be used to combine and compare with current results. Secondly, the SRU query values need to be parsed fully to allow more accurate filtering options and to compare queries across portals. Last is the usage of the portal website. Parameters like the size and format of the pagination links, and the types of search, result or facet links on the page could prove useful in implementing better automatic detection methods.

A more long-term product change would be to also request the end user's original User agent string in the query to Edurep. Also requesting the original IP address could lead to privacy concerns. Since lots of crawler User agent strings are publicly available [8], this information could greatly enhance our filtering efforts.
A new Edurep component could be introduced, making it possible to block requests before they are processed by the system. However, at this point it is unclear whether the cost of such an extra check on all requests is outweighed by the benefit of not having to process the blocked requests. For now, such a filtering component will have to be implemented before the logs are processed by our business level reporting tool.

While the ideas in this paper could be used in similar architectures, the actual scripts cannot, because they are made for SRU and Edurep's query log format. With more standardization in repository query languages (like SQI), corresponding logging standards can be devised, so that analysis tools benefit many and query logs can be shared easily.

Filtering automatic queries is, after all, needed to look more closely at the human ones. The focus of interest is teacher search behaviour, not only on Edurep but beyond our borders.

References

1. Aula, A.: Studying user strategies and characteristics for developing web search interfaces. Dissertations in Interactive Technology 3 (December 2005)
2. Dikaiakos, M.D., Stassopoulou, A., Papageorgiou, L.: An investigation of web crawler behavior: characterization and metrics. Computer Communications 28(8), 880–897 (2005)
3. Massart, D.: Towards a pan-European learning resource exchange infrastructure. Lecture Notes in Computer Science 5831/2009, 121–132 (2009)
4. Paulsson, F.: Connecting learning object repositories: Strategies, technologies and issues. Internet and Web Applications and Services, International Conference on 0, 583–589 (2009)
5. R Development Core Team: R: A Language and Environment for Statistical Computing. R Foundation for Statistical Computing, Vienna, Austria (2009)
6. robotstxt.org: The web robots page. Retrieved August, 3 2010, from http://www.robotstxt.org. (2007)
7.
Simon, B., Massart, D., van Assche, F., Ternier, S., Duval, E., Brantner, S., Olmedilla, D., Miklós, Z.: A simple query interface for interoperable learning repositories. In: Proceedings of the 1st Workshop On Interoperability of Web-Based Educational Systems. pp. 11–18 (2005), http://citeseerx.ist.psu.edu/viewdoc/summary?doi=10.1.1.67.7745
8. Staeding, A.: List of user-agents (spiders, robots, browser). Stichting Kennisnet. Edurep wiki. Retrieved August 5, 2010, from http://www.user-agents.org
9. Stassopoulou, A., Dikaiakos, M.: Web robot detection: A probabilistic reasoning approach. Computer Networks 53(3), 265–278 (February 2009)
10. Stichting Kennisnet: Edurep wiki. Retrieved June, 3 2010, from http://edurep.wiki.kennisnet.nl
11. Stichting Kennisnet ICT op School: De educatieve contentketen: leertechnologische afspraken voor de toekomst. Retrieved May, 2 2007, from http://contentketen.kennisnet.nl/attachments/990312/De Educatieve contentketen - Leertechnologische afspraken voor de toekomst.pdf (December 2006)
12. Tan, P.N., Kumar, V.: Discovery of web robot sessions based on their navigational patterns. Data Mining and Knowledge Discovery 6(1), 9–35 (January 2002)
13. The Library of Congress: SRU: Search/Retrieval via URL. Stichting Kennisnet. Edurep wiki. Retrieved August 5, 2010, from http://www.loc.gov/standards/sru/
14. Wolpers, M., Memmel, M., Klerkx, J., Parra, G., Vandeputte, B., Duval, E., Schirru, R., Niemann, K.: Bridging repositories to form the MACE experience.
New Review of Information Networking 14(2), 102–116 (2008)

Handling Multiple Metadata Streams Regarding Digital Learning Material

Jasper Roes¹, Jeroen van Vuuren², Nico Verbeij² and Henk Nijstad³

¹ TNO Information- and Communication Technology, Colosseum 27, 7521 PV, Enschede
² Verdonck, Klooster & Associates, Baron de Coubertinlaan 1, 2719 EN, Zoetermeer
³ Stichting Kennisnet, Paletsingel 32, 2718 NT, Zoetermeer
Jasper.Roes@tno.nl, Jeroen.vanVuuren@vka.nl, Nico.Verbeij@vka.nl, H.Nijstad@kennisnet.nl

Abstract. This paper presents the outcome of a study performed in the Netherlands on handling multiple metadata streams regarding digital learning material. The paper describes the present metadata architecture in the Netherlands and the present suppliers and users of metadata and digital learning materials. It furthermore describes the roles of suppliers and users that were defined. Based on the needs of the present and new roles, and based on the present architecture, the necessary changes to the metadata architecture are described to make it more future-proof and to ensure it can handle multiple metadata streams.

Keywords: Metadata, architecture, learning materials, interoperability, reusability, edurep, eck, educational content chain

1 Introduction

In the last couple of years, the use of metadata to search for and find appropriate learning materials has increased drastically. With this increase, more organizations are interested in producing digital learning materials. To be able to find appropriate learning materials, metadata records need to be provided. In the past, most digital learning materials were created by publishers, who also provided metadata records for their materials. Now that digital learning materials are created not only by publishers but also by schools, individuals and many more, the metadata records for one piece of learning material are no longer provided by one group.
These new groups of people and organizations providing metadata create new streams of metadata records. Metadata streams are metadata records flowing from the creator of the record to the central database, where the records are used to find learning materials. These new streams were not foreseen when the present metadata architecture¹ was developed. Instead of handling only one metadata stream, the educational content chain now has to handle many streams, all with different characteristics, and often providing additional metadata information about the same piece of learning material. This leads to difficulties in the educational content chain², as the present architecture is based on the principle of only one metadata record for each piece of learning material.

¹ A definition of how the systems handling metadata interoperate.

Based on these developments in the educational sector, TNO [1] and VKA [2] were asked by Kennisnet [3] to perform a study on this subject, and to provide Kennisnet with recommendations on changes to the present educational content chain that make it able to handle multiple metadata streams and more future-oriented.

Kennisnet is the foremost public knowledge centre regarding information and communication technologies (ICT) and education in the Netherlands. TNO is an applied scientific research company that applies scientific knowledge with the aim of strengthening the innovative power of industry and government. Verdonck, Klooster & Associates (VKA) is an independent consultancy company with extensive experience at the intersection of strategy, implementation and ICT in the public sector.

2 Present Educational Content Chain

Constant developments in society, and the need of the Netherlands to extend its competitive knowledge industry, lead to a need for changes in the Dutch educational sector.
A major development in the educational sector is the need to educate people during their entire life; education no longer stops after obtaining a degree. The digital era provides huge possibilities to support lifelong learning, and furthermore the present generation of students has high expectations about the digital possibilities in education. These developments lead to more flexible education, and therefore ask for new educational material: digital material. To ensure a good user experience, and to ensure interoperability and reusability of digital material, there is a need for agreements.

Kennisnet therefore developed an educational content chain to make agreements about the use of digital learning materials and to ensure interoperability, reusability and usability [4]. The educational content chain in the Netherlands consists of five major steps:

1. Developing (creating digital learning material)
2. Making available (placing material on the web and providing metadata)
3. Finding (searching in the central database)
4. Arranging (creating new material based on existing materials)
5. Using (using the material during courses)

These steps are often represented as a circle, as the use of digital materials often leads to the development of new, or renewed, digital material.

² The educational content chain consists of all users and suppliers that create, distribute, make available or use digital learning materials. These can for instance be publishers, schools, teachers, students, etc.

3 Present Content Chain Issues

The present content chain has implicitly been developed with only one main stream for provisioning (authoritative [5]) metadata information: the stream that flows from the suppliers of the learning material to the central database, where all metadata is collected and which users can query to find learning material.
As long as this assumption is valid, only one metadata record is provided for each piece of learning material, and users searching the database will find each piece of learning material only once. Examples of these single (authoritative [5]) metadata-stream central search platforms are Edurep [6] in the Netherlands, the LRE [7] and the Spider project [8].

Recent developments, though, show more and more sources providing complete metadata records, or supplying additional metadata to the original record. These new records lead to more than one metadata record in the central database for most of the learning materials. The availability of multiple metadata records for one piece of learning material in the present content chain makes it difficult for users to locate usable learning materials, as the search results show one piece of learning material multiple times (once for each metadata record).

Furthermore, these multiple records also have another effect: users cannot search in the richest metadata, and as a result fail to find usable learning materials. Table 1 shows two metadata records for the same piece of learning material. If a user searches for 'Geography' as course, he will find the material. If he searches for 'Layer of the earth' he will also find the material. When a user searches for the combination of both key words, the present content chain will not return the material, as the combination of the key words is not available in one metadata record.

Table 1. Multiple metadata records from different suppliers for one piece of learning material.

Supplier   Course      Key word
A          Geography   -
B          -           Layer of the earth

Based on the information available, the study [9] concluded that the two main issues of the present content chain are:

1. The present chain is incapable of handling multiple metadata streams
2.
The present chain cannot link multiple metadata records that describe one piece of learning material

4 Roles of Users and Suppliers

The study showed that it is difficult to characterize the parties (people or organizations supplying learning materials and metadata). There is no such thing as a general publisher or a general user. Each party has different requirements and performs different roles, as has been argued in [10]. In this study six roles are defined [9]:

1. Suppliers: the expert creator/publisher of the learning material.
2. Prosumers: a consumer that also produces learning material.
3. Scouts: scouts locate learning material on the internet.
4. Enrichers: enrichers provide additional metadata for a specific target group.
5. Reviewers: reviewers review learning material for a specific target group.
6. Users: a user that supplies metadata and/or reviews learning material for a specific target group.

The needs of the roles are as follows: suppliers, prosumers, and scouts want to provide metadata to make the learning material findable. Enrichers, reviewers and users want to provide additional metadata to make the material more findable for specific target groups.

5 Desired Educational Content Chain

Based on the defined roles and their needs, a new architecture for the educational content chain was defined. In this architecture every role can provide the information it defines or has available. Figure 1 presents the old and new architecture. The new architecture includes everything that is in the figure; the old architecture has the same structure, but consists only of the roles and records that are marked in grey. On the left side of the figure the different roles are represented. The different roles are divided into three groups to give the users of the search engine the possibility to make a distinction while searching.
Because of the distinction in three groups, users have the possibility to search only in metadata records provided by the source of the learning material. Each group has its own contact point at the central search platform (Edurep [6] in the Netherlands) to make the distinction described possible. On the right side of the figure the users that search for materials are represented.

[Figure 1: diagram. Left: the roles (suppliers, scouts, prosumers, enrichers, reviewers, users) in three groups (Source, Expert, User), providing complete metadata records, additional metadata information and reviews to Edurep. Right: searching metadata records and searching reviews, each with a distinction between three groups: Source, Source + Expert, Source + Expert + User.]
Fig. 1. Old and new architecture of the Dutch educational content chain

6 Recommendations

To be able to create the new architecture as described in section 5, it is necessary to make some changes to the present educational content chain. The most important recommendation that was defined is: “Ensure that each piece of learning material has a unique identifier, and ensure that the user can search in the richest metadata information (a combination of all available metadata information) available” [9].

By ensuring that each piece of learning material has a unique identifier, it is possible to reference different metadata records that apply to the same piece of learning material. If these references can be made by the central database, it is possible to search in the richest metadata possible. It is furthermore possible to combine multiple pieces of metadata to create the best set of metadata for one piece of learning material. To ensure unique identifiers for each piece of learning material, agreements have to be made on how to generate unique identifiers.
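The effect of linking records through a shared identifier can be sketched in a few lines of Python, using the two records of Table 1. This is only an illustration under stated assumptions, not the design proposed in the study: the identifier `urn:example:lo:1`, the record layout and the names `merge_records` and `search` are all hypothetical.

```python
from collections import defaultdict

# Hypothetical identifier and record layout; the values come from Table 1.
LO_ID = "urn:example:lo:1"
records = [
    {"id": LO_ID, "supplier": "A", "fields": {"course": "Geography"}},
    {"id": LO_ID, "supplier": "B", "fields": {"keyword": "Layer of the earth"}},
]


def merge_records(records):
    """Combine all records sharing an identifier into one 'richest' record.

    Returns a mapping: identifier -> field name -> set of values.
    """
    merged = defaultdict(lambda: defaultdict(set))
    for record in records:
        for field, value in record["fields"].items():
            if value:
                merged[record["id"]][field].add(value)
    return merged


def search(merged, **criteria):
    """Return identifiers whose merged metadata matches every criterion."""
    return sorted(
        lo_id
        for lo_id, fields in merged.items()
        if all(value in fields.get(name, set())
               for name, value in criteria.items())
    )


merged = merge_records(records)
# The combined query succeeds once the two records are linked.
print(search(merged, course="Geography", keyword="Layer of the earth"))
```

Because both records carry the same identifier, the combined query that fails in the present chain now returns the learning material exactly once.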
The suggestion made in the study is to use a formal URN (Uniform Resource Name) [11] as the unique identifier for the learning material. Kennisnet could apply for a formal URN to be used in the Netherlands, and could supply each provider of learning materials with its own unique number. The combination of the URN, the unique code for the provider and a unique number to be defined by the provider of the piece of learning material ensures that each piece of learning material has a unique identifier.

Furthermore, two other recommendations have been defined based on the study: “Ensure that the educational content chain is managed” and “Provide a service to add additional metadata”. The first recommendation is important since the management of the content chain is currently not explicitly defined. The second recommendation is important to ensure that all defined roles have the possibility to provide metadata. Most roles already have some kind of tooling available, but for enrichers and reviewers, for example, no tooling is available.

References

1. TNO, http://www.tno.nl
2. VKA, http://www.vka.nl
3. Kennisnet, http://www.kennisnet.nl
4. Kennisnet Educational Content Chain, http://contentketen.kennisnet.nl
5. Recker, M. & Wiley, D.A.: A non-authoritative educational metadata ontology for filtering and recommending learning objects. In: Journal of Interactive Learning Environments, pp. 255-271, Taylor & Francis, London (2001)
6. Edurep, http://edurep.kennisnet.nl
7. Massart, D.: Towards a pan-European learning resource exchange infrastructure. Lecture Notes in Computer Science 5831/2009, pp. 121-132, Springer-Verlag, Berlin (2009)
8. Paulsson, F.: Connecting Learning Object Repositories: Strategies, Technologies and Issues. In: 2009 Fourth International Conference on Internet and Web Applications and Services, pp. 583-589, IEEE Press, New York (2009)
9. Roes, J., Verbeij, N., van Vuuren, J.: Adviesrapportage metadatastromen. Kennisnet, Zoetermeer (2010)
10.
Manouselis, N., Sampson, D.: Learning Resources Brokerage Systems: An Agent-Based Virtual Market Model. In: Third IEEE International Conference on Advanced Learning Technologies, pp. 424, IEEE Press, New York (2003)
11. IANA URN namespaces, http://www.iana.org/assignments/urn-namespaces/

Managing Broken URLs in Federated Metadata*

Tien-Dung Le and Elena Shulman
European Schoolnet, Rue de Trèves 61, B-1040 Brussels, Belgium.
{dung.le,elena.shulman}@eun.org

Abstract. In a large federation of learning object repositories, learning object locations or learning object URLs in metadata can become out of date when learning objects are moved or deleted. Regular checking of learning object URLs is essential. However, fully checking across a large federation imposes an unsustainable burden on time and resources while negatively impacting networks and repositories. This paper describes a broken URL handling system with a heuristic model that can provide a sustainable solution for federation service managers while enhancing communication and collaboration among federation stakeholders.

Key words: Broken URLs, federated metadata, heuristic model, learning resource exchange

1 Introduction

The Learning Resource Exchange (LRE) is a service that allows European teachers to get access to digital educational content from many different countries and providers (Figure 1). Content providers produce metadata, i.e., machine-readable descriptions of the educational content they want to make available to teachers within and beyond their national learning object repository systems. The LRE provides unified access to Learning Objects (LOs) stored in these different repositories. Each content provider exposes their metadata so that it can be easily accessed by the LRE. The LRE collects metadata from the different content providers and compiles them to produce a digital catalog of learning resources that can be consulted by teachers using the LRE [4].
Along with other information relevant to pedagogical contexts, metadata contains the locations (i.e., Uniform Resource Locators - URLs [1]) where resources can be obtained. Typically, users of the LRE catalog can obtain a learning resource of potential interest by following the URL provided with the resource description. If this URL does not point to the expected resource, it is said to be “broken”. Although broken URLs can sometimes be the products of transient network or server problems, in most cases they are caused by outdated metadata.

* The work presented in this paper is partially supported by the European Community eContentplus programme - project ASPECT: Adopting Standards and Specifications for Educational Content (Grant agreement number ECP-2007-EDU-417008). The authors are solely responsible for the content of this paper. It does not represent the opinion of the European Community and the European Community is not responsible for any use that might be made of information contained therein.

Fig. 1. Learning Resource Exchange.

This paper presents the results of our efforts to develop a solution that will effectively detect broken URLs, automate a system for communicating with content providers and allow greater flexibility for LRE service managers in resolving issues that can negatively impact user experience with the LRE.

Section 2 presents the background to the problem. Section 3 presents the architecture of the broken URLs handling system and details our proposed heuristic algorithm to automatically detect the broken URLs. Section 4 details the guidelines developed to facilitate effective and expeditious communication with content providers, and the procedures for LRE managers and system responses depending on the status of broken URLs once reported. Section 5 is a discussion including comments on some elements of comparison with related work.
2 Background

Broken URLs are problematic on several levels for a service that manages a large federation of repositories. Unsurprisingly, broken URLs significantly impact user satisfaction with the system. Teachers have reported a loss of confidence in the LRE's catalog and in the perceived quality of the LRE more broadly [6]. From the users' point of view, broken URLs are also a source of frustration when teachers attempt to access resources they have previously placed in a favorites folder or when they recommend to colleagues resources that have suddenly become unavailable. As depicted in Figure 1, the LRE federates LO repositories from various origins. LRE content is provided by ministries of education (MoE), commercial and non-profit content providers (Publisher), and cultural heritage organizations (Museums). Some of these repositories might store user-generated content (Teachers). Because the LRE does not host the objects or control access to objects directly, the LRE must rely on the timely collaboration of the content providers depicted in Figure 1 to update their metadata whenever object locations have changed.

Experience has shown that broken URLs can become a significant problem. Part of the solution we propose is based on a heuristic algorithm developed and tested for this purpose. This paper also illustrates the way we have linked a solution for broken URL detection to support for communication and collaboration with content providers. The LRE has a number of techniques in place intended to ameliorate the issue with metadata quality, but none has provided a sustainable solution in a system reliant on coordinated actions in a growing federation.

First, to keep the LRE catalog up-to-date, metadata is collected on a regular basis (e.g., every day). This technique ensures that updates to the metadata on the part of content providers are reflected in the main catalog. However, this requires content providers to maintain the accuracy of their own metadata.
We have found that this technique has its own set of drawbacks because some providers do not use the metadata they produce for the LRE.

Another technique relies on users' scrutiny. Teachers who find broken URLs are invited to report them. Reported URLs are checked by the system and, if the problem is confirmed, the corresponding entries are removed from the catalog and their content providers receive a request to correct the problem. The main drawback of this technique is that it is a highly individualized solution impacting the quality of a few records at a time. Moreover, this technique is fully reliant on the goodwill of the catalog users and does not ultimately shield them from an unsatisfactory search and retrieval experience culminating in the discovery of broken URLs.

A third technique consists of systematically checking all the URLs of the catalog on a regular basis. Unfortunately, this solution does not scale. As the number of catalog entries grows, the time necessary to check the catalog fully can become prohibitively long. For example, in the LRE a full check of more than 200,000 URLs takes more than two days. To make matters worse, systematically checking all the learning resource URLs available on a system can sometimes be considered “unfriendly” by its administrators.

3 Broken URL Handling System

While allowing the LRE to efficiently detect and make decisions to remove catalog entries with broken URLs from the search, the proposed solution is ultimately intended to support and automate more effective communication with content providers and to facilitate their collaboration in expeditiously updating their metadata. The handling system (Figure 2) and guidelines described in this article accomplish several objectives. They demonstrate the feasibility of a heuristic model in efficiently detecting broken URLs and propose a mechanism to trigger ameliorative actions from providers.
They illustrate our efforts at establishing clear guidelines to support decision making and communication with providers for LRE service managers responsible for metadata quality and user satisfaction. A component to detect and hide broken URLs is only part of the solution. The proposed solution also takes into account and avoids broken URL detection techniques that can appear to be denial of service attacks on partner systems.

Fig. 2. Broken URLs Handling System.

There are several steps in the detection and handling of broken URLs. The first stage was a full system check for broken URLs to determine the scale of the problem and to create a baseline to test the proposed algorithm. As discussed above, while this kind of check can produce useful data on URLs, it is not feasible to use this technique on a regular basis. The second stage is our proposed long-term solution to detect and trigger corrections for bad URLs based on a check of smaller subsets, and to provide better automated channels for communicating with content providers. Techniques for determining the size and location of the sets to be checked are described below.

As mentioned above, it is necessary to put in place a heuristic checking algorithm that regularly checks the URLs of the catalog without “attacking” the systems hosting LOs. Therefore only a subset of URLs should be checked at a time. Our analysis of the LO location distribution shows that the URL selection should be based on the LO domain name (or domain for short). For each domain, only a subset of its URLs should be checked. The first step is to determine which domains should be targeted for checks. To determine whether the system stops or continues checking URLs in a domain, a sampling plan is applied after all selected URLs are checked.

3.1 Learning Object Location Distribution

Learning objects in the LRE federation are hosted in one or several domains.
Typically, a URL or LO location refers to only one domain, while a domain normally hosts more than one LO. We analyzed the relationship between LOs and domains based on two aspects. First, we looked at the domain distribution in the catalog, which indicates how many domains have a certain number of LOs. Second, we analyzed the distribution of LOs to determine how LOs are distributed among domains.

In June 2010, there were 842 domains in the LRE. Figure 3 shows the domain distribution and Figure 4 shows the LO distribution. A large percentage of domains (60%) hosted just a single learning object. Even more important for the development of our solution was the finding that only 4% of domains hosted more than 1000 LOs and 70% of LOs are located on domains presently hosting over 10,000 LOs. The fact that a small number of domains are hosting 96% of all LOs was instrumental in building a heuristic checking algorithm.

Fig. 3. Domain distribution in the LRE.

3.2 Heuristic Checking Algorithm

Based on the analysis of LO distributions, we propose to check each domain separately. For each domain, two checking steps are applied. First, the system checks all reported-broken URLs, i.e., the URLs reported as broken in the last check, and calculates the confidence of the system in the domain. The second step applies a sampling plan to the previously good URLs. The sampling plan (Figure 5) is a simplified version of the sequential sampling plan [7].

Fig. 4. LO location distribution in the LRE.

Sampling Plan

– Single sampling plan. One sample of URLs on a domain, such as 10% of the URLs, is selected at random. If more than p2 of the selected URLs (such as 90%) are good, all unchecked URLs are considered good. Otherwise, all URLs on the domain are checked.
– Sequential sampling plan. A set of URLs in a domain, for instance a set of 100 URLs, is selected at a time, and after inspection a decision is made to accept (stop checking), to reject (check all URLs in the domain), or to select another set in the same domain (continue sampling). In general, sequential sampling allows for quick decisions, especially when the frequency of broken URLs in a domain is particularly high or particularly low.
– Sampling plan in the algorithm. Because we recognize the difficulties in calculating the slope of the accept and reject lines, the checking algorithm uses the sequential sampling plan with a simplification: the slope is zero (Figure 5).

Sampling Plan's Parameters

– A pre-defined threshold $p_1$ represents the low confidence of the checking system in the domain. $p = p_1$ is the reject line in the sampling plan.
– Two pre-defined thresholds $p_2^-$ and $p_2^+$ represent the low-value confidence and the high-value confidence of the checking system in the domain.
– $N_{pb}$ is the number of broken URLs in the previous check and $N_{psb}$ is the number of stay-broken URLs in the current check. The rate $r$ is calculated as follows:

$r = N_{psb} / N_{pb}$ (1)

– A threshold $p_2$ represents the confidence of the checking system in the domain. $p_2$ is calculated as follows:

$p_2 = p_2^- + r \cdot (p_2^+ - p_2^-)$ (2)

$p = p_2$ is the accept line in the sampling plan.

Fig. 5. Sampling plan.

Sampling Plan Procedure

– A group G of URLs is selected at a time for checking. Let $N_c$ be the total number of checked URLs in the current check and $N_g$ the total number of good URLs; the percentage $p$ of good URLs is calculated by

$p = N_g / N_c$ (3)

– If $p < p_1$, the system checks all remaining URLs on the domain. If $p_2 \le p$, the system assumes that all the other URLs on the domain are good and stops checking. Otherwise, the system continues sampling and checking (Figure 5).

Algorithm 1 shows the details of the checking technique.
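Equations (1) to (3) and the accept/reject rule can be sketched in Python for a single domain as follows. This is a minimal sketch, not the LRE implementation: the function names, the `is_good` callback (standing in for an actual URL check) and the choice of r = 0 when no URL was previously reported broken are assumptions made for illustration.

```python
import random


def accept_threshold(n_prev_broken, n_stay_broken, p2_low, p2_high):
    """Equations (1) and (2): p2 = p2_low + r * (p2_high - p2_low).

    Assumption: r is taken as 0 when no URL was broken in the previous check.
    """
    r = n_stay_broken / n_prev_broken if n_prev_broken else 0.0
    return p2_low + r * (p2_high - p2_low)


def check_domain(urls, is_good, p1, p2, group_size=100, rng=None):
    """Apply the simplified sequential sampling plan to one domain.

    `is_good` is a hypothetical callback that checks one URL.
    Returns the set of URLs found broken.
    """
    rng = rng or random.Random()
    pending = list(urls)
    rng.shuffle(pending)              # groups are drawn at random
    checked = good = 0
    broken = set()
    while pending:
        group, pending = pending[:group_size], pending[group_size:]
        for url in group:
            checked += 1
            if is_good(url):
                good += 1
            else:
                broken.add(url)
        p = good / checked            # equation (3)
        if p < p1:                    # reject: check everything that is left
            broken.update(u for u in pending if not is_good(u))
            break
        if p >= p2:                   # accept: assume the rest is good
            break
    return broken
```

With the accept line at 90%, a domain whose first group of 100 URLs is entirely good is accepted after those 100 checks, whereas a domain whose first group is mostly broken is checked in full.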
41 Algorithm 1 Heuristic checking algorithm 1: procedure Check 2: for all domain ∈ domains do 3: Re-check all broken URLs on domain 4: Calculate p2 using equation 2 5: sampling = true 6: while (sampling and (domain has unchecked URLs)) do 7: Take a group of URLs for checking 8: Check this group 9: Update p using equation 3 10: if p < p1 then 11: Check all other URLs 12: sampling = false 13: else 14: if p2 ≤ p then 15: sampling = false . assume that all other URLs are good 16: end if 17: end if 18: end while 19: end for 20: end procedure 3.3 Experiment To compare our algorithm with the full system check, we first executed a full system check to detect all broken URLs in the LRE, then we executed the heuristic checking algorithm 3 times in simulation mode with the findings from the full system checking results for comparison. In June 2010, the total number of URLs in the LRE was 236763, among them 45711 (19,31%) URLs were broken1 . The first run was executed with an assumption that all URLs are good, while the second run and the third run used knowledge from the previous run. The following parameters were used: G = 100, p1 = 50%, p− + 2 = 90%, p2 = 95%. The first run only checked 17,36% of URLs but was able to detect 73,48% of broken URLs. The second and the third checks detected 94,92% and 98,67% of broken URLs. Overall, after 3 runs, the heuristic check algorithm only checked 36,78% of all URLs on average but detected 98,67% of broken URLs. 4 Procedures to Correct Broken URLs Given that the LRE federates LO repositories from various origins, effective han- dling broken URLs depends simultaneously on the heuristic checking algorithm that can be deployed to test discrete domains and automation of communication channels with content providers. Therefore, we established a schedule for check 1 The experiment took place on June 23rd, 2010. 42 Table 1. 
Results

Run      Selected URLs                   Broken URLs                     Rate
         (number and percentage of       (number and percentage of       (number of broken URLs /
         the total number of URLs)       the total number of             number of selected URLs)
                                         broken URLs)
1        41102 = 17,36%                  33587 = 73,48%                  81,72%
2        105126 = 44,40%                 43388 = 94,92%                  41,27%
3        115005 = 48,57%                 45101 = 98,67%                  39,22%
Overall  710289 = 36,78% (on average)    45101 = 98,67% (max)            54,07% (on average)

sequences and conditions under which LRE service managers could act on the findings of the checking algorithm, initiate the removal of records from search results, set in motion further checks and determine when it was appropriate to return records to the search. Making these conditions explicit allows for better management of metadata quality in the LRE, shields users from negative experiences during search and retrieval and opens opportunities to collaborate with content providers in improving the quality of their own metadata.

The first step in this process begins when a broken link is first found and the LRE system generates an automated notification for providers. The system controller coordinates all other components and generates reports. These machine-readable reports are set to trigger a fix and can be exposed to providers using RSS or ATOM feeds or downloadable Excel sheets, depending on the preferences of the providers.

After seven days, the LRE system is set to recheck the problem URLs. If the problem persists and there has been no other response from the provider, an automated email message is generated for the LRE system manager, notifying them of the URL’s status and providing relevant details about the record(s) in question and contact information for the provider. Using this data, the LRE manager sends a personal email to the repository manager. The repository manager will be notified that the problem has not been corrected despite an automated report.
The guidelines stipulate that after three weeks records with broken URLs will be removed from the search results. At this point we hope to begin a dialog with the repository manager, determine the source of the problem, offer advice or other assistance if needed and provide feedback more generally to repository administrators.

In order to mitigate the detrimental effects of broken URLs on user experience, we have instituted a time limit on how long records whose broken URLs have been identified and reported to providers can remain in the LRE search. One month after the initial discovery of the broken URL, the system will initiate another check of the domain using the algorithm described above. If the problem has not been corrected, records with broken URLs will be removed from search results. This action is accompanied by an automated notification sent to the provider, listing the items that are no longer in the search results and the reason for their removal.

Once records have been removed from the search, a Fibonacci sequence of rechecks is used to determine if and when the problem URLs are corrected by providers. When the check sequence indicates that the broken URL has been corrected, the records will be returned to the search results. An automated notification to providers, both machine-readable and as an email, will be generated to alert the providers to the change in status of their record(s). This ability to return the records to the search takes into account scenarios in which providers are not able to respond expeditiously to reports of broken URLs, without unduly impacting the quality of the LRE’s services.

The guidelines described above take into account the need to provide flexibility of responses for LRE service managers and opportunities to improve collaboration between system administrators within the federation.
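As an illustration of the recheck schedule applied after removal, the sketch below spaces successive rechecks by Fibonacci intervals. It is a hypothetical reading of the procedure rather than the LRE implementation: the text does not say in which unit the sequence is expressed or where it starts, so the use of days and the starting values 1, 2 are assumptions, as are the function names.

```python
from datetime import date, timedelta
from itertools import islice
from typing import Iterator, List


def fibonacci_intervals() -> Iterator[int]:
    """Yield 1, 2, 3, 5, 8, ... as gaps between successive rechecks.

    Assumption: the sequence starts at 1, 2; the text does not specify this.
    """
    a, b = 1, 2
    while True:
        yield a
        a, b = b, a + b


def recheck_dates(removed_on: date, count: int) -> List[date]:
    """Return the dates of the first `count` rechecks after removal.

    Assumption: the gaps are measured in days.
    """
    dates = []
    current = removed_on
    for gap in islice(fibonacci_intervals(), count):
        current += timedelta(days=gap)
        dates.append(current)
    return dates


# Example: records removed from the search on a given day.
for d in recheck_dates(date(2010, 7, 23), 5):
    print(d.isoformat())
```

Growing gaps of this kind keep the cost of rechecking low for URLs that stay broken for a long time, while still detecting a quick repair within the first few days.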
5 Discussion and Related Works

Although recognition of the problems and attempts to grapple with the challenges of broken URLs or “link rot” are not new, most existing approaches to automatically detecting broken URLs are based on the relationship between resources or pages [2], [8], [12]. Such approaches are not applicable in an LO federation, where there are almost no links between the locations of learning objects.

There are also methods such as [9], [11], [3] that aim to ensure that links are always available. These methods can be divided into two categories. The first approach involves creating copies and keeping resources in a local repository [9], [11]. This approach assumes that the LOs will not be modified, or requires a check of the links on the fly to determine if and when the system needs to switch the links. The second approach uses a persistent identifier service to resolve the URLs [3]. However, this approach only works well if such a service is available for content providers’ sites, which is not the case in the LRE.

The proposed mechanism to manage broken URLs allows LRE managers to control the quality of LO metadata in a federation. It combines different ways to detect broken URLs and report problems to the content providers. Because the heuristic checking algorithm is able to detect broken URLs without blocking network communication, it can be run on a schedule intended to provide detection, reporting and corrective actions either on the part of providers or, if necessary, on the part of LRE managers.

6 Conclusion

In this paper, we have presented a framework with a heuristic checking algorithm that allows for the correction of broken URLs in federated metadata. Future work will focus on further enhancements of this framework to detect broken URLs when metadata arrives in the LRE.
This future work will involve the development of a filter to remove broken URLs from metadata collected via the Open Archives Initiative Protocol for Metadata Harvesting OAI-PMH [5] or the Simple Publishing Interface SPI [10].

References

1. Berners-Lee, T., Masinter, L., McCahill, M.: Uniform Resource Locators (URL) (RFC 1738). Network Working Group (1994)
2. Ingham, D., Caughey, S., Little, M.: Fixing the “Broken-Link” Problem: The W3Objects Approach. In: Computer Networks and ISDN Systems, 28, pp. 1255-1268 (1996)
3. Kahn, R., Wilensky, R.: A Framework for Distributed Digital Object Services. In: International Journal on Digital Libraries, 6, pp. 115-123 (2006)
4. Massart, D.: Towards a Pan-European Learning Resource Exchange Infrastructure. In: Feldman, Y., Kraft, D., Kuflik, T. (eds.) NGITS’2009, LNCS, vol. 5831, pp. 121-132. Springer, Haifa, Israel (2009)
5. Lagoze, C., Sompel, H. V.: The Open Archives Initiative Protocol for Metadata Harvesting http://www.openarchives.org/OAI/openarchivesprotocol.htm (2001)
6. MELT: Final Evaluation Report, D 7.3. http://info.melt-project.eu/shared/data/melt/MELT D7 3 Final Evaluation Report.pdf (2009)
7. NIST/SEMATECH: e-Handbook of Statistical Methods. http://www.itl.nist.gov/div898/handbook (2010)
8. Popitsch, N., Haslhofer, B.: DSNotify: Handling Broken Links in the Web of Data. In: Proceedings of the 19th international WWW conference (WWW2010), NC, USA (2010)
9. Reich, V., Rosenthal, D.: LOCKSS: A Permanent Web Publishing and Access System. D-Lib Magazine, vol. 7 (2001)
10. Ternier, S.: Interim Report: The Simple Publishing Interface Specification. http://ariadne.cs.kuleuven.be/lomi/images/b/ba/CEN SPI interim report.pdf (2009)
11. Veiga, L., Ferreira, P.: RepWeb: Replicated Web With Referential Integrity. In: Proceedings of the 2003 ACM Symposium on Applied Computing, pp. 1206-1211, New York, NY, USA (2003)
12. Wiley, G., Thomas, W.: Improving OpenURL Metadata. Serials Librarian, vol. 56, pp.
282-286 (2009)

Preliminary Discussion on a Digital Curation Framework for Learning Repositories

Nikos Palavitsinis1,2, Nikos Manouselis1, Salvador Sanchez-Alonso2

1 Greek Research & Technology Network (GRNET), Greece
{palavitsinis; nikosm}@grnet.gr
2 University of Alcala de Henares (UAH), Spain
{palavitsinis; salvador.sanchez}@uah.es

Abstract. Learning Object Repositories have developed significantly during the last few years. Researchers have extensively discussed the concept of learning objects and their accompanying metadata. Metadata, specifically, were introduced and developed by the community of librarians for cataloguing purposes. From the same background, digital curation has emerged as a field of research directly linked to the need to preserve large datasets over time and across platforms. This paper links digital curation with learning objects and mostly discusses the process of digital curation, while at the same time attempting to identify possible research directions for digital curation in LORs.

Keywords: learning resources, curation, metadata, lifecycle

1 Introduction

Learning Object Repositories (LORs) are databases used for storing and/or enabling the interoperability of Learning Objects (LOs), as defined by McGreal [13]. Because not all repositories store the actual object files, a key function of repositories is to identify the storage location of the objects and provide an indexing system that enables the efficient search and discovery of the objects [16]. A growing body of learning repositories (e.g. MERLOT, MIT’s OpenCourseWare, ARIADNE, LRE for schools, Organic.Edunet, MACE Project) is making digital learning resources available to users searching for educational content on various topics. This vast number of objects calls for specific actions to maintain them over their lifecycle and make them available for current and future generations [2].
Addressing this issue, the term “Digital Curation”, which implies a transfer of existing curatorial approaches from analogue resources to their digital counterparts, was introduced at the “Digital Curation: digital archives, libraries and e-science seminar” in 2001. Beagrie [2] defines “Curation” as the actions needed to maintain digital research data and other digital materials over their entire life-cycle and over time for current and future generations of users. Pennock [14] states that Digital Curation is about maintaining and adding value to a trusted body of digital information for both current and future use, thereby adding the aspect of added value to the process of Digital Curation.

In a comprehensive paper, Campbell [3] identified some of the issues that need to be taken into account when considering the Digital Curation of both learning objects and the metadata that describes them. Specifically, Campbell concluded that the use of digital objects to facilitate teaching and learning is certain to continue to grow and that the metadata requirements of the communities of practice that use these resources will become increasingly complex.

Building on this notion, this paper emphasizes the importance of Digital Curation for Learning Object Repositories (LORs) and suggests the use of the DCC Curation Lifecycle Model [10] in Learning Repositories. To this end, the first section provides definitions to establish a common understanding of the basic concepts used in this paper, while the second section presents the Digital Curation process in detail through existing work and discusses the notion of applying such techniques to Learning Object Repositories. Finally, the last section identifies limitations of this research and suggests future directions of research.
2 Background

The term Digital Curation implies not only the preservation of digital resources or the maintenance of a collection to keep it accessible, but also some degree of added value and knowledge [2][8]. Curation actions can be carried out on a broad range of scientific data and resources in multiple disciplines, ranging from arts & humanities to life sciences, physical sciences, medical sciences, etc. [1], and at all stages of the digital resources’ lifecycle [10].

In general, digital learning resources are significantly different from the digital datasets that are used in sciences such as physics, astronomy, biology, etc. This difference also stems from the specific nature of learning object metadata, as IEEE [11] states these should take into account “the diversity of cultural and lingual contexts in which the learning objects and their metadata will be exploited”. Supporting this, McGreal [12] stated that LOs can be defined as any reusable digital resource that is encapsulated in a lesson or assemblage of lessons grouped in units, modules, courses, and even programmes. Polsani [15] defined reusable learning objects (RLO) as independent and self-standing units of learning content predisposed to reuse in multiple instructional contexts. These definitions emphasize the educational uses of learning resources, already providing some evidence of their unique characteristics.

3 Digital Curation Framework for Learning Object Repositories

The DCC (Digital Curation Centre of the University of Edinburgh) Curation Lifecycle model (Fig. 1) is an existing curation model that can be used for curation actions, as it is generic enough to be applied to different contexts and serve different communities.
Additionally, the authors feel that the generic nature of the DCC Curation Lifecycle Model can serve as a basis for elaborating on some initial thoughts on curation issues in LORs, providing the ability to apply them to similar models dealing with digital curation. The DCC Curation Lifecycle Model (DCC-CLM) provides a graphical high-level overview of the stages required for successful curation and preservation of data from initial conceptualization or receipt. The authors feel that each stage of the DCC-CLM should be carefully examined, always taking into account the specificity of LORs and LOs, to identify challenges and issues that may arise for digital curation in the case of LORs. Examining the model in this way could indicate whether or not curation actions will be needed for educational resources, in contrast to the scientific data already being largely curated.

Fig. 1. DCC Curation Lifecycle Model as defined by Higgins (2008)

Lastly, the model presented may not be fully applicable to the case of learning resources and learning repositories, and applying parts of it in LORs may well present specific challenges. An important research question concerns the degree to which existing metadata standards and specifications store preservation data for the learning resources. If specific preservation actions prove to be relevant for learning resources as well, then existing standards should be capable of storing such data. This and similar issues will be further analyzed in future, more extensive studies.

4 Conclusions and Limitations of the Study

The present paper opened a discussion on whether or not Digital Curation can take place in the context of LORs. Overall, this paper attempted to build upon relevant studies on digital curation issues for educational metadata.
By doing so, the authors attempt to open a discussion on whether or not digital curation, or some of the key processes it involves, is relevant for Learning Object Repositories.

The first limitation of this paper lies within Digital Curation itself, as it is still an emerging field with many different contributions from a great number of scientists, which makes it even more difficult to define concepts and theories. Another important limitation of this paper lies in its theoretical nature. This is mainly attributed to the need for an initial discussion, even on a theoretical level, of some potential research directions, which will be documented in follow-up papers and examined in depth through case studies on existing LORs.

Future research that will extend the initial findings of this paper will focus on specific steps of the DCC Curation Lifecycle Model, reviewing existing literature from both curation experts and learning repository experts, and also trying to prove quantitatively that digital curation is significant as a context-specific curation. Once this is proven, future directions will include composing a DCC Curation Lifecycle Model for Learning Object Repositories.

Acknowledgements

The work presented in this paper has been funded with support by the European Commission, and more specifically the project ECP-2006-EDU-410012 “Organic.Edunet: A Multilingual Federation of Learning Repositories with Quality Content for the Awareness and Education of European Youth about Organic Agriculture and Agroecology” of the eContentplus Programme.

References

1. Ball, A.: Review of the State of the Art of the Digital Curation of Research Data. Project Report. Bath: University of Bath, (ERIM Project Document erim1rep091103ab12). Retrieved on June 20, 2010 from: http://opus.bath.ac.uk/19022/2/erim1rep091103ab12.pdf (2010)
2. Beagrie, N.: Digital Curation for science, digital libraries, and individuals. International Journal of Digital Curation, 1(1).
Retrieved June 22, 2010, from: http://ijdc.net/index.php/ijdc/article/viewFile/6/2 (2006)
3. Campbell, L.: Learning Object Metadata (LOM), DCC Digital Curation Manual, S. Ross, M. Day (eds), Retrieved June 20, 2010, from http://www.dcc.ac.uk/resource/curation-manual/chapters/learning-object-metadata (2007)
4. Carrier, S.: The Dryad Repository Application Profile: Process, Development, and Refinement, DOI: http://hdl.handle.net/1901/534 (2008)
5. Currier, S., Barton, J., O’Beirne, R., & Ryan, B.: Quality assurance for digital learning object repositories: issues for the metadata creation process. ALT-J, 12(1), pp. 5-20 (2004)
6. Chen, Y., Chen, S., & Lin, S.: A metadata lifecycle model for digital libraries: methodology and application for an evidence-based approach to library research. In Documents in Information Science, Working Papers Series. Retrieved June 18, 2008, from http://www.ifla.org/IV/ifla69/papers/141e-Chen_Chen_Lin.pdf (2003)
7. Collis, B. & Strijker, A.: Technology and human issues in reusing learning objects, Journal of Interactive Media in Education, vol. 4, pp. 1–32, (2004)
8. Conyers, A., Dalton, P.: Preservation of e-Learning Materials: an attitudinal study, Birmingham City University. Retrieved on June 23 from: http://www.jisc.ac.uk/media/documents/programmes/preservation/elopres.pdf (2008)
9. Greenberg, J.: Theoretical Considerations of Lifecycle Modeling: An Analysis of the Dryad Repository Demonstrating Automatic Metadata Propagation, Inheritance, and Value System Adoption, Cataloging & Classification Quarterly, 47(3), pp. 380–402 (2009)
10. Higgins, S.: The DCC Curation Lifecycle Model, International Journal of Digital Curation, 3(1), http://www.ijdc.net/index.php/ijdc/article/view/69/69 (2008)
11. IEEE: Draft Standard for Learning Object Metadata. Retrieved on June 22, 2010 from http://ltsc.ieee.org/wg12/index.html (2001)
12. McGreal, R.: Learning objects: A practical definition.
Retrieved June 15, 2010 from http://citeseerx.ist.psu.edu/viewdoc/download?doi=10.1.1.118.9253&rep=rep1&type=pdf#page=25 (2004)
13. McGreal, R.: A typology of learning object repositories, [pre-print]. Retrieved June 22, 2010 from http://hdl.handle.net/2149/1078 (2007)
14. Pennock, M.: Digital Curation: a life-cycle approach to managing and preserving usable digital information. Library and Archives Journal, 1. Retrieved: June 22, 2010, at http://www.ukoln.ac.uk/ukoln/staff/m.pennock/publications/docs/lib-rch_curation.pdf (2007)
15. Polsani, P.R.: Use and Abuse of Reusable Learning Objects. Journal of Digital Information, 3(4) (2003)
16. Richards, G., McGreal, R., Hatala, M., & Friesen, N.: Learning object repository technologies: Portals for on-line objects for learning. Journal of Distance Education, 17(3), pp. 67-79 (2002)

Automatic Keywords Extraction – a Basis for Content Recommendation

Ivana Bosnić1, Katrien Verbert2, Erik Duval2

1 Faculty of Electrical Engineering and Computing, University of Zagreb, Unska 3, HR-10000 Zagreb, Croatia
2 Dept. Computerwetenschappen, Katholieke Universiteit Leuven, Celestijnenlaan 200A, B-3001 Leuven, Belgium
ivana.bosnic@fer.hr, {katrien.verbert, erik.duval}@cs.kuleuven.be

Abstract. This paper describes a use case for an application that recommends learning objects for reuse and is integrated in the authoring environment. The recommendations are based on the automatic detection of content being authored and the context in which this resource is authored or used. The focus of the paper is automatic keyword extraction, evaluated as a starting point for content analysis. The evaluations explore whether automatic keyword extraction from content being authored is a sound basis for recommending relevant learning objects. The results show that automatically extracted keywords are suitable for this purpose, if some observed issues are appropriately addressed.
Keywords: content, reuse, recommendations, keywords, keyword extraction

1 Introduction

Content reuse today – although somewhat increased by new technologies and interfaces to aggregate and remix the content – is still not straightforward for mainstream authors of educational content. Barriers limiting content reuse include the immaturity or absence of support for discovering and reusing learning content in authoring tools and difficulties associated with combining and referencing reused learning materials [1].

The goal of our research is to analyze the reuse potential of learning objects and to support their discovery, recommendation and reuse within available authoring tools. Recommendation is based on both the content being authored and the context in which the content is authored or used.

This paper analyzes whether the results of automatic keyword extraction from the content being authored can be a basis for recommending resources relevant to the author. These keywords are generated based on both the on-the-fly analysis of content the author is editing, and context data that is available in an authoring or learning environment. Our research, presented in this paper, focuses primarily on the results of keyword extraction analysis, and on describing the process of content reuse which is based on this topic analysis and integrated in the authoring environments.

The paper is organized as follows: The application use case is presented in section 2. Automatic keyword extraction services are presented in section 3. Section 4 describes the comparison between two keyword generation services, while section 5 describes the keyword evaluations in the application prototype. The paper wraps up with conclusions and future work in section 6.
2 Application Use Case

The purpose of the application is to help authors of educational content by:

• recommending relevant content during authoring, without manual searching by the author;
• enabling easier content reuse and remix, particularly of small fragments, by referencing or using advanced copy-paste functionalities;
• integrating these functionalities in the authoring or learning environments through extensions of applications such as wikis, blogs, or presentation software.

One of the application use cases can be described with the following steps:

1. The user authors the content in their authoring environment (e.g. a wiki);
2. The application collects the content being authored, together with the context data available (e.g. age range, difficulty level) and proposes the recommendations;
3. The user views the recommendations to decide whether they are relevant;
4. If the content is useful either for partial copying or just for getting ideas, the user chooses to reference this content. The reference is automatically inserted in the content being authored, in the appropriate format (e.g. WikiMarkup, HTML markup or plain text);
5. As the user continues to edit the content, the changes are incorporated and new recommendations are presented.

In order to discover the resources, the application, integrated in the authoring environment, analyzes the content being authored. An automatic keyword extraction service extracts keywords from the text. Additional context is obtained from the authoring or learning environment (the purpose of the course, the preferred format of resources to be reused, etc.). Together with the keywords, this context data is used to search and retrieve relevant resources from content providers, including large learning object repository networks and social bookmarking websites.

3 Keyword-Based Content Discovery

The usual way of querying content providers is by using keywords as search terms.
In the case of repositories containing learning object metadata, search terms can be used to query fields such as title, description or keywords and further refined by using additional metadata fields that capture the context in which the learning content is used. In this section, automatic keyword extraction services that can be used as a basis for generating search terms are presented.

Keyword extraction services can be divided into two groups, based on their use of algorithms for constructing the semantic context:

• term extraction services – this group of services extracts the keywords from a text. Examples include Yahoo Term Extraction Web Service [2] and Fivefilters [3].
• semantic entity extraction services – this group of services not only extracts the keywords, but also detects concepts related to the text which are not present in the text itself. These services often have semantic linking features, i.e. they include additional encyclopedia links, images, articles, etc. Examples of such services are Zemanta [4], OpenCalais [5], Evri [6] and AlchemyAPI [7].

Most services provide interfaces for online use, mainly REST or SOAP. Results are usually returned as RDF, XML, JSON or plain text. The services mostly use keyword classification schemes, such as the DBpedia ontology [8], Wordnet [9] or the dmoz Open Directory Project [10]. Some services have their own entity databases.

Several comparisons of keyword extractors and semantic APIs exist. Zemanta and OpenCalais are recommended in [11], AlchemyAPI and Evri in [12], while [13] focuses on the characteristics of services for semantic tagging, without specific recommendations.

Services from both groups were evaluated to compare and contrast their efficiency and potential use within our application:

• Yahoo Term Extraction Web service (Yahoo in the following text) is a popular keyword extractor with a RESTful interface, which returns up to 20 keywords that are found in the text.
The keywords are not ranked internally. This service is successfully used in automatic metadata generation frameworks like SAmgI [14]. As SAmgI generates metadata for a subset of objects in the GLOBE network of repositories [15] that is used in our research, this was an additional reason to evaluate it for our purpose.
• Zemanta is a semantic entity extraction service with both a RESTful and a JavaScript interface. It returns up to 8 ranked keywords. Additionally, it recommends images, links to ~20 Web sites (Wikipedia, Youtube, IMDB, etc.) and blog/news articles from ~10000 sites. Optionally, Zemanta provides the keywords according to the dmoz keyword classification. Moreover, its extraction process can be influenced by emphasizing selected words.

The following section describes the comparison of these two services and the evaluation of their potential for automatic content discovery. In this evaluation, Zemanta and Yahoo were used to extract the keywords from several already existing presentations. These keywords were graded by users. In addition, the users were asked to manually provide keywords for the presentations, and the keywords extracted by Yahoo and Zemanta were compared with these user-generated keywords.

4 Evaluation of Keyword Extraction Services

4.1 Evaluation Methodology

The goals of this evaluation were to test the keyword extraction services with examples of existing educational content, to compare the keywords extracted by Zemanta and Yahoo, and also to compare those to the user-generated keywords.

In the evaluation, 9 presentations were used – 3 for each topic (open source, databases and gravity force). The topics differ in their characteristics, which is expected to influence the quality of extracted keywords.
The topic of open source mostly uses general words, descriptions and a smaller number of specific terms; the topic of databases is more specific, while an explanation of gravity force contains formulas and many specific physics-related terms.

The presentations were gathered from Google’s first result page for the queries “what is open source”, “what is database” and “what is gravity”, with file type filtering for Microsoft PowerPoint presentations. The excerpts chosen were the text-only contents of 3 adjacent slides of each presentation, to better describe the context. Some slides had examples from other fields to help illustrate the concepts. Some texts were written as sentences, while others had only a few words per bullet. We assumed that the extraction services would have less success with shorter texts, partial sentences and examples from different fields. However, these are often found in presentations, so it should be tested whether keyword extraction gives satisfactory results in those cases, too.

Six users were involved in the evaluation, which consisted of two parts:

1. The users were asked to read 9 text excerpts and write the queries which they would use in search engines. They could type as many queries as they wanted.
2. For each of the 9 presentations, the users were presented with 8 keywords from Zemanta and the first 10 keywords from Yahoo. They were asked to grade the relevancy of each keyword, which could consist of one or more words.

4.2 Automatically Extracted Keywords

The two keyword extraction services were compared by the following criteria:

User keyword relevancy grading. Fig. 1 shows the average of relevancy grades per presentation. Zemanta is graded higher in 7 of 9 presentations. If the same average is calculated for the 3 presentation topics, it shows that the keywords from both services are graded higher as the topic specificity increases (Fig. 2).
In all three topics, users have graded the keywords from Zemanta higher.

Fig. 1. The average of keyword relevancy grading per presentation. For each of the 9 presentations (X-axis), the users graded the relevancy of 8 keywords from Zemanta and 10 keywords from Yahoo, with grades 1-5 (5 being the most relevant). The average of grades is calculated for the two services separately (Y-axis).

Fig. 2. The average of keyword relevancy grading (Y-axis) per presentation topic (1 – open source, 2 – databases, 3 – gravity) on the X-axis.

The grades for the same keywords were equally distributed among users.

Fig. 3 shows the average of user grading for the keywords for each of the 8 Zemanta ranks. In general, the grading tends to drop as the Zemanta ranking lowers, which justifies the decision to make queries by combining the highest Zemanta ranked keywords. Yahoo provides the keywords in order of appearance in the text, without any ranking mechanism, so this service could not be evaluated in this way.

Fig. 3. The average user grading of keywords per particular Zemanta rank. The X-axis presents the 8 Zemanta internal ranks. The Y-axis presents the average of user grades for the keywords in each Zemanta rank. In this diagram, the keywords from all 9 presentations were included.

4.3 User-Generated Keywords

To see how different the user keywords are from the automatically extracted ones, these two sets were compared. This comparison is used to analyze how the results provided by keyword generation services differ from the user-proposed search queries (keywords). Only the keywords shared by at least two users were included, to provide more comprehensive and relevant results. Two comparisons were made:

• exact match – checking whether the exact user-generated keyword was included in the list of extracted keywords.
The difference in singular/plural form of nouns was counted as an exact match, as most indexing services can match these internally.
• similar match: checking whether a similar user-generated keyword was in the list of automatically extracted ones. Keywords that are subsets of other keywords are considered similar (e.g. keyword “open source” is similar to “open source definition”), as are those which could be matched after grammatical or syntactic changes (e.g. keyword “gravity law” is similar to “law of gravity”).

Fig. 4 shows the number of common user-generated keywords and the number of matches with automatically generated keywords. The results show that the more important keywords, the ones which are common to more users, have a higher match rate. This is especially visible if similar matches are considered, which is an argument for the use of advanced methods to find keywords similar to the automatically generated ones.

Fig. 4. The number of exact and similar matches between user-generated and automatically extracted keywords (series: Zemanta exact, Zemanta similar, Yahoo exact, Yahoo similar), in comparison to common keywords, i.e. the ones proposed by at least 2 users (Y-axis). The keywords are distributed by the number of users who proposed the keyword, shown on the X-axis (2 to 6).

With the exact match, Zemanta matches more words than Yahoo in 2 sets and is equal to Yahoo in 3. With the similar match, Yahoo matches more words than Zemanta in 2 sets, fewer words in 1 set, and is equal to Zemanta in 2.

The following section describes the initial keyword evaluations carried out in the application prototype environment, where the keywords had to be extracted during the presentation authoring. This approach poses additional challenges in text preparation and automatic keyword extraction, which are described in the following text. In these evaluations, the Zemanta extraction service was used.
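The two comparisons of Section 4.3 can be approximated in a few lines of code. The following is a minimal sketch only, not the procedure used in the evaluation (the paper does not say how the matching was carried out): the function names, the crude plural stripping and the small stop-word list are all assumptions made for illustration.

```python
import re

# Hypothetical stop-word list; only used to compare "gravity law" with "law of gravity".
STOPWORDS = {"of", "the", "a", "an"}


def normalize(keyword):
    """Lower-case, split into words and strip a trailing plural 's' (crude heuristic)."""
    words = re.findall(r"[a-z0-9]+", keyword.lower())
    return [w[:-1] if len(w) > 3 and w.endswith("s") else w for w in words]


def classify_match(user_keyword, extracted_keywords):
    """Return 'exact', 'similar' or 'none' for one user-generated keyword."""
    user = normalize(user_keyword)
    user_core = set(user) - STOPWORDS
    result = "none"
    for candidate in extracted_keywords:
        cand = normalize(candidate)
        if cand == user:
            return "exact"  # identical up to case and singular/plural form
        cand_core = set(cand) - STOPWORDS
        # One keyword is a subset of the other, or they differ only in word order.
        if user_core and cand_core and (user_core <= cand_core or cand_core <= user_core):
            result = "similar"
    return result


if __name__ == "__main__":
    print(classify_match("database", ["Databases", "SQL"]))
    print(classify_match("open source", ["open source definition"]))
    print(classify_match("gravity law", ["law of gravity"]))
```

A word-set comparison of this kind is deliberately permissive; a classification scheme or thesaurus, as mentioned under future work, would be needed to catch synonyms.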
5 Keyword Evaluations in the Authoring Environment

5.1 Evaluation Methodology

Two keyword evaluations were carried out. The overall goal of these evaluations was to determine whether automatic keyword extraction from content being authored is a sound basis for recommending relevant learning objects to the author. More specifically, the relevancy and ranking of the extracted keywords were evaluated. The evaluations were done as a part of an overall evaluation according to the discount usability engineering principles [16]. Therefore, it should be noted that these are not the results of thorough evaluations, but rather of basic, initial user tests.

The users were asked to create an informative presentation about a programming topic familiar to them. The time was limited to 15 minutes. Specifically, the users were given an empty presentation template in the MediaWiki service, enhanced by the WikiPres extension, a MediaWiki plugin for collaborative presentation authoring using WikiMarkup [17]. They were advised to make use of the recommendation application, and to properly attribute reused resources. Once the presentation was finished, the users chose one of the more content-rich slides they authored (not the title or introduction slide). They were presented with 8 keywords generated for that slide and asked to rank the 5 keywords they considered the most relevant. Fig. 5 presents the relation of the user ranking and Zemanta ranking. Fig. 6 shows the averages of user rankings for keywords in the same Zemanta rank.

5.2 Evaluation 1

Four users ranked the keywords extracted and ranked by Zemanta. Of course, the generated keywords were different for each user: the user ranking is compared with that of Zemanta.

Fig. 5. The relation between the user and Zemanta ranking. The X-axis presents Zemanta ranks, from 1-8 (1 being the highest-ranked). The Y-axis presents user ranks from 1-5 (1 being the highest-ranked). The ranking itself is marked with a dot of a different type for each user. Ideally, the user and internal rankings would be identical, with all the dots on a diagonal line. Here, the dots are dispersed, but still near the diagonal line. The majority of dots are placed in the first five columns (Zemanta rank 1-5): this shows that users and Zemanta largely agree on what are the 5 most relevant keywords.

Fig. 6. The average user ranking. The X-axis presents Zemanta internal ranks. The Y-axis presents the average of user rankings for all keywords in a particular Zemanta rank. For instance, the highest-ranked keywords by Zemanta got 1, 1, 1 and 2 as user ranks, which gives an average of 1.25 out of 5. The diagram shows that the user ranking lowers together with Zemanta ranking; the keywords with the lowest Zemanta rankings are not among the most relevant to the users. For this calculation, the keywords not being among the 5 most relevant were given the rank 6.

Lessons learned. The interpretation of evaluation results shows that users mostly agree with Zemanta ranking, which is important for our purpose. Looking into the example of extracted keywords, it can be seen that there are also some irrelevant keywords. In addition, during the evaluation, the following issues were observed:
• Content cold start. At the beginning of authoring, a number of words should be present for satisfactory results. Otherwise, irrelevant initial keywords are extracted.
• Semantic relation of words. Typically, users would test the application by typing a few words to start with, without making any sentence structure or phrases. As Zemanta tries to extract semantic relations from phrases, a text where the words do not make at least a phrase poses a problem for keyword extraction. The influence of this style of writing on keyword extraction should be further evaluated.
• Unnecessary text markup.
The content submitted to the keyword extraction service contained XML tags, which internally defined the layout. These were not removed automatically, and thus influenced the keyword extraction.
• Ambiguity. For small-size texts, keyword generation was sometimes biased by particular meanings of phrases, as the phrase context could not be determined.

Implementation modifications. Several modifications related to keyword extraction were implemented after the first evaluation:
• Including the content from previous slides. To address the cold start issue which occurs when a new slide is started, the content from two previous slides has been included in the keyword extraction, to provide a larger context. As even the completed slides can have a small number of words, this can be very useful. However, a problem can occur if there is a major topic change in adjacent slides.
• Title emphasis. To help solve semantic problems, the slide title was marked as emphasized, which is an additional Zemanta option to focus the extraction on particular words. Depending on the writing style of the author, this can improve the keyword extraction, but it can also degrade it (e.g. slide title “History”, as the history of a technology, could bias the generator towards general human history).
• Text cleaning. The text submitted to the keyword extraction service was additionally cleansed of XML tags, as it was not done by Zemanta automatically.

5.3 Evaluation 2

The goal of the second evaluation was to analyze the influence of different text scenarios in presentation authoring: including an example, changing the sub-topic of the presentation and writing about a more general topic. Four users were involved in the evaluation. The process was the same as in the first evaluation: authoring the introductory slides on a topic in the computer science field.
To analyze the text scenarios, one user was asked to include a real-world example, while a second user was asked to focus on a specific subtopic in some slides. The third user was writing about a more general topic (“open source”). The fourth user was writing a presentation without a specific scenario. It was expected that the different text scenarios and one more general topic would lower the similarity between the user and Zemanta keyword ranking.

Fig. 7 and Fig. 8 present the evaluation results in the same way as the diagrams in the first evaluation. Fig. 7 shows the relation of the user ranking and Zemanta ranking. Fig. 8 shows the averages of user rankings for keywords in the same Zemanta rank. The highest-ranked keyword is ranked on average with 1.75, and the user relevancy ranking average drops as Zemanta ranking lowers, to an average of 5.5, for the fifth keyword.

Fig. 7. The relation between user and internal ranking. The X-axis presents Zemanta internal ranks, from 1-8 (1 being the highest-ranked). The Y-axis presents user ranks from 1-5 (1 being the highest-ranked). The actual ranking is marked with a dot of a different type for each user.

Fig. 8. The average user ranking. The X-axis presents Zemanta internal ranks. The Y-axis presents the average of user rankings for the keywords in a particular Zemanta rank. For this calculation, the keywords not being among the 5 most relevant were given the rank 6.

Some keywords most relevant to users occur in the lower Zemanta ranks (6-8):
• an example from banking for database systems was included, which caused the keywords related to the example (e.g. “bank”) to be extracted (User 2);
• in the presentation about a less specific topic (“open source”), a keyword which was relevant to the user was in the lower Zemanta ranking (User 3);
• in the presentation about HTML, the user was creating a slide specifically for dynamic HTML. As the previous slides were about HTML in general, the keywords were more related to HTML. The most important keyword, “dynamic HTML”, was ranked seventh by Zemanta (User 4).

One way to solve these problems is providing a larger context, from the content itself (additional slides) or from the external environment. Another solution is to give users the option not to include the context of previous slides (useful for changing topics) and not to emphasize the slide titles (useful for misleading titles), but this could reduce the application usability as the user needs to manually select these options. Detecting the change of topics can be done based on the slide layout changes, as some authors divide the presentations in subtopics with slides of a particular layout, or by heuristics based on the topic changes per each slide or per slide sets.

5.4 Lessons Learned

The majority of best-ranked keywords in these two evaluations were in the first 5 of the keywords suggested by Zemanta. Due to the specifics of the scenarios, some keywords which users chose as most relevant were in the lower Zemanta ranks. The users were creating presentation texts for evaluation purposes, not for real presentations. Therefore, some presentations contained very few words, which were not semantically connected. Although some authors prefer to create presentations without many words, the majority of authors still write at least a set of phrases on the slides, which is necessary for obtaining the relevant terms from keyword extraction services.
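The text preparation steps introduced after the first evaluation (removing markup, adding the two previous slides as context, emphasizing the title) and the two user options discussed in Section 5.3 can be sketched as follows. This is an illustration under assumptions, not the prototype's code: the slide representation (dictionaries with "title" and "body" keys), the function names and the way the emphasized title is returned are hypothetical, and the call to the extraction service itself is deliberately left out.

```python
import re

TAG_PATTERN = re.compile(r"<[^>]+>")


def strip_markup(text):
    """Remove XML-like tags and collapse the whitespace they leave behind."""
    return re.sub(r"\s+", " ", TAG_PATTERN.sub(" ", text)).strip()


def build_extraction_input(slides, current, include_context=True,
                           emphasize_title=True, context_size=2):
    """Prepare the text for the slide at index `current`.

    Returns (text, emphasis): `text` is the cleaned content of the current slide,
    preceded by up to `context_size` previous slides when `include_context` is set;
    `emphasis` is the cleaned slide title, or None when emphasis is switched off.
    """
    start = max(0, current - context_size) if include_context else current
    parts = []
    for slide in slides[start:current + 1]:
        parts.append(strip_markup(slide["title"]))
        parts.append(strip_markup(slide["body"]))
    text = " ".join(p for p in parts if p)
    emphasis = strip_markup(slides[current]["title"]) if emphasize_title else None
    return text, emphasis


if __name__ == "__main__":
    deck = [
        {"title": "HTML", "body": "<p>Markup language for <b>web</b> pages</p>"},
        {"title": "History", "body": "<ul><li>1991</li></ul>"},
        {"title": "Tags", "body": "<p>Elements</p>"},
        {"title": "Dynamic HTML", "body": "<p>Scripts change the page</p>"},
    ]
    print(build_extraction_input(deck, 3))
    print(build_extraction_input(deck, 3, include_context=False, emphasize_title=False))
```

Switching `include_context` off corresponds to the "dynamic HTML" case above, where the general HTML slides pulled the keywords away from the new subtopic.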
6 Conclusions and Future Work

The evaluations performed confirm Zemanta as a sound basis for the intended purpose, based on the results and available features such as proposing keywords (mostly abstractions) which are not present in the text, emphasizing words to influence the extraction, and internal ranking. The five highest-ranked keywords extracted by Zemanta will be used, as the users graded these keywords on average with more than grade 3 (the average of grades 1-5).

Future improvements of keyword extraction include the use of keyword classification schemes to detect similar terms and exploring folksonomies as an additional way to find tags that are often used together. To address the problems observed in various text scenarios, two options will be implemented if the user wants to adapt the keyword list: removing a keyword from the list and simple user rating. If rating is used, Zemanta ranking will be combined with the user rating to form a more relevant keywords list.

Several questions remain: Will extracted keywords be found in metadata entries? Do more relevant keywords in the queries produce more relevant recommendations? What can be done not to omit the relevant content, while using this approach? These questions are certainly important and should be investigated.

Besides the keywords, other research segments not discussed in this paper, such as including context information from the environment, will influence the quality of final recommendations. Therefore, further research will focus on usability of content reuse workflows, extraction of context from the authoring environments or learning management systems and mapping such context to learning object metadata. The proposed solutions will be evaluated using the developed prototype application.

Acknowledgments. This work is supported in part by the Croatian Ministry of Science, Education and Sport, under the research project “Software Engineering in Ubiquitous Computing”.
Katrien Verbert is a Postdoctoral Fellow of the Research Foundation - Flanders (FWO).

References

1. Wirski, R., Brownfield, G., Oliver, R.: Exploring SCORM and the national flexible learning toolboxes. Proceedings of the 21st ASCILITE Conference, Perth (2004).
2. Term Extraction Web Service - YDN, http://developer.yahoo.com/search/content/V1/termExtraction.html.
3. term extraction | fivefilters.org, http://fivefilters.org/term-extraction/.
4. Blog Smarter | Zemanta Ltd., http://www.zemanta.com.
5. Home | OpenCalais, http://www.opencalais.com/.
6. Developer Portal - News - Evri, http://www.evri.com/developer.
7. AlchemyAPI - Transforming Text Into Knowledge, http://www.alchemyapi.com/.
8. Bizer, C., Lehmann, J., Kobilarov, G., Auer, S., Becker, C., Cyganiak, R., Hellmann, S.: DBpedia - A crystallization point for the Web of Data. Web Semantics: Science, Services and Agents on the World Wide Web. 7, 154-165 (2009).
9. Fellbaum, C., others: WordNet: An electronic lexical database. MIT Press, Cambridge, MA (1998).
10. ODP - Open Directory Project, http://www.dmoz.org/.
11. Entity Extraction & Content API Evaluation « ViewChange Development Blog, http://blog.viewchange.org/2010/05/entity-extraction-content-api-evaluation/.
12. Puzzlepieces – Comparing NLP APIs for Entity Extraction, http://faganm.com/blog/2010/01/02/1009/.
13. Dotsika, F.: Semantic APIs: Scaling up towards the Semantic Web. International Journal of Information Management. 30, 335-342 (2010).
14. Meire, M., Ochoa, X., Duval, E.: Samgi: Automatic metadata generation v2.0. Proceedings of World Conference on Educational Multimedia, Hypermedia and Telecommunications. pp. 1195–1204 (2007).
15. GLOBE | Connecting the World and Unlocking the Deep Web, http://globe-info.org/.
16. Nielsen, J.: Usability engineering at a discount. Proceedings of the third international conference on human-computer interaction on Designing and using human-computer interfaces and knowledge based systems (2nd ed.) (1989).
17. Bosnić, I., Pošćić, A., Ačkar, I., Žibrat, Z., Žagar, M.: Online Collaborative Presentations. Proceedings of the 32nd International Conference on Information Technology Interfaces - ITI 2010. pp. 1-6, Cavtat/Dubrovnik, Croatia (2010).

Conversion of the YDP Learning Content to Common Cartridge Package

Jaroslaw Dziedzic

Young Digital Planet S.A. Educational ePublishing, Content and Technology, Ul. Slowackiego 175, 80-298 Gdansk, Poland
jaroslaw.dziedzic@ydp.com.pl
http://www.ydp.eu

Abstract. This paper describes the experience gained during the development of a software application that converts some YDP-specific learning content into Common Cartridge.

1 Introduction

Young Digital Planet has almost twenty years of experience in digital learning content creation. We have created several generations of applications and content. The experience and ideas have accumulated over the years, resulting in a broad range of top-quality educational content, advanced applications and tools, as well as effective and flexible data formats for storing and processing the digital content.

We observe emerging standards in the field of digital learning content data formats with interest, and we keep evaluating them and assessing their usefulness for the packaging and distribution of our content. The ASPECT project gives us the opportunity to take a look at the new, emerging Common Cartridge standard, analyse it thoroughly and decide whether we can implement it in our products, and how.

It would not be possible to reach a deep understanding of any new concept just by reading the documentation. Therefore, in order to gain some knowledge and experience, we decided to produce a tool that tries to convert our existing content to the CC format. Our goal was to have our content, with all or most of its features, packed into a Common Cartridge and running in a CC player.

2 YDP Content Concept vs. Common Cartridge Concept

Right from the start of reading the CC specification it became obvious that the CC concept differs in many ways from our existing content. These differences could have broken the project at the very beginning, but fortunately we found some methods of changing our content to suit the Common Cartridge way. Let’s look first at the key concepts for YDP content and CC.

2.1 YDP Content

Our content is usually organised in lessons that consist of several pages. Each page contains some educational content (text, pictures, slideshows, video, sound clips, interactive animations etc.) and exercises related to this content. The student learns not only by reading/viewing the static or interactive content, but also by solving exercises and observing the feedback from the computer. The process is interactive, exercises are tightly integrated with the learning content and it is impossible to separate them from the material presented on the page. The scoring of all the exercises in the lesson is usually presented and summed up on the last page of the lesson, and the student has the possibility to return to any exercise, read the lesson again, sort out his answers and see the result.

We use several different kinds of interactive exercises in addition to typical multiple choice and fill-in-blanks. Pages can have sub-pages (using tab-sheet controls), buttons that launch some actions or just open popup pages with additional content etc. Everything is laid out by our graphic designers to form a harmonious and attractive environment for learning content and exercises.

Our entire content is presented to the student by means of a dedicated player written in Adobe Flash. We have no concept of static content displayed outside of the player, for instance as a PDF or HTML page. Everything is played by our Flash application. The player must always accompany the XML and data files.
2.2 Common Cartridge

As we understand from the specification, the Common Cartridge is based on the concept of the complete separation of learning content and exercises. The only reason exercises are included is to test the student’s knowledge, acquired while reading/viewing the static resources. Exercises cannot be mixed with the learning content and do not participate in the student’s interactive learning experience. They simply serve as an assessment tool, or they form a pool of exercise resources (Object Banks) aimed at a teacher who wants to use them in custom-made lessons and examinations.

While there is really great freedom in the way the static content can be built (HTML, PDF, Flash objects etc.), this is not the case with interactive exercises. They can be described only by means of the old QTI 1.2.1 data format (constrained even further by the CC profile), and played back by the player integrated with the server software. There is no place for a dedicated player here, so the content producer has really limited possibilities with regard to the choice of exercises and the way they are presented to the student. In fact, even though the CC specification claims that it can handle 6 exercise types, there are just two distinct exercise types to choose from: multiple choice (in 3 simple variants) and fill-in-blanks. And even these two types have important limitations that allow for their use in really simple cases only.

On the other hand, the lack of a dedicated player can be an advantage for some content producers that are interested in simplicity. Creating only the authoring software for QTI 1.2.1 could be, in theory, simpler than creating both authoring software and a dedicated player. It would be true if QTI 1.2.1 were straightforward and easy to implement. Unfortunately, this is not the case, and I will explain it in detail later in this document.

One advantage of a CC approach without the dedicated player can be easily seen.
The CC package doesn’t have to care about communication with the LMS server, and there is no need to think about the exercise status, scoring etc., because all these issues are handled by the standard player integrated with the server. The content producer simply has to describe exercises using some standard data format. It is a really great idea. But the choice of QTI 1.2.1 as the data format for interactive exercise representation, and the restriction of this format in the CC profile, make it difficult to create anything except some really simple content.

2.3 Key Differences

The main difference between the YDP and CC content is the existence of the dedicated player for the YDP content. It doesn’t exist in Common Cartridge, and all interactive content for CC must be described by means of QTI 1.2.1 files.

Another important difference is the complete separation of learning and examination in CC, while in YDP content most exercises alternate with static content on lesson pages, and together they form an interactive learning experience. Obviously, we also make use of the “examination” concept (just like in CC), and series of exercises that form an assignment also exist in our content, but they are not dominant. We believe that solving exercises is a part of the learning process and that exercises should not be limited to examining the students.

The really limited number of exercise types available for CC is another problem for implementing a conversion of our content. It is not possible to use connection exercises, word jigsaw, sentence jigsaw, element grouping, text item identification, crosswords, marking elements, colouring pictures, solving mathematical equations and many more. This makes a straight conversion impossible.

And last, but not least: the visual appearance. We put much effort into the graphic design of our lessons. Pages are beautifully laid out by graphic designers to satisfy both the aesthetics and the ergonomics.
While it would not be a problem to obtain the same level of aesthetics on the Common Cartridge static pages, it is impossible to achieve it with assignments. Using the old QTI 1.2.1 there is no way to express the layout and graphical appearance of the page. We must reconcile ourselves to simple text exercises laid out one per page.

We are aware of the fact that some aspects of visual appearance can be defined using “text/html” instead of plain text. But this potentially powerful feature is not well explained in the specification, so we don’t know what the intended scope of the HTML that can be used in QTI elements is. It would be rather unrealistic to believe that one can put complicated HTML there, with pictures, tables or JavaScript for example. So we believe that “text/html” means just some text attributes, like bold, italic or font colour and size. While it is useful to have such possibilities, they would not allow us to design the layout and create a visually stunning page.

2.4 The Solution

As described before, it is impossible to make a straight conversion of our content into CC, because of the limited number of exercise types available, and because of the strict separation of lessons and exercises in CC. However, we’ve invented a workaround for this problem. The CC specification allows us to put any “web content” into the package, so we can also include our player, which is, in fact, just a big SWF file that can be run in a Flash plug-in in an HTML browser. The player will not be able to communicate with the LMS server (because it is not the CC way of doing things), but it will be able to play our content without any problems, with all kinds of exercises and multimedia files, with attractive appearance and with the page layout that we designed. But from the CC server’s point of view it will be just static content, and the exercise results and state will not be reported to the server.
From the student’s perspective, however, lessons with educational content alternated with exercises still form a very valuable learning experience: interactive exercises can check the student’s answers, show hints, display feedback and show results. The results and exercise state are transient and will not survive when the student closes his session with the system, but they assist the student during the course of learning.

Then comes the examination. We have to stick with the CC convention and try to represent some exercises in QTI 1.2.1. We decided that for each lesson we will prepare some additional pages with simple exercises that can be converted to QTI, and they will form an examination after the lesson. So each of our packages will consist of an interactive lesson with exercises that do not report results to the LMS, and an assignment with some exercises related to that lesson, prepared in QTI 1.2.1.

3 Implementation of the YDP → CC Converter

We have designed and implemented an application that takes a number of lessons from our existing content and includes them in a Common Cartridge package. The application is written in Java, and it has a user interface that makes it possible to select lessons for conversion and specify the conversion type. The lesson data can be interpreted in two different ways, depending on the type selected. It can be either “lesson” or “test”.

3.1 Lesson Conversion

If a “lesson” conversion type is selected, all the data files (various XML and media files) are copied into the package directory structure and appropriate entries for each file are added to the imsmanifest.xml file. The directory containing all the files looks almost identical to the one in our original content; the location of files in relation to each other is preserved. The only change to the original content was the necessity to change the filenames of most XML files; I will explain it in detail later.
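To illustrate the manifest registration step, the fragment below builds one resource entry of type "webcontent" that lists every copied file. It is a simplified sketch and not the converter's code (which is written in Java): the function names are hypothetical, XML namespaces and the rest of the manifest (metadata, organizations) are omitted, and the identifier and paths in the usage example are invented.

```python
import xml.etree.ElementTree as ET
from pathlib import Path


def collect_files(lesson_dir):
    """List all files under lesson_dir as POSIX paths relative to its parent directory."""
    root = Path(lesson_dir)
    return sorted(p.relative_to(root.parent).as_posix()
                  for p in root.rglob("*") if p.is_file())


def build_webcontent_resource(identifier, launch_href, file_hrefs):
    """Create a <resource type="webcontent"> element with one <file> child per file."""
    resource = ET.Element("resource", {
        "identifier": identifier,
        "type": "webcontent",
        "href": launch_href,  # the page that is opened when the resource is launched
    })
    for href in file_hrefs:
        ET.SubElement(resource, "file", {"href": href})
    return resource


if __name__ == "__main__":
    # Invented example paths; a real run would use collect_files("path/to/lesson01").
    files = ["lesson01/index.html", "lesson01/main.xml", "lesson01/media/intro.mp3"]
    element = build_webcontent_resource("lesson01", "lesson01/index.html", files)
    print(ET.tostring(element, encoding="unicode"))
```

Because the relative locations of the files are preserved, the paths collected from the original lesson directory can be reused unchanged as `href` values.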
For each directory that contains the data for a single lesson, an HTML file is generated during the conversion. This HTML contains a Flash Player applet that loads our dedicated player (an SWF file), and this player loads and executes the lesson content described in XML files. The dedicated player is also copied to the package into a separate directory, and is referenced from each generated HTML file.

All the files described above are registered in the imsmanifest.xml file as the “webcontent” data type. From the server’s point of view, they just form an HTML page with the embedded SWF file. Everything should run on the CC server as static content, i.e. without the possibility to report the exercise results to the server, and without the persistent storage of exercise state. However, the content in the cartridge should look and behave identically to our existing applications.

We validated the generated packages using the Cartridge Validator application (Common Cartridge Test Tool). This application proved really useful and helped to find some bugs in the package structure. Finally, we had a validated package that truly conformed to the CC specification.

The final step for static content conversion was to run it in some reference Common Cartridge player and verify that it works as expected. The problem was the apparent lack of a reference CC player on the IMS Global site. We couldn’t find this important resource, so the only option was to use the player from Icodeon for testing purposes. Thankfully, we had no difficulties getting access to the Icodeon site to try their player.

We had only one problem during this stage of implementation. The package generated as described above didn’t seem to work. It seemed strange, because we expected that a package that validates correctly would, at least, show some signs of working. But we had just a grey rectangle.
After some debugging we found that most of our XML files looked as if they did not exist on the server (we got error 404 for them), while some other XML files and all media files (JPG, SWF, MP3 etc.) were okay. All the missing files had something in common: they were XML files, but their filename extension was different, for example .page or .flo. A possible explanation of the problem could be the MIME-type settings on the server (ignoring all unknown file types), or the package import algorithm rejecting unknown file types. In any case, we had to rename all these files to XML. It was not that easy, because all internal references to the renamed files had to be found in the XML and changed. Of course, we finally succeeded and got our content running perfectly on the Common Cartridge server!

3.2 Exercise Conversion

If a “test” conversion type is selected, the YDP content is treated in a different way. It has to be converted to a QTI file. Since this conversion is impossible for our stock content, we use content created especially for this purpose. It contains only two exercise types (single/multiple choice and fill-in-blanks), and text modules for the presentation of questions.

We tried to get the most out of the QTI specification and used advanced features, like feedback. However, the results were mixed: some features that we wanted to implement were acceptable in the CC specification, some were not. The specification is not clear in many places, which was an additional difficulty. We were also surprised by the fact that even the simple exercise types allowed in the “CC profile” do not always work as expected: we couldn’t convert our fill-in-blanks exercises to CC, because they usually (in fact, almost always) contain many blanks in a text. And it looks as if the “CC profile” restriction of QTI doesn’t allow us to have more than one gap in a single exercise, and this single gap must be at the beginning or at the end of a sentence.
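Returning to the file renaming needed in Section 3.1: the step has two parts, choosing new names for files with non-standard extensions and rewriting every reference to them. The sketch below shows one way this could be done; it is not the converter's implementation. The naming scheme (appending ".xml"), the function names and the element names in the usage example are assumptions, and only .page and .flo are taken from the text as example extensions.

```python
import re

# .page and .flo are the examples named in the text; the complete list is an assumption.
CUSTOM_EXTENSIONS = (".page", ".flo")


def rename_map(filenames, extensions=CUSTOM_EXTENSIONS):
    """Map every file name with a custom extension to a new name ending in .xml.

    Appending ".xml" (instead of replacing the extension) avoids clashes such as
    intro.page and intro.flo both becoming intro.xml.
    """
    return {name: name + ".xml"
            for name in filenames
            if name.lower().endswith(extensions)}


def update_references(xml_text, mapping):
    """Replace each old file name in xml_text by its new name.

    The lookarounds keep the function from touching longer names that merely
    contain an old name, and make a second run a no-op.
    """
    for old, new in mapping.items():
        pattern = r"(?<![\w.-])" + re.escape(old) + r"(?![\w.-])"
        xml_text = re.sub(pattern, lambda match, new=new: new, xml_text)
    return xml_text


if __name__ == "__main__":
    names = ["lesson.xml", "intro.page", "menu.flo", "logo.jpg"]
    mapping = rename_map(names)
    source = '<link src="pages/intro.page"/><flow href="menu.flo"/>'
    print(mapping)
    print(update_references(source, mapping))
```

Treating the XML as plain text keeps the sketch short; a parser-based rewrite would be safer if file names can also appear in ordinary text content.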
But the main problem was the exercise scoring method in QTI 1.2.1, which is overly complicated and difficult to implement. Wouldn’t it be better to select some simple and standard scoring methods for the exercise, instead of this big and complicated scoring section for each item? In any case, the problem with it is much more serious: despite its complexity, its abilities are really small. This is because of the CC Profile restrictions, which force the SCORE variable to be used only in an “all or nothing” manner. It means that if we have a multiple choice exercise with ten checkboxes, five of them being correct answers, then we have no way to express that we want 20 points for each correct answer. All we can do is to set 100 points if all five correct checkboxes were selected. In case of any error, we have to set 0 points for the entire exercise. This single deficiency makes the whole CC system useless for the representation of any high-quality content. Really, only very simple exercises, like choice with a single correct answer (with radio buttons instead of checkboxes), or fill-in-blanks with a single gap, can be correctly scored.

The YDP converter creates QTI files that can be positively verified by the Common Cartridge Validator. However, when we run them in the CC player that is available to us, only fragments of exercises appear and they do not act as expected. Perhaps it is because the Icodeon player is also under development, and maybe we just ran into some features that are not implemented, or into bugs. But at this stage of development we really need a rock-solid reference player to verify our implementation in practice.

3.3 Problems and Suggested Solutions

The most difficult part of this project was to study the specification. The Common Cartridge Specification document [1] is not precise, not coherent and sometimes even internally contradictory.
Some simple, obvious topics are explained in so much detail that they become almost unreadable and difficult to understand (chapters 3.3 and 3.4 about file locations in a package and references between them, for instance). On the other hand, some topics that should be explained in more detail are treated too briefly. This is the case with the QTI description (chapter 4.9). The diagrams for QTI are helpful, but they sometimes contradict the description, especially when it comes to the differences between QTI 1.2.1 and the QTI Common Cartridge profile. We couldn't find definite answers to many questions, for instance whether one can use pictures or video in exercises. The diagram in paragraph 4.9.8 of the specification [1] suggests that it is possible (see the MaterialSelection box), while the descriptions in paragraphs 4.9.1.4 and 4.9.2 say that only text is allowed. Such inconsistencies are frequent in this document. A new version of the "CC profile" specification, similar in quality to the QTI 1.2 specification document [2] (which is, in contrast, very clear, precise and easy to read), would be very much appreciated by developers.

The Validator tool proved to be very useful. However, it did not detect a problem with XML files that have a filename extension other than .xml. We suggest updating this tool to include filename extension validation as well.

There is an urgent need for an official reference Common Cartridge player that would be available to any developer interested in CC package creation. We just can't convert anything to CC if we can't see proof that everything is working correctly. The specification, which doesn't give clear answers to many questions, makes the problem even bigger. Many issues just have to be tested in a reliable player. The reference CC player is at the top of my wish list today.
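The filename extension check suggested for the Validator can be sketched as follows. This is a minimal illustration in Python, not part of the Validator or of the YDP converter: the function name find_misnamed_xml, the demo helper and all file names other than the .page and .flo extensions are assumptions made for the example. The sketch walks an unpacked package directory and reports files that parse as XML but do not carry the .xml extension.

```python
import tempfile
import xml.etree.ElementTree as ET
from pathlib import Path


def find_misnamed_xml(package_dir):
    """Return relative paths of files that parse as XML but lack the .xml extension."""
    root = Path(package_dir)
    misnamed = []
    for path in sorted(root.rglob("*")):
        if not path.is_file() or path.suffix.lower() == ".xml":
            continue
        try:
            ET.parse(path)  # raises ParseError when the file is not well-formed XML
        except ET.ParseError:
            continue  # media and other non-XML files are ignored
        misnamed.append(path.relative_to(root).as_posix())
    return misnamed


def make_demo_package(directory):
    """Create a tiny, hypothetical package layout for the demonstration."""
    root = Path(directory)
    (root / "imsmanifest.xml").write_text("<manifest/>", encoding="utf-8")
    (root / "lesson.page").write_text("<page><title>Demo</title></page>", encoding="utf-8")
    (root / "menu.flo").write_text("<flow/>", encoding="utf-8")
    (root / "photo.jpg").write_bytes(b"\xff\xd8\xff\xe0 not xml")
    return root


if __name__ == "__main__":
    with tempfile.TemporaryDirectory() as tmp:
        print(find_misnamed_xml(make_demo_package(tmp)))
```

A check of this kind only flags candidates for renaming. Updating the internal references to the renamed files, as described in Section 3.1, would still be a separate step.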
4 Conclusions

Despite the obvious limitations of the Common Cartridge, we want to continue our efforts to familiarise ourselves with this standard and to prepare software tools that would allow the conversion. However, Common Cartridge has too many limitations at the moment to be considered our main format for packaging content. We acknowledge that it was not designed for such a task. It was designed to make possible the creation of small exchangeable items, learning objects that can form building blocks for a teacher who wants to create her own course. We would be happy to be able to create such small building blocks in the future, based on our existing content. But for this application to be successful, the CC standard should evolve and allow for more interesting, more complicated content. We would appreciate some improvements and changes; the replacement of the "QTI 1.2.1 CC Profile" with something better would be our first request. One possible solution could be a move to QTI 2.0, which solves many problems of the old version (it has quite a broad choice of exercise types, a more logical and concise syntax, and support for HTML-like layout control and CSS styles). Or, as a minimum, a new version of the CC profile still based on QTI 1.2.1, but with fewer restrictions (especially in sections and in the area). We are open to discussing these issues and exchanging ideas to make this standard better and really useful for the software industry.
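To make the scoring restriction from Section 3.2 concrete, the two scoring rules can be compared on the ten-checkbox example. The sketch below is plain Python arithmetic, not QTI markup; the function names and the option labels are hypothetical, and the per-answer rule assumes no penalty for wrongly selected options, which the text does not specify.

```python
def all_or_nothing_score(selected, correct, max_points=100):
    """Rule forced by the CC profile: full points only for an exact match."""
    return max_points if set(selected) == set(correct) else 0


def per_answer_score(selected, correct, points_each=20):
    """Desired rule: points for every correct option selected (no penalty assumed)."""
    return points_each * len(set(selected) & set(correct))


if __name__ == "__main__":
    correct = {"A", "B", "C", "D", "E"}  # five correct options out of ten
    attempt = {"A", "B", "C", "D", "F"}  # four correct choices and one error
    print(all_or_nothing_score(attempt, correct))  # 0
    print(per_answer_score(attempt, correct))      # 80
```

A learner who finds four of the five correct answers is scored exactly like one who finds none under the first rule, which is the loss of information the conclusions object to.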
References

[1] IMS Common Cartridge Profile, Version 1.0 Final Specification (http://www.imsglobal.org/cc/ccv1p0/imscc_profilev1p0.html)
[2] IMS Question & Test Interoperability, Final Specification Version 1.2 (http://www.imsglobal.org/question/index.html)

Authoring for re-use in outcome-oriented learning scenarios

Roland Klemke1, Birgit Schmitz2
1 Centre for Learning Sciences and Technologies, Open University of The Netherlands, Valkenburgerweg 177, 6419 AT Heerlen, Netherlands, roland.klemke@ou.nl
2 Humance AG, Goebenstraße 10-12, 50672 Köln, Germany, bsc@humance.de

Abstract. Content production processes are currently experiencing a shift in focus. Due to the growing trend towards highly individualized learning scenarios, they have to face and combine a multitude of different standards. The European eContent+ network of Excellence ICOPER researches possible strategies and implementations to deal with this new situation. In its course, the project develops a comprehensive set of prototypes that use, evaluate and propose extensions to a large number of currently relevant standards such as IEEE RCD, SCORM or OAI-PMH. This paper describes a collaborative, re-use based authoring approach that was realized with one of these prototypes.

Keywords: authoring, re-use, interoperability, outcome-oriented learning

1 Introduction

Today's technology enhanced learning scenarios focus on learning outcome oriented delivery of learning processes, contents, and services. Rather than pre-defining static curricula, individual learning processes are enabled: learner profiles indicate individual gaps, learning outcomes describe the required skills, recommended learning materials help to close these gaps, and assessments indicate successful mastery, which is reflected back into the learners' profiles. This way, the learner gains a large degree of ownership of learning [9].
In such complex learning situations interoperability issues become important: standards to describe learning outcomes, learner profiles, assessment items and learning materials are needed. Technical interoperability between the different components of an overall learning delivery toolset is required in order to ensure seamless learning processes. Web-service based approaches help to simplify technical interoperability [12].

The learning outcome orientation also changes the way learning contents are produced and organized. The traditional “one-size-fits-all” approaches deliver the same content to a large target audience. The production of learning content at high cost is therefore justified. Outcome-oriented content production processes, however, deliver highly individualized content and have to cope with a large diversity of interoperability standards due to the multitude of sources they are based on (different learning objects, learning outcomes, learning designs and learner profiles). Re-use is therefore essential.

The European ICOPER project [8] analyses and discusses state-of-the-art implementations of current standards as a basis for the development of a comprehensive set of prototypes that support individual learning, teaching and authoring. In the course of ICOPER, the Open ICOPER Content Space (OICS) was developed, which combines learning object metadata repositories, learning outcome repositories, learning design repositories and learner profile repositories. The OICS offers a service interface for retrieving content from and publishing content into the OICS [10]. The standards the OICS works with include:
• IEEE Reusable Competency Definitions (RCD), which defines a data model for describing, referencing and sharing competency definitions. The ICOPER Learning Outcome Definition (LOD) is an application profile based on RCD that can be used to create Personal Achieved Learning Outcome (PALO) profiles [7].
• Sharable Content Object Reference Model (SCORM), a standard to describe the structure and behavior of content and components [1].
• Learning Object Metadata (LOM), an IEEE standard to describe metadata for learning objects in a standardized way [5].
• Open Archives Initiative Protocol for Metadata Harvesting (OAI-PMH), a protocol specifying the harvesting of metadata for learning objects residing in repositories [6].

In the course of the ICOPER project, a collaborative, re-use based authoring prototype was developed that is based on the OICS.

Figure 1. Architecture of the ICOPER authoring for re-use prototype

The prototype focuses on three main processes:
1. The collaborative collection and organization of media assets. Media assets comprise individual content elements such as texts, pictures, videos, and audio, which form the basis of all content productions.
2. The collaborative creation of learning units. Learning units are navigable and interactive learning contents built out of individual media assets.
3. Preparation for re-use. The previous two processes are supported by a background harvesting process which updates the metadata repository of the OICS in order to make updated contents searchable and retrievable.

2 Collaborative collection of multi-media assets

A common problem of all collaborative, re-use based production processes is the retrieval, organization and management of media assets. Media asset management is a complex matter, especially when production processes have to cope with heterogeneous target groups (e.g. different languages, support for disabled people) or dynamic topic domains (with many subsequent content versions). Within the ICOPER project we therefore developed the MediaLibrary prototype. This online tool offers the opportunity to share media within a community.
As opposed to other publicly available platforms (such as Flickr or YouTube), the MediaLibrary is designed to support media production processes, content re-use, complex media models (with different versions, variants, languages, media formats) and corresponding metadata. The MediaLibrary is connected to the OICS via an OAI-PMH interface through which media assets within the MediaLibrary can be searched, browsed, and re-used. This way, media asset collections become part of larger learning content and metadata repositories.

3 Collaborative creation of learning units

The creation of learning units involves different tasks, comprising the development of didactical concepts, storyboards and the actual content production, which can be supported by content authoring environments. In our authoring prototype we use a customized version of author42 (named author42.ICOPER), a web-based authoring environment with extensible interfaces [2] which enables teams of authors to collaboratively create learning contents. author42.ICOPER is integrated with the OICS: as a result, the whole repository (including contents from the MediaLibrary and other sources) can be searched directly from the content production environment, and retrieved results can be seamlessly integrated into the current production process.

To enable this integration, author42.ICOPER was extended with a customized search interface that can be launched from within any content production step. The search interface allows searching the OICS using combinations of different metadata and keyword fields. Search results can be selected and integrated into the current content page just as the user's own content would be: page layout, element sizing and positioning functionalities can be used to fit the search results into the content page.

Through a web-service-based publishing interface, the content created can be published from author42.ICOPER into different repositories.
In our prototype, we chose the open source learning management system OLAT [4], which we extended with an OAI-PMH target to connect it back to the OICS. This way, the content is available for further re-use.

4 Preparation for Re-use

Both prototypes, the MediaLibrary and author42.ICOPER, offer OAI-PMH targets through which external repositories can access their metadata and contents. The OICS contains an OAI-PMH compliant harvesting module [11] that is capable of accessing these targets and retrieving the metadata accordingly. That way, the OICS maintains a searchable repository of metadata that refers to the original contents. The search function of the OICS can be used in two different ways: (a) directly, through the OICS' own search interface, or (b) integrated into other applications using the OICS' web-service interface. The latter is the case in the author42.ICOPER integration of the OICS, which enables search results to be automatically and seamlessly embedded into the content production process.

5 Conclusion and Outlook

We have presented a prototype to support collaborative, re-use based authoring for modern, outcome-oriented learning approaches. For a first evaluation of the above-mentioned prototypes and their interoperability with the Open ICOPER Content Space (OICS), we held an evaluation workshop at the JTEL SummerSchool 2010 in Ohrid, Macedonia. Ten participants with different backgrounds, including teachers, researchers, and students from computer science, technology enhanced learning and other fields, took part in the evaluation. In a first evaluation step, participants were asked to organize and metatag MediaAssets according to a selected LOD in the MediaLibrary prototype. These assets were re-used in the online authoring system author42.ICOPER to produce a unit of learning. Finally, participants published the results to enable further re-use via the OICS.
Despite some technical problems that arose due to the prototypical status of the tools in use, participants could effectively solve their tasks and rated the toolset as highly relevant.

In the meantime, with the prototype still under development, a new approach to connecting repositories and authoring tools is undergoing standardization: the Simple Publishing Interface (SPI). This draft standard [3] focuses especially on the integration of publishing tools (like the authoring tools used in our prototype) and repositories (like the OICS). We are planning to use SPI for the next version of our prototype to support direct collaboration and immediate re-use of results.

Acknowledgments. Parts of this work are funded by the eContentPlus Programme of the European Commission through the ICOPER project.

References

1. ADL SCORM (2004). Advanced Distributed Learning Sharable Content Object Reference Model, 2nd ed., Overview, 22 July, Alexandria, VA.
2. Cristea, A. (2006). Interfacing adaptive solutions with corporate training systems. PROLEARN Deliverable 1.6, retrieved online 19.06.2010 at: http://citeseerx.ist.psu.edu/viewdoc/download?doi=10.1.1.99.3240&rep=rep1&type=pdf.
3. CWA 16097 (2010). The Simple Publishing Interface (SPI) Specification. February 2010. Available at: ftp://ftp.cen.eu/CEN/Sectors/TCandWorkshops/Workshops/CWA16097.pdf.
4. Fisler, J. & Schneider, F. (2008). Creating, Handling And Implementing E-Learning Courses and Content Using the Open Source Tools OLAT and eLML at the University of Zurich. ISPRS Conference 2008, TS ThS-16: New Approaches and Tools for Education and Capacity Building, International Society for Photogrammetry and Remote Sensing (ISPRS), Beijing, China, July 3-11 2008.
5. IEEE LTSC (2002). Final Draft Standard for Learning Object Metadata. Retrieved 01/15/10 at http://ltsc.ieee.org/wg12/files/LOM_1484_12_1_v1_Final_Draft.pdf.
6. Lagoze, C. and Van de Sompel, H.
(2001) The Open Archives Initiative: Building a Low-Barrier Interoperability Framework. Proc. 1st ACM/IEEE-CS Joint Conf. Digital Libraries, ACM Press, pp. 54–62.
7. Najjar, J., Derntl, M., Klobucar, T., Simon, B., Totschnig, M., Grant, S., & Pawlowski, J. (to appear 2010). A data model for describing and exchanging Personal Achieved Learning Outcomes (PALO). International Journal of IT Standards and Standardization Research.
8. Najjar, J., Simon, B. (2009). Learning Outcome Based Higher Education: iCoper Use Cases, IEEE International Conference on Advanced Learning Technologies (ICALT), workshop on European Workshop On E-Learning Standards Best Practice, Riga, Latvia, July 14-18, 2009.
9. Savery, J. R., & Duffy, T. M. (1995). Problem based learning: An instructional model and its constructivist framework. Educational Technology (Saddle Brook, NJ), 35, 31–31.
10. Totschnig, M., Derntl, M., Gutiérrez, I., Najjar, J., Klemke, R., Klerkx, J., Müller, F. (2010). Repository services for outcome oriented learning. 4th international workshop on Search and Exchange of e-le@rning Materials (SE@M), Barcelona, 27-28 September 2010.
11. Totschnig, M., Klerkx, J., Klobučar, T., Law, E., Simon, B., Ternier, S. (2009). D1.1 Open ICOPER Content Space Implementation of 1st Generation of Open ICOPER Content Space including Integration Mini Case Studies. ICOPER project deliverable. Retrieved online 28.06.2010 from http://www.educanext.org/dotlrn/clubs/icoper/new-lors/Deliverables/Deliverables_-_Submitted/D1.1/D1.1-final.pdf
12. Vossen, G. and Westerkamp, P. (2008). ‘Why service-orientation could make e-learning standards obsolete’, Int. J. Technology Enhanced Learning, Vol. 1, Nos. 1/2, pp. 85–97.

Exploring Quality Issues in the Use of LOs: To Tag or Not to Tag?

Rune Hjelsvold1, Jingjing Fan1, Yngve Nordkvelle2, and Kjell Are Refsvik1
1 Gjøvik University College, Gjøvik, Norway
2 Lillehammer University College, Lillehammer, Norway

Abstract.
In this paper we describe a study where students tagged learning objects created by their professors. The study shows that the student tags extend the professors' view of the contents of the learning objects and add a wider context for interpreting the content of the learning objects. During interviews conducted at the end of the study, the professors reported that the students' tags represented a form of feedback that would help them recognize discrepancies between the learning objects' intended purpose and the perceived purpose.

1 Introduction

According to Kay and Knaack [4], the majority of researchers have emphasized technological issues such as "accessibility, adaptability, the effective use of metadata, reusability and standardization" when defining the term Learning Object (abbreviated LO in this paper). Kay and Knaack, however, define learning objects as tools that should enhance, amplify, and guide the cognitive processes of the learners [4]. The aim of our study is partly to investigate whether student-generated tags may be considered useful as content-descriptive metadata and partly whether the student tagging process itself will have a positive effect on the pedagogical value of the use of learning objects.

1.1 Tags as Metadata

Researchers currently hold differing views on the quality of user-provided tags and folksonomies (i.e., vocabularies that have emerged organically as a result of end-user tagging activities). Guy and Tonkin [2], for instance, describe tag sets as uncontrolled and chaotic and not very well suited for supporting searching. In their study, Sen et al. [6] found that only 21% of user-provided tags were considered worthy of general display by other users. Al-Khalifa and Davis [1], on the other hand, found that folksonomy tags were better in terms of search and contextual coverage than the metadata created by the human expert. Vuorikari et al.
[7] found that tags can enrich and add value to controlled vocabularies. In our study, we investigate this further.

1.2 Learning Objects in a Pedagogical Context

The idea that what the teachers intend to communicate is received by students is much criticized. In radical constructivism, it is claimed that the received message is unlikely to be congruent with the one sent (Qvortrup [5]). The didactical operations performed by the teachers are designed to increase the chances that the student achieves a higher degree of understanding. On the other hand, students' interpretations and misinterpretations are both regarded as fruitful processes. Hansen and Brostrøm [3] argue that professors need to develop their skills as teachers in tagging their LOs. The discrepancies between the outcomes of the tagging performed by the professors and by their students are indicative of the degree of overlap between teachers' intentions and students' reception of the same message.

2 The Student Tagging Study

In this project we studied two groups of master students at Gjøvik University College. Each group consisted of approximately ten master students. Group 1 was a group of first year students on the Master of Media Technology program participating in a course on media data coding and compression. Group 2 was a group of second year students on the same program who had previously participated in a course on semantic web. The courses were taught by two different professors, one for each course.

Both courses were offered as blended learning in which regular lectures were recorded. The recorded lectures, along with lecture notes in PDF, were used to produce LOs stored in an LMS. The professors assigned keywords to each learning object as content-descriptive metadata.

The project was divided into two main parts. Firstly, the students used a web-based application to tag LOs blindly (i.e., students did not get to see other students' tags).
Secondly, semi-structured interviews were conducted in which the two professors and some of the students were interviewed individually.

The key characteristics of the two groups and the generated metadata can be summarized as follows: The individual student in group 1 generated 3.5 tags per LO on average, while the average in group 2 was 3.3. At the same time, the professor teaching group 1 assigned 14.6 keywords per LO on average, while the professor for group 2 assigned only 6.9 keywords per LO on average. On average, 1.3 tags generated by the individual student in group 1 also appeared as keywords assigned by the professor. The corresponding value for group 2 was 1.35.

The overlap between the sets of tags generated by the students and the sets of keywords assigned by the professor per LO is further illustrated in Fig. 1. The upper part of the bars, shown in brighter colors, gives the number of keywords assigned by the professors that did not appear in the sets of tags generated by the groups of students. The lower part of the bars, shown in darker colors, gives the number of tags generated by the groups of students that did not appear in the sets of keywords assigned by the professors. The middle part, shown in the darkest color, illustrates the amount of overlap between student-generated tags and professor-assigned keywords.

Fig. 1. Number of common terms (darkest color), compared to number of professor keywords only (lightest color) and to total number of student tags (bottom part) for the two courses

Finally, Figure 2 illustrates the level of agreement among students with regard to the tags. The ten most popular tags are enumerated along the horizontal axis (tag 1 being the most popular for the given LO). As can be seen in the figure, student tags mostly differ from the keywords chosen by the professors. Our study therefore indicates that the students' interpretation of the content of the LOs differs from the professors'.
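The per-LO quantities reported in this section (tags per student, per-student overlap with the professor's keywords, and the three bar segments of Fig. 1) can be computed with simple set operations. The following Python sketch is only an illustration under stated assumptions: the function name compare_tags and the sample data are hypothetical, tags are matched after lowercasing and trimming (the study does not say how terms were matched), and at least one student is assumed.

```python
from statistics import mean


def compare_tags(student_tags, professor_keywords):
    """Summarize, for one LO, how student tags relate to the professor's keywords.

    student_tags maps a student identifier to that student's tags for the LO.
    """
    keywords = {k.strip().lower() for k in professor_keywords}
    per_student = [{t.strip().lower() for t in tags} for tags in student_tags.values()]
    all_student_tags = set().union(*per_student)
    return {
        "mean_tags_per_student": mean(len(tags) for tags in per_student),
        "mean_overlap_per_student": mean(len(tags & keywords) for tags in per_student),
        "common": len(all_student_tags & keywords),           # middle bar segment
        "professor_only": len(keywords - all_student_tags),   # upper bar segment
        "student_only": len(all_student_tags - keywords),     # lower bar segment
    }


if __name__ == "__main__":
    # Hypothetical data for a single LO, not taken from the study.
    keywords = ["Huffman", "entropy", "compression", "coding"]
    tags = {
        "student_1": ["compression", "huffman", "zip"],
        "student_2": ["entropy", "exam"],
    }
    print(compare_tags(tags, keywords))
```

Averaging these per-LO values over all LOs of a course would give figures of the kind quoted above for the two groups.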
Student-generated tags thus seem useful as a complementary type of metadata to professor-assigned keywords.

3 Value of Student Tagging: Beyond Content Descriptive Metadata

The two professors involved in the study and six of the students from group 1 were interviewed at the end of the study. The purpose of the interviews was to investigate whether student tagging added value beyond producing content descriptive metadata.

The students all agreed that the keywords provided by the professors were useful for them in interpreting the content of the LO. They even requested that student tagging of learning objects be introduced in all the courses they were signed up for. They found the keywords especially helpful in finding what the key aspects of the LO were.

Fig. 2. Level of agreement among students for the most popular tags

A majority of the students also considered it useful to be able to view fellow students' tags. They also thought it would be useful to have the opportunity to view students' tags from earlier years of the course, even though no such tags existed in the demo system.

Both professors agreed that student-defined tags were quite similar to their own keywords although they were not the same. Most importantly, however, the participating professors emphasized that the tags represented a sort of feedback that would help them understand how well the students were able to grasp the contents of the LO.

The two participating professors both stated that the feedback from the students would have a real impact. In some cases, they found that the students had chosen tags that they would like to include among the professor-assigned keywords in the future. In other cases, the feedback indicated that the students had missed some important messages or misinterpreted the LO.
In these cases, the professors said they would either modify the original lecture and accompanying LO, or they would repeat these issues in future lectures to help the students achieve a correct and deeper understanding.

One striking observation is that neither of the professors considered removing even a single keyword from any of the LOs, even if the students did not use it as one of their tags. The professors saw the student-generated tags as a possible extension to the set of keywords they assigned, not as a potential replacement.

Although the study of student tagging of LOs and the follow-up interviews were rather small, they indicate that content-descriptive metadata in the form of professor-assigned keywords and student-generated tags add value beyond supporting searchability and reusability. The students consider professor-assigned keywords a help in interpreting the contents of the LOs, and the professors see student-generated tags as valuable feedback from the students regarding both the quality of the LO and the quality of the accompanying set of metadata.

4 Conclusion and Further Work

Our study was rather small, involving some 20 students, two professors, and 16 LOs. Still, the study gives some interesting indications of the usefulness of student tagging and of interesting paths for further work.

Our study shows that there is some overlap between the tags that students create for LOs and the keywords the professors assign to the same LOs. There is, however, also a significant difference between the two. This difference may provide a wider context for interpreting the content and context of the LO.

Our study also shows that student tags may be utilized in the quality management of the LOs. The professors could recognize discrepancies and patterns of differences between the student-generated tags and the original set of professor-generated keywords that might call for quick or dramatic alterations of LOs.
As a means for quality improvement, this dimension goes far beyond the question of retrieval and reusability to include issues of feedback and evaluation, and it might improve the level of collective engagement and learning retrieval.

This project is part of an ongoing process of developing a Learning Object Repository at Gjøvik University College that will provide student tagging capabilities.

References

1. Al-Khalifa, H.S., Davis, H.C.: Replacing the Monolithic LOM: A Folksonomic Approach. In Proceedings of the Seventh IEEE International Conference on Advanced Learning Technologies (ICALT 2007), 665-669 (2007)
2. Guy, M., Tonkin, E.: Folksonomies: Tidying up tags? D-Lib Magazine, doi:10.1045/dlib.magazine (2006)
3. Hansen, P.S., Brostrøm, S.: Tagging the didactic functionality of learning objects. World Conference on Educational Multimedia, Hypermedia & Telecommunications. Chesapeake, Va.: Association for the Advancement of Computing in Education (2002)
4. Kay, R., Knaack, L.: Evaluating the learning in learning objects. Open Learning, doi:10.1080/02680510601100135 (2007)
5. Qvortrup, L.: Society's Educational System - An introduction to Niklas Luhmann's pedagogical theory. Seminar.net, 1:1 (2005)
6. Sen, S., Harper, F.M., LaPitz, A., Riedl, J.: The Quest for Quality Tags. In Proceedings of GROUP'07, 361-370 (2007)
7. Vuorikari, R., Sillaots, M., Panzavolta, S., Koper, R.: Are Tags from Mars and Descriptors from Venus? A Study on the Ecology of Educational Resource Metadata. In Proceedings of the International Conference on Web-based Learning (ICWL 2009), 400-409 (2009)

Educational Resources Packaging Standards SCORM and IMS Common Cartridge – The Users Point of View

Kati Clements1, Àgueda Gras-Velázquez2, Jan M. Pawlowski1
1 Global Information Systems, University of Jyväskylä, 40014 Jyväskylä, Finland
2 European Schoolnet, Rue de Lalaing 24, 1040 Brussels, Belgium
kati.clements@jyu.fi, agueda.gras@eun.org, jan.pawlowski@jyu.fi

Abstract.
This paper reports on an exploratory study analyzing the Educational Resources packaging standards SCORM and IMS Common Cartridge (CC) with regard to interoperability from the point of view of key users: teachers. The two specifications for packaging Educational Resources, SCORM and CC, have been developed to help users re-use Learning Objects from Learning Object Repositories (LORs) in the Learning Management Systems (LMSs) of schools. In our study, we found that teachers find packaging solutions highly useful, especially the interoperability between LORs and LMSs. Teachers also appreciated that they can modify packaged content after it has been uploaded to the LMS. The teachers also strongly appreciated the additional functionalities of CC packages when teaching courses online or giving homework or extra assignments to their students.

Keywords: Educational Resources, content packaging standards, Learning Object Repositories, SCORM, IMS Common Cartridge, teachers

1 Introduction

This paper presents the views of teachers on interoperability between Learning Object Repositories (LORs) and Learning Management Systems (LMSs) supported by two content packaging standards: SCORM and IMS Common Cartridge (CC). We set up a testing experiment and survey to find out 1) whether teachers would get any additional value from using these standards and 2) whether the tools used to support these standards are reasonable for teachers to use in their everyday teaching.

According to the IEEE Learning Technology Standards Committee, a Learning Object is defined as "any entity, digital or non-digital, that may be used for learning, education or training" [1]. Learning Objects in this paper are defined as digital objects which can be used, re-used or referenced during a technology-supported learning process [2]. Educational Resources can be defined as Learning Objects that can be employed in technology-supported learning [3].
Learning Object Repositories are collections of Learning Objects that are accessible to users via a network without prior knowledge of the structure of the collections [4]. Distribution of Educational Resources is most commonly done via LORs. This study investigates the views of teachers re-using and sharing Learning Objects from LORs with the help of the interoperability standards SCORM and IMS Common Cartridge. In their everyday lesson preparation, teachers use Educational Resources in LMSs such as Moodle [5]. LMSs help teachers to manage digital contents, to aggregate blended learning and to give homework to their students.

Recent research involving the packaging standards SCORM and IMS Common Cartridge has focused on the technical specifications [6],[7],[8], without investigating the teachers' point of view. The assumed benefits of SCORM and CC have not been verified by the end user group, leading to the questions of this study: Are teachers willing to use SCORM and CC packages in their busy schedule of designing Educational Resources for their classes? Is the use of content packaging yet another technical problem which teachers will not be able to handle with a reasonable amount of training? This study aims to examine what the real users, the teachers, think of using these Learning Object packaging standards, and to ask whether they see the interoperability between LORs and LMSs as facilitating the work they do in their everyday lesson plan creation process.

When trying to evaluate the impact of a learning technology standard on its users, it is important to realize that specifications cannot be evaluated by users, as users do not work with them directly. Rather, interoperability specifications are implemented in software tools offering a set of functionalities to the end user. End users can then make use of the tool in practice, and that use can be evaluated.
The evaluation results need to be analyzed in detail to assess whether problems are caused by the interoperability specifications, by the functionality provided by the tool, or by the user interface through which the functionality is made available to the end user [9]. However, it can be evaluated whether the users have understood and used the key concepts of a specification (such as packaging or metadata categories). To avoid this problem, we look at the key concepts and functions enabled by the standards. As SCORM and CC are both standards which are widely implemented in tools, we can map the standards’ concepts to the resulting functionalities in the tools, which are then assessed in practical experiments. We therefore tested the standards by using the tools, keeping in mind that the interfaces of these players could influence the users’ opinions.

2 Open Educational Resources Packaging Standards

In the following, we discuss the content packaging specifications SCORM (Sharable Content Object Reference Model) and IMS Common Cartridge (CC).

2.1. SCORM

The Sharable Content Object Reference Model (SCORM) was created to support the re-usability, interoperability, portability, access, maintenance and adaptation of Learning Objects [10]. SCORM is a collection of standards and specifications that enable learning platforms to find, import and deliver learning content in a standardized way. SCORM specifies how Learning Objects must be created in order to ensure interoperability across different platforms and tools [7]. SCORM was originally intended for use in self-study, computer-based training scenarios rather than in interactive scenarios between teachers and students. SCORM’s crucial functionality, ‘sequencing’, supports learners by allowing them to navigate only into those parts of the package that their previously fulfilled learning assignments permit.
However, it can be said that SCORM was created to help teachers when transferring collections of Learning Objects, typically from an LOR to an LMS. To date, most popular Learning Management Systems support SCORM objects [11], among them Moodle. SCORM packages can be uploaded to Moodle as single, unmodifiable entries.

2.2. IMS Common Cartridge

IMS Common Cartridge (CC) was intended to enhance SCORM, offering more flexibility and support for assessments, web 2.0 standards, content authorization, collaborative forums and outcomes reporting [7]. The aim of IMS Common Cartridge was not to compete with SCORM but to support blended learning rather than self-study. According to IMS [12], the objectives of the IMS Common Cartridge specification for facilitating teaching include increasing flexibility, sharing and re-use. CC enables teachers to assemble lesson plans by taking only parts of a CC package or by integrating the whole package into their LMSs. This study aimed to find out how teachers perceived the difference in interoperability when using IMS Common Cartridge packages integrated into a Moodle LMS.

IMS Common Cartridge is not as widely supported by LMSs as SCORM, perhaps because there are not yet as many tools to support it. However, LMS platforms have shown growing interest in IMS Common Cartridge support. It is still gathering momentum, which should increase following the announcement [13] that the popular open source platform Moodle would start supporting IMS Common Cartridge packages from the spring of 2010 on. Users can upload CC packages as a whole into Moodle in much the same way as SCORM packages.

3 Test Setting

Adopting Standards and Specifications for Educational Content (ASPECT) is a Best Practice Network for educational content that aims at improving the adoption of learning technology standards and specifications [14].
Standards and interoperability experts produce recommendations that are implemented by tools and content providers before being tested by teachers during school pilots. Tests were carried out in order to demonstrate how the implementation of standards and specifications leads to greater interoperability and cross-border re-use of content [15].

The Learning Resource Exchange (LRE) is a pan-European federation of Learning Object Repositories [16]. The service is offered to stakeholders providing digital content, such as ministries of education, commercial publishers, broadcasters, cultural institutions, and other non-profit organizations offering online content to schools [17]. The LRE was used as the testing LOR in the ASPECT project, and its metadata made it possible to offer the same package in all the different formats (web page, SCORM package, CC package, SCORM in Icodeon player and CC in Icodeon platform); see section 3.2 for further information.

Our main research aim is the validation of artifacts: the standards SCORM and CC as well as the corresponding tools. As these standards are widely implemented in tools and rely upon them in practical experiments, the use of these tools can be considered a valid evaluation of the standards [9]. This paper mainly looks at the design evaluation [18] in an experimental setting [19] and essentially the usability aspects [20]. Both SCORM and IMS Common Cartridge exist in different versions which support different features. This study focuses on the key feature of compatibility and portability of resources, as it can be seen as critical for teachers when re-using learning objects from repositories.

3.1. Validation Group

A teacher workshop was organized for a group of 44 mathematics and science teachers, equally split among Portugal, Lithuania, Romania and Belgium. The four countries were chosen randomly among the countries participating in the ASPECT project.
Teachers were selected to have the following characteristics: teachers of mathematics, science and/or technology, with some experience of ICT and Learning Resources, and currently teaching students aged 6 to 21. Before teachers were selected on the basis of these profiles, the strategy for finding volunteers varied from specifically contacting teachers who, in one way or another, had collaborated with the selection team before (e.g., Portugal) to publishing an open call for volunteers (e.g., Lithuania). Initial observations showed that the ICT skills of this group of teachers ranged from little knowledge (a few were even unsure of what Google was) to advanced (users of LaTeX). Up to 80% of the teachers had advanced ICT skills, so the group of advanced teachers was strongly represented. The validation results must be seen in this light: teachers participating in European projects tend to be more motivated towards ICT and improving their teaching methods. These teachers have clearly higher levels of motivation and are very eager to share their knowledge and expertise. As a consequence, if these teachers find the project tasks and concepts too difficult or uninteresting, one can be reasonably certain that this will apply even more to teachers with average levels of ICT competence.

3.2. Tests Description

The workshop combined straightforward assignments with direct feedback gathering from the teachers in the form of interviews and two surveys. The test session, organized in May 2010, concentrated on the integration of Resources into Learning Management Systems and on content packaging. In preparation for the tests, teachers had already learned how to create basic courses on the Moodle platform. They had also learned how to browse the Learning Object Repository, the LRE. In the tests, teachers were initially asked to create a normal lesson plan using the Moodle learning platform in a “traditional” way by combining different Resources.
Then they were asked to repeat the same task using a Resource on the same topic that had been ‘packaged’ by ASPECT content developers as both a SCORM and an IMS Common Cartridge package. The test task was to create a simple lesson plan for Moodle on the topic of thermodynamics, made up of some text, an image, a quiz and a forum, using four different approaches:
1) Using non-packaged content
2) Using the entire SCORM package (created from the non-packaged content)
3) Using an entire IMS Common Cartridge package (created from the non-packaged content, with a forum added)
4) Picking up parts from the IMS Common Cartridge package
The tests were designed to serve both as basic training in the use of different types of packaged content and their features (necessary as the teachers had no previous experience with this kind of content) and, at the same time, to obtain their reactions as regards usefulness in their everyday teaching, interest and ease of use. Each teacher had an empty Moodle course and editing rights, and designed the same lesson plan four times as described above. Teachers searched for the resource in question in the LRE, where it was provided in the three formats, both to view and to download. For the non-packaged lesson plan, teachers had to use Moodle options to create the quiz and the forum themselves. Both the SCORM and Common Cartridge packages had the quiz included in the package, and the latter also contained a forum. Additionally, teachers were presented with a dozen additional Resources packaged as SCORM and Common Cartridge to browse through and see their benefits, independently of the topic of the Resources. In all cases, Icodeon’s Common Cartridge Platform and SCORM player were used to view the packaged content.

4 Results Analysis

As the sample size of the survey (n=44) was small, the results of the statistical analysis can only give some indication of the teachers’ attitudes.
The survey results were backed up by a qualitative analysis of interviews with the teachers. Generally, the teachers reacted to SCORM Resources in much the same way as to unpackaged content; for example, they did not see much difference between having a SCORM Resource and a PowerPoint presentation. While they saw that a SCORM package could include more than one resource, they did not use it any differently from PowerPoint; both types of content were integrated into an LMS as a single, unmodifiable entity. In comparison, the teachers were very enthusiastic about CC content packaging. After importing a CC package into Moodle, the teachers could remove parts that they did not need, edit the content and change the order of the different resources. Many teachers requested instructions on how to adapt Moodle to use CC packages, and some teachers even expressed an interest in using CC to package their own content in order to share it with other teachers.

Most teachers had little interest in simply viewing and playing SCORM or CC packages. On the other hand, apart from the option to upload complete packages into an LMS and have the different parts of the package converted into Moodle format, the teachers liked being able to embed only parts of a Common Cartridge package in the LMS, or even in blogs or websites, which is not possible with items from within SCORM packages. While this second option did not take advantage of the LMS’s features, teachers liked being able to integrate into their courses only those parts of the cartridge that they liked or thought were relevant to their lesson.
In the survey,
• 25% said that taking an entire course in SCORM format and using it in Moodle (or their school's own system) would be extremely helpful
• 39% said that taking an entire course in CC format and using it in Moodle (or their school's own system) would be extremely helpful
• 43% said that taking a piece of the Learning Resource from one of the CC packages and using it with their other teaching materials would be extremely helpful
Most of the teachers who did not see the approaches as extremely helpful saw that the packages could be useful for them in limited cases, such as giving homework or teaching an online course. The survey results indicate that teachers see the interoperability between LORs and LMSs created by the specifications SCORM and CC as useful for their everyday teaching, especially when the packaging allows them to alter the content after it has been uploaded or to select only parts of the package before uploading to the LMS. IMS Common Cartridge specifies this kind of interoperability.

Part of the objective of this research was to find out not only whether the teachers find content packaging useful, but also whether they can actually manage working with these standards with the ICT skills that they possess. Our initial hypothesis was that the tools supporting these standards have not yet developed enough to be easy for teachers to use. According to the survey, normal web pages were, unsurprisingly, the easiest to use. 75% of the teachers found using entire SCORM packages really easy or reasonable, whereas they admitted having some problems when using CC packages, whether using the package as a whole or taking parts of it. Interestingly, however, none of the teachers rated any of these methods as impossible to use in their everyday teaching (see Fig. 1).
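To make concrete why a sample of n=44 gives only an indication, the following minimal sketch computes a 95% Wilson score interval around the three "extremely helpful" percentages reported above. It is an illustration, not part of the study's analysis: the function name `wilson_interval`, the labels, and the assumption that all 44 teachers answered each question are ours.

```python
import math


def wilson_interval(p_hat, n, z=1.959964):
    """Return the (lower, upper) Wilson score interval for a proportion.

    p_hat: observed proportion between 0 and 1
    n:     number of respondents (must be positive)
    z:     normal quantile; 1.959964 corresponds to a 95% interval
    """
    if n <= 0 or not 0.0 <= p_hat <= 1.0:
        raise ValueError("need n > 0 and 0 <= p_hat <= 1")
    z2 = z * z
    denom = 1.0 + z2 / n
    center = (p_hat + z2 / (2.0 * n)) / denom
    half = z * math.sqrt(p_hat * (1.0 - p_hat) / n + z2 / (4.0 * n * n)) / denom
    return max(0.0, center - half), min(1.0, center + half)


# Assumption: every one of the 44 teachers answered each question.
N_TEACHERS = 44

# Shares reported in the survey for "extremely helpful".
reported = {
    "entire course as SCORM package": 0.25,
    "entire course as CC package": 0.39,
    "parts of a CC package": 0.43,
}

if __name__ == "__main__":
    for label, share in reported.items():
        low, high = wilson_interval(share, N_TEACHERS)
        print(f"{label}: {share:.0%} (95% interval {low:.0%} to {high:.0%})")
```

Under these assumptions the interval for the 25% figure spans roughly 15% to 39%, so the differences between the three shares should be read as tendencies rather than as firm rankings.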
[Figure 1: bar chart. Vertical axis: 0% to 45%. Horizontal axis: Really Easy; Reasonable; Some problems; Quite Complicated; Impossible to use in everyday teaching. Series: Using normal web page; Using a SCORM package; Using an IMS Common Cartridge package; Taking parts of an IMS Common Cartridge package.]

Fig. 1. “How easy/difficult was it to create a lesson plan...”.

This study indicates that even though SCORM packages might not be the preferred solution for teachers, SCORM’s longer period of development seems to have made it easier to use than IMS Common Cartridge, which was not supported by Moodle before the spring of 2010. There seems to be a need to develop the tools for these standards to be more usable for teachers in the future. However, this result also indicates that the teachers did not like the easiest option best, which suggests that they were able to look beyond the interfaces of the players to the ideas of interoperability and the standards.

As part of testing SCORM and IMS Common Cartridge with users, we looked at opinions regarding the technical interfaces which show the contents of the packages. Teachers were asked to think of three typical use cases from their everyday teaching life:
A) Showing Educational Resources to students in their classrooms
B) Teaching an online course
C) Giving online homework/extra credit work to the students

[Figure 2: bar chart. Vertical axis: 0% to 60%. Horizontal axis: Web page; SCORM Player view; Common Cartridge player view; SCORM package in Moodle; IMS Common Cartridge package in Moodle; I think they are basically the same and I don't care which one I use. Series: ...you are just showing the materials to the students in your class?; ...you are going to teach an entire online course?; ...you are giving them online homework/extra credit work?]

Fig. 2. Interface preferences of teachers in three use scenarios.

Overall, Fig.
2 shows that when it comes to just showing Resources in the classroom, this is easiest to do from a normal web page, or that the interface does not matter much. However, when creating an online course or giving homework to students, a normal web page is no longer enough. Half of the teachers thought that using IMS Common Cartridge packages in Moodle would be the ideal way to teach online or give assignments in the form of homework or extra credit to the students. In the interviews with teachers it became obvious that Learning Management Systems like Moodle are widely used in schools and that standards which create interoperability between content and these LMSs therefore have additional value for teachers.

“If we want students to study independently, you can give them a SCORM package that they unzip and then use. But if you want, for example, to integrate a package in Moodle, because in Portugal we use Moodle a lot, probably Common Cartridge is good, because we can prepare everything and import it into Moodle.” (ICT Teacher from Portugal)

After the workshop, teachers were asked what they would prefer to use after learning about the functionalities of SCORM and IMS Common Cartridge packages and the interoperability between LORs and LMSs. Most of the teachers ended up preferring IMS Common Cartridge packages (87%), leaving only 7% who preferred normal web pages and 9% who preferred SCORM packages. This result supports the finding that teachers were genuinely excited about the prospects of IMS Common Cartridge after one day of training and lesson plan making. The teachers also seemed to support the solution that was the most adaptable to their own needs rather than the solution that was easiest to use.

5 Conclusions

Open educational materials and other web-based resources lead to new opportunities for sharing and re-using content.
[21] European teachers are seldom aware of the content packaging standards SCORM and IMS Common Cartridge but understand the added value of re-using Educational Resources in their everyday teaching. In this paper, we have presented the teachers’ view on the interoperability between LORs and LMSs when using the SCORM and CC standards. Teachers showed special interest in CC packages and in particular in their use in the Moodle system. In detail, the teachers enjoyed the possibility of editing a package, taking some elements and mixing them with their own teaching Resources, very much in the same way as they do with the non-digital Resources in their classrooms. CC supports this kind of interoperability, which is the key finding of this research as it aims to solve one of the biggest problems in the field [22]. In this study, the teachers said that they were more willing to use a solution that suited them than the solution which seemed the easiest to use. Also, no teacher believed that packaging standards are impossible to use in everyday teaching, even though they could see some problems and complications in the process.

After a standard is finalized, it takes a long time before tools are developed that actually deliver the functionality to end users in a way that is useful and usable [23]. Hence, for the success of the standard among users, it is crucial to develop CC tools and improve their usability. We also need to recognize that many teachers still struggle to obtain the basic IT skills which are more essential for their day-to-day work. Even though the teachers in this study were optimistic about content packaging, teachers with lower ICT skills might have a different point of view. In our opinion, training in content packaging standards may be something that remains of interest to a fairly small number of European teachers. However, it is not imperative for the teachers to know that these standards exist.
What is vital for them is that the process of lesson preparation using Educational Resources is smooth and quick. Further development of these standards and the tools around them is the way to assure the re-use of Educational Resources.

Acknowledgements

This work has been done in the project “ASPECT: Adopting Standards and Specifications for Educational Content” (http://aspect.eun.org/), funded by the European Union, reference number 417008. The authors wish to thank José Moura Carvalho, Pascal Craeye, Delia Oprea, Svetlana Kubilinskiene, Ingo Dahn, Joris Klerkx, Lars Ingesman, Anicet Yalaho and Alenka Kavcic for their help in organizing the workshops and setting up the tests. The authors also wish to acknowledge that the present work could not have been carried out without the help and enthusiasm of the 44 teachers from Belgium, Lithuania, Portugal and Romania.

References

1. IEEE Learning Technology Standards Committee: IEEE Standard for Learning Object Metadata. Available at: http://ltsc.ieee.org/wg12/ Site visited 4.6.2010.
2. Wiley, D.: Connecting Learning Objects to Instructional Design Theory: a Definition, a Metaphor, and a Taxonomy. Utah State University: Digital Learning Environments Research Group, The Edumetrics Institute (2000)
3. McGreal, R.: Learning Objects: A Practical Definition. Int’l J. Instructional Technology and Distance Learning, vol. 1, no. 9, pp. 9 (2004)
4. Mohan, P.: Learning Object Repositories. Proceedings of the Informing Science and IT Education Joint Conference, Flagstaff, Arizona, USA (2005)
5. Dougiamas, M., Taylor, P.: Moodle: Using Learning Communities to Create an Open Source Course Management System. In D. Lassner & C. McNaught (Eds.), Proceedings of World Conference on Educational Multimedia, Hypermedia and Telecommunications, pp. 171-178. Chesapeake, VA: AACE (2003)
6. Qu, W., Nejdl, W.: Towards Interoperability and Reusability of Learning Resources: a SCORM-conformant Courseware for Computer Science Education.
IEEE International Conference on Advanced learning (2002)
7. Gonzalez-Barbone, V., Anido-Rifon, L.: From SCORM to Common Cartridge: A step forward. Computers & Education, vol. 54, no. 1, January 2010, pp. 88-102 (2009)
8. Rey-López, M., Díaz-Redondo, R., Fernández-Vilas, A., Pazos-Arias, J., García-Duque, J., Gil-Solla, A., Ramos-Cabrer, M.: An extension to the ADL SCORM standard to support adaptivity: The t-learning case-study. Computer Standards & Interfaces, vol. 31, no. 2, February 2009, pp. 309-318 (2008)
9. Duval, E.: Learning Technology Standardization: Making Sense of it All. ComSIS, vol. 1, no. 1 (2004)
10. ADL Technical Team: ADL Guidelines for Creating Reusable Content with SCORM 2004 (2008) Available at: http://www.adlnet.org Site visited 4.6.2010.
11. Gonzalez-Barbone, V., Anido-Rifon, L.: Creating the first SCORM object. Computers & Education, pp. 1634–1647 (2008)
12. IMS Global Learning Consortium: IMS Common Cartridge Specification. Available at: http://www.imsglobal.org/commoncartridge.html Site visited 7.6.2010.
13. Moodle: Development: IMS common cartridge. Available at: http://docs.moodle.org/en/Development:IMS_common_cartridge Site visited 8.6.2010.
14. Massart, D., Chaudron, S., Ayre, J., Klerkx, J., Heckmann, P., Ravet, S., Ingesman, L., Gras-Velázquez, À., Pawlowski, J.: D-1.2.1 ASPECT First progress report. ASPECT deliverable. Confidential (2009)
15. Gras-Velázquez, À., Clements, K., Yalaho, A., Ayre, J., Ingesman, L., Van Assche, F., Blamire, R., Vuorikari, R., De Four, H.: D-6.2 Protocol of Experimentation with National Adaptations. ASPECT deliverable. Confidential (2009)
16. Massart, D.: The EUN Learning Resource Exchange (LRE). The 15th International Conference on Computers in Education (ICCE2007) Supplementary Proceedings, vol. 1, pp. 170-174 (2007)
17. Ternier, S., Massart,
D., Campi, A., Guinea, S., Ceri, S., Duval, E.: Interoperability for Searching Learning Object Repositories - The ProLearn Query Language. D-Lib Magazine, vol. 14 (2008) Available at: http://dlib.ejournal.ascc.net/dlib/january08/ceri/01ceri.html Site visited 4.6.2010.
18. Hevner, A.R., March, S.T., Park, J., Ram, S.: Design Science in Information Systems Research. MIS Quarterly, vol. 28, pp. 75-105 (2004)
19. Zelkowitz, M.V., Wallace, D.: Experimental models for validating computer technology. IEEE Computer, vol. 31, pp. 23-31 (1998)
20. Nielsen, J.: Usability Engineering. Academic Press, Boston (1993)
21. Vuorikari, R.: Insight Special Report: Why Europe Needs Free and Open Source Software and Content in Schools (2004)
22. Johnstone, S.: Open Educational Resources Serve the World: Sharing Educational Resources over the Internet. Educause Quarterly, vol. 28, no. 3 (2005)
23. Duval, E., Verbert, K.: On the Role of Technical Standards for Learning Technologies. IEEE Transactions on Learning Technologies, vol. 1, no. 4 (2008)