=Paper=
{{Paper
|id=None
|storemode=property
|title=Populating Learning Object Repositories with Hidden Internal Quality Information
|pdfUrl=https://ceur-ws.org/Vol-896/paper1.pdf
|volume=Vol-896
|dblpUrl=https://dblp.org/rec/conf/ectel/CechinelCOSA12
}}
==Populating Learning Object Repositories with Hidden Internal Quality Information==
Cristian Cechinel¹, Sandro da Silva Camargo¹, Xavier Ochoa², Salvador Sánchez-Alonso³, Miguel-Ángel Sicilia³

¹ Computer Engineering Course, Federal University of Pampa, Caixa Postal 07, 96400-970, Bagé (RS), Brazil
contato@cristiancechinel.pro.br, camargo.sandro@gmail.com
² Escuela Superior Politécnica del Litoral, Campus Gustavo Galindo, Km. 30, Vía Perimetral, Guayaquil, Ecuador
xavier@cti.espol.edu.ec
³ Information Engineering Research Unit, Computer Science Dept., University of Alcalá, Ctra. Barcelona km. 33.6, 28871 Alcalá de Henares (Madrid), Spain
{salvador.sanchez, msicilia}@uah.es
Abstract. It is known that current Learning Object Repositories adopt strategies for the quality assessment of their resources that rely on the impressions of quality given by the members of the repository community. Although this strategy can be considered effective to some extent, the number of resources inside repositories tends to increase more rapidly than the number of evaluations given by this community, thus leaving several resources of the repository without any quality assessment. The present work describes the results of an experiment to automatically generate quality information about learning resources inside repositories through the use of Artificial Neural Network models. We were able to generate models for classifying resources between good and not-good with accuracies that vary from 50% to 80% depending on the given subset. The preliminary results found here point out the feasibility of such an approach and can be used as a starting point for the pursuit of the automatic generation of internal quality information about resources inside repositories.
Keywords: Ranking mechanisms; ratings; learning objects; learning object
repositories; MERLOT; Artificial Neural Networks
1 Introduction
Current Learning Object Repositories (LORs) normally adopt strategies for establishing the quality of their resources that rely on the impressions of usage and evaluations given by the members of the repository community (ratings, tags, comments, likes, lenses). All this information together constitutes a collective body of knowledge that serves as an external memory helping other individuals to find resources according to their individual needs. Inside LORs, this kind of evaluative metadata (Vuorikari, Manouselis, & Duval, 2008) is also used by search and retrieval mechanisms to properly rank and recommend resources to the community of users of the repository.
Although such strategies can be considered effective to some extent, the amount of resources inside repositories is rapidly growing every day (Ochoa & Duval, 2009) and it has become impractical to rely only on human effort for such a task. For instance, a quick look at the summary of MERLOT's recent activities shows that in a short period of one month (from May 21st to June 21st), the amount of new resources catalogued in the repository was 9 times larger than the amount of new ratings given by experts (peer-reviewers), 6 times larger than the amount of new comments (and user ratings), and 3 times larger than the amount of new bookmarks (personal collections). This situation of leaving many resources of current repositories without any measure of quality at all (and consequently unable, or at least at a strong disadvantage, to compete for a good place during the process of search and retrieval) has raised concern for the development of new automated techniques and tools that could complement existing manual approaches. In that direction, Ochoa and Duval (2008) developed a set of metrics for ranking the results of learning object searches according to three dimensions of relevance (topical, personal and situational), using information obtained from the learning object metadata, from the user queries, and from other external sources such as the records of historical usage of the resources. The authors contrasted the performance of their approach against traditional text-based ranking methods and found significant improvements in the final ranking results. Moreover, Sanz-Rodriguez, Dodero, and Sánchez-Alonso (2010) proposed to integrate several distinct quality indicators of MERLOT learning objects along with their usage information into one overall quality indicator that can be used to facilitate the ranking of learning objects.
These approaches for automatically measuring quality (or calculating relevance) according to specific dimensions depend on the existence and availability of metadata attached to the resources (or inside the repositories), or on measures of popularity of the resources that are obtained only after the resource has been publicly available for a certain period of time. As metadata may be incomplete or inaccurate, and as these measures of popularity will be available only for "old" resources, we propose an alternative approach to this problem. The main idea is to identify intrinsic measures of the resources (i.e., features that can be calculated directly from the resources) that are associated with quality and that can be used in the
process of creating models for automated quality assessment. In fact, this approach was recently tested by Cechinel, Sánchez-Alonso, and García-Barriocanal (2011), who developed highly-rated profiles of learning objects available in the MERLOT repository and generated Linear Discriminant Analysis (LDA) models based on 13 intrinsic features of learning objects. The generated models were able to classify resources between good and not-good with 72.16% accuracy, and between good and poor with 91.49% accuracy. Among other things, the authors concluded that highly-rated learning object profiles should be developed taking into consideration the many possible intersections among the different disciplines and types of materials available in the MERLOT repository, as well as the group of evaluators who rated the resources (whether they are formed by experts or by the community of users). For instance, the mentioned models were created for materials of the Simulation type belonging to the discipline of Science & Technology, and considering the perspective of the peer-reviewers' ratings. In another round of experiments, Cechinel (2012) also tested the creation of automated models through the creation of statistical profiles and the further use of data mining classification algorithms for three distinct subsets of MERLOT materials. In these studies the author was able to generate models with good overall precision rates (up to 89%), but highlighted that the feasibility of the models depends on the specific method used to generate them, the specific subsets for which they are generated, and the classes of quality included in the dataset. Moreover, the models were generated using considerably small datasets (around 90 resources each) and were evaluated on the training dataset, i.e., the entire dataset was used both for training and for evaluation.
The present work expands the previous works developed by Cechinel (2012) and Cechinel et al. (2011) by generating and evaluating models for the automated quality assessment of learning objects stored in MERLOT, focusing on populating the repository with hidden internal quality information that can be further used by ranking mechanisms. In the previous works, the authors explored the creation of statistical profiles of highly-rated learning objects by contrasting information from the good and not-good resources, and then used these profiles to generate models for quality assessment. In the present work we test a slightly different and more algorithmic approach, i.e., the models here are generated exclusively through the use of data mining algorithms. Moreover, we also work with a larger collection of resources and a considerably higher number of MERLOT subsets. The rest of this paper is structured as follows. Section 2 presents existing research focused on identifying intrinsic quality features of resources. Section 3 describes the methodology followed for the study and Section 4 discusses the results found. Finally, conclusions and outlook are provided in Section 5.
2 Background
To our knowledge, besides the recent work of Cechinel et al. (2011), there is still no empirical evidence of intrinsic metrics that could serve as indicators of quality for LOs. However, there are some works in adjacent fields which can serve as a source of inspiration. For instance, empirical evidence of relations between intrinsic information and other characteristics of LOs was found by Meyer, Hannappel, Rensing, and Steinmetz (2007), who developed a model for classifying the didactic functions of a learning object based on measures of the length of the text, the presence of interactivity, and information contained in the HTML code (lists, forms, input elements). Mendes, Hall, and Harrison (1998) identified evidence in some measures to evaluate the sustainability and reusability of educational hypermedia applications, such as the type of link, and the structure and size of the application. Blumenstock (2008) found the length of an article (measured in words) to be a predictor of quality in Wikipedia. Moreover, Stvilia, Twidale, Smith and Gasser (2005) were able to automatically discriminate high-quality articles voted by the community of users from the rest of the articles of the collection. In order to do that, the authors developed profiles by contrasting metrics of articles featured as best articles by Wikipedia editors against a random set. The metrics were based on measures of the article edit history (for instance, total number of edits and number of anonymous user edits) and on the article attributes and surface features (for instance, number of internal broken links, number of internal links, and number of images). Finally, in the field of usability, Ivory and Hearst (2002) found that good websites contain (for instance) more words and links than regular and bad ones.
Our approach is initially related exclusively to those aspects of learning objects that are displayed to the users and that are normally associated with the dimensions of presentation design and interaction usability included in LORI (Nesbit, Belfer, & Leacock, 2003), and with the dimension of information quality (normally mentioned in the context of educational digital libraries). Precisely, the references for quality assurance used here are the ratings given by the peer-reviewers (experts) of the repository.
3 Methodology
The main objective of this research was to obtain models that could automatically identify good and not-good learning objects inside repositories based on the intrinsic features of the resources. The methodology we followed was the development of models through the use of data mining algorithms over information about learning objects catalogued in the MERLOT repository. For that, a database was collected from the repository and the qualitative classes of quality good and not-good were generated considering the terciles of the ratings of the resources. These classes of quality were then used as the reference output for the generation of the models.
3.1 Data Collection
A database was collected from MERLOT through the use of a crawler that systematically traversed the pages and collected information related to 35 metrics of the resources. The decision to choose MERLOT lies mainly in the fact that MERLOT has one of the largest amounts of registered resources and users, and that it implements a system for quality assurance that works with evaluations given by experts and users of the repository. Such a system can serve as a baseline for the creation of the learning object classes of quality. As the MERLOT repository is mainly formed by learning resources in the form of websites, we evaluated intrinsic metrics that are supposed to appear in such a technical type of material (i.e., link measures, text measures, graphic measures and site architecture measures). The metrics collected for this study (see Table 1) are the same as those used by Cechinel et al. (2011), and some of them have also been mentioned in other works that tackled the problem of assessing the quality of resources (previously presented in Section 2).
Table 1: Metrics collected for the study

Class of Measure | Metrics
Link Measures | Number of Links, Number of Uniqueᵃ Links, Number of Internal Linksᵇ, Number of Unique Internal Links, Number of External Links, Number of Unique External Links
Text Measures | Number of Words, Number of Words that are Linksᶜ
Graphic, Interactive and Multimedia Measures | Number of Images, Total Size of the Images (in bytes), Number of Scripts, Number of Applets, Number of Audio Files, Number of Video Files, Number of Multimedia Files
Site Architecture Measures | Size of the Page (in bytes), Number of Files for Downloading, Total Number of Pages

ᵃ The term Unique stands for "non-repeated".
ᵇ The term Internal refers to those links which are located at some directory below the root site.
ᶜ For these metrics the average was not computed or does not exist.
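To make the data collection concrete, the sketch below computes a handful of the Table 1 page-level measures from a single fetched HTML document. It is a minimal approximation rather than the crawler actually used in the study: the function and metric names are ours, and a real implementation would also need to resolve relative URLs, detect multimedia embeds, and handle fetch errors.

```python
from html.parser import HTMLParser
from urllib.parse import urlparse

class PageMetricsParser(HTMLParser):
    """Counts a small subset of the Table 1 measures for a single HTML page."""
    def __init__(self):
        super().__init__()
        self.links = []
        self.images = 0
        self.scripts = 0
        self.words = 0

    def handle_starttag(self, tag, attrs):
        attrs = dict(attrs)
        if tag == "a" and attrs.get("href"):
            self.links.append(attrs["href"])
        elif tag == "img":
            self.images += 1
        elif tag == "script":
            self.scripts += 1

    def handle_data(self, data):
        # rough word count over all visible (and, in this sketch, script) text
        self.words += len(data.split())

def page_metrics(html_text, base_url):
    """Return a dictionary of page-level metrics for one fetched HTML page."""
    parser = PageMetricsParser()
    parser.feed(html_text)
    host = urlparse(base_url).netloc
    internal = [l for l in parser.links if urlparse(l).netloc in ("", host)]
    return {
        "num_links": len(parser.links),
        "num_unique_links": len(set(parser.links)),
        "num_internal_links": len(internal),
        "num_external_links": len(parser.links) - len(internal),
        "num_images": parser.images,
        "num_scripts": parser.scripts,
        "num_words": parser.words,
    }
```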
As resources in MERLOT vary considerably in size, a limit of 2 levels of depth was established for the crawler, i.e., metrics were computed for the root node (level 0, the home page of the resource), for the pages linked by the root node (level 1), and for the pages linked by the pages of level 1 (level 2)¹. As shown in Table 1, some of the metrics refer to the total sum of the occurrences of a given attribute considering the whole resource, and other metrics refer to the average of this sum considering the number of pages computed. For instance, an object composed of 3 pages and containing a total of 30 images will have a total number of images of 30 and an average number of images equal to 10 (30/3); a small sketch of this aggregation is given after the footnotes below. Information on a total of 20,582 learning resources was collected. From this amount, only 2,076 were peer-reviewed, and 5 of them did not have metadata regarding the category of discipline or the type of material and were disregarded. Considering that many subsets are formed by a very small amount of resources, we restricted our experiment to just a few of them. Precisely, we worked with the 21 subsets formed by the following types of material: Collection, Reference Material, Simulation and Tutorial, and that had 40 resources or more². In total, we worked with information on 1,429 learning resources, which represent 69% of the total collected data. Table 2 presents the frequency of the materials for each subset used in this study.
¹ Although this limitation may affect the results, the process of collecting the information is extremely slow and such a limitation was needed. In order to acquire the sample used in this study, the crawler kept running uninterruptedly for 4 full months.
² The difficulties of training, validating and testing predictive models for subsets with fewer than 40 resources would be more severe.
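Building on the hypothetical `page_metrics` helper sketched above, the per-page counts could be aggregated into the resource-level totals and averages described here (the depth-limited crawling itself is omitted, and unlike Table 1 this simplified version averages every metric):

```python
def resource_metrics(pages, base_url):
    """Aggregate per-page counts into total and average metrics for one resource.

    `pages` is a list of HTML strings for the root page and every page reached
    within the 2-level depth limit (the crawling itself is omitted here).
    """
    per_page = [page_metrics(html, base_url) for html in pages]
    totals = {k: sum(p[k] for p in per_page) for k in per_page[0]}
    averages = {f"avg_{k}": v / len(per_page) for k, v in totals.items()}
    # e.g. 3 pages with 30 images in total -> num_images = 30, avg_num_images = 10.0
    return {**totals, **averages, "total_number_of_pages": len(per_page)}
```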
Table 2: Frequency of materials for the subsets used in this study (intersection of category of discipline and material type)

Material Type / Discipline | Arts | Business | Education | Humanities | Mathematics and Statistics | Science & Technology | Social Sciences
Collection                 |  –   |    52    |    56     |     43     |             50             |          80          |        –
Reference Material         |  –   |    83    |    40     |     51     |             68             |         102          |        –
Simulation                 |  57  |    63    |    40     |     78     |             40             |         150          |        –
Tutorial                   |  –   |    76    |    73     |     93     |             48             |          86          |        –
3.2 Classes of Quality
As the peer-reviewers' ratings tend to concentrate above the intermediate rating of 3, classes of quality were created using the terciles of the ratings for each subset³. Resources with ratings below the first tercile are classified as poor, resources with ratings equal to or higher than the first tercile and lower than the second tercile are classified as average, and resources with ratings equal to or higher than the second tercile are classified as good. The classes of quality average and poor were then joined into another class called not-good.
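A minimal sketch of this class assignment, assuming the peer-review ratings of one subset are available as a list of numbers (the actual tercile cut-offs are not reported in the paper):

```python
import numpy as np

def quality_classes(ratings):
    """Assign good / not-good classes from the terciles of a subset's ratings."""
    t1, t2 = np.percentile(ratings, [100 / 3, 200 / 3])  # first and second terciles
    classes = []
    for r in ratings:
        if r < t1:
            label = "poor"
        elif r < t2:
            label = "average"
        else:
            label = "good"
        # poor and average are merged into the not-good class used for modelling
        classes.append("good" if label == "good" else "not-good")
    return classes
```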
3.3 Mining models for automated quality classification of learning objects
The classes of quality were used as the output reference for generating and testing models for the automated quality assessment of the resources through the use of Artificial Neural Networks (ANNs). The choice of ANNs rests on the fact that they are adaptive, distributed, and highly parallel systems which have been used in many knowledge areas and have proven able to solve problems that require pattern recognition (Bishop, 2006). Moreover, ANNs are among the types of models that have shown good accuracies in the previous works mentioned before. Finally, we initially tested other approaches (with rules and trees) and they presented maximum accuracies of around 60%. As ANNs presented the best preliminary results, we selected this approach for the present study.
The experiments were conducted with the Neural Network toolbox of Matlab. For each subset we randomly selected 70% of the data for training, 15% for testing and 15% for validation, as suggested by Xu, Hoos, and Leyton-Brown (2007). We tested the Marquardt-Levenberg algorithm (Hagan & Menhaj, 1994) using from 1 to 30 neurons in all tests. In order to obtain more statistically significant results (due to the small size of the data samples), each test was repeated 10 times and the average results were computed. The models were generated to classify resources between good and not-good.
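Although the experiments were run with Matlab's Neural Network toolbox, the training protocol can be sketched in Python. The snippet below uses scikit-learn's MLPClassifier as a stand-in (it does not offer the Marquardt-Levenberg optimiser, so this is an approximation of the setup, not a reproduction), sweeping 1 to 30 hidden neurons and averaging 10 repetitions per configuration:

```python
import numpy as np
from sklearn.model_selection import train_test_split
from sklearn.neural_network import MLPClassifier
from sklearn.metrics import recall_score

def evaluate_subset(X, y, repetitions=10):
    """Train ANNs with 1-30 hidden neurons; return average per-class accuracies.

    `y` is assumed to hold the string labels "good" / "not-good".
    """
    results = {}
    for n_neurons in range(1, 31):
        good_acc, not_good_acc = [], []
        for seed in range(repetitions):
            # 70% training, 15% validation (early stopping), 15% testing
            X_train, X_test, y_train, y_test = train_test_split(
                X, y, test_size=0.15, stratify=y, random_state=seed)
            clf = MLPClassifier(hidden_layer_sizes=(n_neurons,),
                                early_stopping=True,
                                validation_fraction=0.1765,  # ~15% of the full data
                                max_iter=2000, random_state=seed)
            clf.fit(X_train, y_train)
            y_pred = clf.predict(X_test)
            # per-class accuracy = recall of that class
            good_acc.append(recall_score(y_test, y_pred, pos_label="good"))
            not_good_acc.append(recall_score(y_test, y_pred, pos_label="not-good"))
        results[n_neurons] = (np.mean(good_acc), np.mean(not_good_acc))
    return results
```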
³ The terciles of each subset were omitted from the paper due to lack of space.
4 Results and Discussion
The models presented different results depending on the subset used for training. Most of the models tend to classify not-good resources better than good ones, which is probably a result of the uneven amount of resources of each class inside the datasets (normally formed by 2/3 of not-good and 1/3 of good resources). These tendencies can be observed in Figure 2⁴.
The number of neurons used in the construction of the models has a different influence depending on the subset. A Spearman's rank correlation (rs) analysis was carried out to evaluate whether there are associations between the number of neurons and the accuracies achieved by the models. This test serves the purpose of observing the pattern expressed by the models when predicting quality for the given subsets. For instance, assuming x is a predictive model for a given subset A, and y is a predictive model for a given subset B, if x has fewer neurons than y and both have the same accuracies, then the patterns expressed in A are simpler than the ones expressed in B. This means that it is easier to understand what is good (or not-good) in subset A. Table 3 shows the results of such analysis.
In Table 3, (-) stands for no association between the number of neurons and the accuracy of the model for classifying a given class, (↑) stands for a positive association, and (↓) stands for a negative association. The analyses considered a 95% level of significance. As can be seen in the table, the number of neurons influences the accuracies for some classes of quality of some subsets. For instance, the number of neurons presents a positive association with the accuracies for classifying good resources in the following six subsets: Business Simulation, Business Tutorial, Education Collection, Education Tutorial, Humanities Tutorial, and Science & Technology Simulation. Moreover, the number of neurons presents a negative association with the accuracies for classifying not-good resources in the following eight subsets: Arts Simulation, Business Tutorial, Education Collection, Education Simulation, Education Tutorial, Humanities Reference Material, Science & Technology Simulation, and Science & Technology Tutorial. Finally, there are no positive associations between the number of neurons and the accuracies for classifying not-good resources, nor are there negative associations between the number of neurons and the accuracies for classifying good resources.
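A sketch of this association test, assuming `results` maps the number of neurons to the average good and not-good accuracies for one subset (as produced by the hypothetical training sketch above):

```python
from scipy.stats import spearmanr

def neuron_accuracy_association(results, alpha=0.05):
    """Spearman correlation between the number of neurons and per-class accuracy."""
    neurons = sorted(results)
    good = [results[n][0] for n in neurons]
    not_good = [results[n][1] for n in neurons]
    tendencies = {}
    for label, acc in (("good", good), ("not-good", not_good)):
        rs, p_value = spearmanr(neurons, acc)
        if p_value >= alpha:
            tendencies[label] = "-"            # no significant association
        else:
            tendencies[label] = "↑" if rs > 0 else "↓"
    return tendencies
```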
⁴ Just some models were presented in the figure due to lack of space.
Fig. 2. Accuracies of some models versus the number of neurons: overall accuracies (lozenges), accuracies for the classification of good resources (squares) and of not-good resources (triangles).
Table 3: Tendencies of the accuracies according to the number of neurons used for training (good | not-good)

Subset             | Arts | Business | Education | Humanities | Math & Statistics | Science & Tech
Collection         |      |   -|-    |    ↑|↓    |    -|-     |        -|-        |      -|-
Reference Material |      |   -|-    |    -|-    |    -|↓     |        -|-        |      -|-
Simulation         | -|↓  |   ↑|-    |    -|↓    |    -|-     |        -|-        |      ↑|↓
Tutorial           |      |   ↑|↓    |    ↑|↓    |    ↑|-     |        -|-        |      -|↓
In order to evaluate how to select the best models for quality assessment, it is necessary to understand the behavior of the models when classifying both classes of quality included in the datasets. Considering that, a Spearman's rank correlation (rs) analysis was also carried out to evaluate whether there are associations between the accuracies of the models for classifying good and not-good resources. Such an analysis serves to evaluate the trade-offs of selecting (or not) a given model for the present purpose. Most of the models presented strong negative correlations between the accuracies for classifying good and not-good resources. The results of both analyses suggest that the decision to select a model for predicting quality must take into account that, as the accuracy for classifying resources of one class increases, the accuracy for classifying resources of the other class decreases. Considering that, the question lies in establishing the cut-off point for acceptable accuracies so that the models can be used for our purpose. In other words, it is necessary to establish the minimum accuracies (cut-off point) that the models must present for classifying both classes (good and not-good) so that they can be used for generating hidden quality information for the repository.
For the present study, we consider that the models must present accuracies higher than 50% for the correct classification of good and not-good resources (simultaneously) in order to be considered useful. It is known that the decision on the minimum accuracies for considering a model efficient or not will depend on the specific scenario/problem for which the models are being developed. Here we consider that accuracies higher than 50% are better than merely random.
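Under this criterion, selecting a usable model for a subset amounts to filtering and ranking the trained configurations; a small sketch, again using the hypothetical `results` structure assumed earlier:

```python
def select_model(results, cutoff=0.5):
    """Pick the configuration whose good and not-good accuracies both exceed the cut-off.

    Returns (n_neurons, good_acc, not_good_acc), or None if no model qualifies.
    """
    usable = [(n, g, ng) for n, (g, ng) in results.items() if g > cutoff and ng > cutoff]
    if not usable:
        return None
    # favour the model with the best accuracy on good resources, as in Table 4
    return max(usable, key=lambda item: item[1])
```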
Table 4 presents the two best models for each subset, considering their overall accuracies and their accuracies for classifying good and not-good resources (ordered by the accuracy for classifying good resources).
Table 4: Two best models for each subset (ordered by the accuracies for classifying good resources)

Subset                          N    OA    G     NG
Arts Simulation                 16   0.65  0.61  0.70
                                25   0.55  0.56  0.54
Business Collection             11   0.56  0.61  0.60
                                25   0.57  0.60  0.59
Business Reference Material      8   0.58  0.54  0.59
                                 5   0.59  0.53  0.68
Business Simulation             24   0.64  0.67  0.60
                                30   0.57  0.62  0.55
Business Tutorial               23   0.61  0.40  0.72
                                29   0.59  0.38  0.71
Education Collection            26   0.51  0.60  0.49
                                29   0.51  0.60  0.44
Education Reference Material    16   0.60  0.63  0.70
                                20   0.58  0.54  0.71
Education Simulation            20   0.52  0.62  0.50
                                12   0.53  0.59  0.56
Education Tutorial              27   0.47  0.49  0.47
                                29   0.53  0.43  0.61
Humanities Collection           14   0.60  0.75  0.51
                                19   0.63  0.69  0.68
Humanities Reference Material   29   0.47  0.59  0.49
                                10   0.58  0.50  0.65
Humanities Simulation            4   0.69  0.76  0.69
                                 9   0.79  0.75  0.79
Humanities Tutorial             25   0.56  0.60  0.58
                                21   0.51  0.59  0.54
Math. & Statistics Collection   28   0.50  0.61  0.54
                                27   0.49  0.57  0.46
Math. & Statistics Reference    22   0.63  0.54  0.72
                                18   0.53  0.48  0.60
Math. & Statistics Simulation   14   0.81  0.63  0.93
                                 3   0.88  0.57  1.00
Math. & Statistics Tutorial     26   0.69  0.79  0.64
                                25   0.70  0.77  0.61
Science & Tech. Collection      17   0.58  0.60  0.54
                                 3   0.56  0.54  0.60
Science & Tech. Reference Mat.  19   0.59  0.63  0.56
                                16   0.55  0.58  0.58
Science & Tech. Simulation      29   0.57  0.58  0.61
                                19   0.58  0.52  0.62
Science & Tech. Tutorial        28   0.64  0.50  0.72
                                14   0.56  0.45  0.61
In Table 4, N stands for the number of neurons in the model, OA stands for the overall accuracy, G for the accuracy for classifying good resources and NG for the accuracy for classifying not-good resources. As can be seen in the table, and considering the established minimum cut-off point, it was possible to generate models for almost all subsets. Of the 42 models presented in the table, only 10 did not reach the minimum accuracies. Moreover, 22 of them presented accuracies between 50% and 59.90%, and 9 presented both accuracies higher than 60%. We also found one model with accuracies higher than 70% (for Humanities Simulation). The only three subsets for which the models did not reach the minimum accuracies were Business Tutorial, Education Collection and Education Tutorial. On the other hand, the best results were found for Humanities Simulation, Mathematics Tutorial, Humanities Collection, Business Simulation, Arts Simulation and
Business Collection. One of the possible reasons why it was not feasible to generate good models for all subsets may rest in the fact that the real features associated with quality in those subsets might not have been captured by the crawler.
In order to select the most suitable model, one should take into consideration that the model's output is going to be used as information during the ranking process, and one should weigh the advantages and drawbacks of a lower accuracy for classifying good resources against those of a lower accuracy for classifying not-good resources. The less damaging situation seems to occur when the model classifies a good material as not-good. In this case, good materials would just remain hidden in the repository, i.e., in badly ranked positions (a situation similar to not using the models at all). On the other hand, if the model classifies as good a resource that is not-good, it is likely that this resource will be placed in a higher rank position, thus increasing its chances of being accessed by the users. This would mislead the user towards the selection of a "not-so-good" quality resource, and it could discredit the ranking mechanism.
5 Conclusions and Outlook
It is known that LORs normally use evaluative information to rank resources during the process of search and retrieval. However, the amount of resources inside LORs increases more rapidly than the number of contributions given by the community of users and experts. Because of that, many LOs that do not have any quality evaluation receive bad rank positions even if they are of high quality, thus remaining unused (or unseen) inside the repository until someone decides to evaluate them. The models developed here could be used to provide internal quality information for those LOs not yet evaluated, thus helping the repository in the stage of offering resources. Among other good results, one can mention the model for Humanities Simulation, which is able to classify good resources with 75% accuracy and not-good resources with 79%, and the model developed for Mathematics Tutorial, with 79% accuracy for classifying good resources and 64% for classifying not-good ones.
As the models would be used inside the repository and the classifications would serve just as input information for search mechanisms, it is not strictly required that the models provide explanations about their reasoning. Models constituted of neural networks (such as the ones tested in the present study) can perfectly well be used in such a scenario.
Resources recently added to the repository would benefit greatly from such models, since they hardly ever receive any assessment right after their inclusion. Once a resource finally receives a formal evaluation from the community of the repository, the initial implicit quality information provided by the model could be disregarded. Moreover, this "real" rating could be used as feedback so that the efficiency of the models could be analyzed, i.e., to evaluate whether or not the users agree with the models' decisions.
Future work will try to include more metrics not yet implemented, such as, for instance, the number of colors and different font styles, the existence of ads, the number of redundant and broken links, and some readability measures (e.g. the Gunning Fog index and the Flesch-Kincaid grade level). Besides, as pointed out by Cechinel and Sánchez-Alonso (2011), the two communities of evaluators in MERLOT (users and peer-reviewers) communicate different views regarding the quality of the learning objects refereed in the repository. The models tested here are related to the perspective of quality given by the peer-reviewers. Future work will test models created with the ratings given by the community of users and compare their performance with the present study. Moreover, as the present work is context-sensitive, it is important to evaluate whether this approach can be extended to other repositories. As not all repositories adopt the same kind of quality assurance that MERLOT does, alternative quality measures for contrasting good and not-good resources must be found. Another interesting possible direction is to classify learning resources according to their granularity, and to use this information as one of the metrics to be evaluated during the creation of the highly-rated profiles. Finally, we could use the values calculated by the models for all the resources and compare the ranking of MERLOT with the ranking performed through the use of this "artificial" quality information.
It is important to mention that the present approach does not intend to replace traditional evaluation methods, but to complement them by providing a useful and inexpensive quality assessment that can be used by repositories before more time- and effort-consuming evaluation is performed.
Acknowledgments
The work presented here has been funded by the European Commission through the project IGUAL (www.igualproject.org) – Innovation for Equality in Latin American University (code DCIALA/19.09.01/10/21526/245-315/ALFAHI (2010)123) of the ALFA III Programme, and by the Spanish Ministry of Science and Innovation through the project MAVSEL: Mining, data analysis and visualization based on social aspects of e-learning (code TIN2010-21715-C02-01).
References
Bishop, Christopher M. (2006). Pattern Recognition and Machine Learning. Springer.
Blumenstock, Joshua E. (2008). Size matters: word count as a measure of quality on
wikipedia. Paper presented at the Proceedings of the 17th international
conference on World Wide Web, Beijing, China.
Cechinel, Cristian. (2012). Empirical Foundations for Automated Quality Assessment
of Learning Objects inside Repositories. (Ph.D. Doctoral Thesis), University
of Alcalá, Alcalá de Henares.
Cechinel, Cristian, & Sánchez-Alonso, Salvador. (2011). Analyzing Associations
between the Different Ratings Dimensions of the MERLOT Repository.
Interdisciplinary Journal of E-Learning and Learning Objects 7, 1-9.
Cechinel, Cristian, Sánchez-Alonso, Salvador, & García-Barriocanal, Elena. (2011).
Statistical profiles of highly-rated learning objects. Computers & Education,
57(1), 1255-1269. doi: 10.1016/j.compedu.2011.01.012
Hagan, M. T., & Menhaj, M. B. (1994). Training feedforward networks with the
Marquardt algorithm. Neural Networks, IEEE Transactions on, 5(6), 989-
993. doi: 10.1109/72.329697
Ivory, M. Y., & Hearst, M. A. (2002). Statistical profiles of highly-rated web sites.
Paper presented at the Proceedings of the SIGCHI conference on Human
factors in computing systems: Changing our world, changing ourselves,
Minneapolis, Minnesota, USA.
Mendes, Emilia, Hall, Wendy, & Harrison, Rachel. (1998). Applying Metrics to the
Evaluation of Educational Hypermedia Applications. Journal of Universal
Computer Science, 4(4), 382-403. doi: 10.3217/jucs-004-04-0382
Meyer, Marek, Hannappel, Alexander, Rensing, Christoph, & Steinmetz, Ralf.
(2007). Automatic classification of didactic functions of e-learning
resources. Paper presented at the Proceedings of the 15th international
conference on Multimedia, Augsburg, Germany.
Nesbit, John C., Belfer, Karen, & Leacock, Tracey. (2003). Learning object review
instrument (LORI). E-learning research and assessment network. Retrieved
from http://www.elera.net/eLera/Home/Articles/LORI%20manual.
Ochoa, Xavier, & Duval, Erik. (2008). Relevance Ranking Metrics for Learning
Objects. Learning Technologies, IEEE Transactions on, 1(1), 34-48. doi:
10.1109/TLT.2008.1
Ochoa, Xavier, & Duval, Erik. (2009). Quantitative Analysis of Learning Object
Repositories. Learning Technologies, IEEE Transactions on, 2(3), 226-238.
Sanz-Rodriguez, Javier, Dodero, Juan, & Sánchez-Alonso, Salvador. (2010).
Ranking Learning Objects through Integration of Different Quality
Indicators. IEEE Transactions on Learning Technologies, 3(4), 358 - 363.
doi: 10.1109/TLT.2010.23
Stvilia, B., Twidale, M. B., Smith, L. C., & Gasser, L. (2005). Assessing information
quality of a community-based encyclopedia. Paper presented at the
Proceedings of the International Conference on Information Quality - ICIQ
2005.
Vuorikari, Riina, Manouselis, Nikos, & Duval, Erik. (2008). Using Metadata for
Storing, Sharing and Reusing Evaluations for Social Recommendations: the
Case of Learning Resources. Social Information Retrieval Systems:
Emerging Technologies and Applications for Searching the Web Effectively.
Hershey, PA: Idea Group Publishing, 87–107.
Xu, Lin, Hoos, Holger H., & Leyton-Brown, Kevin. (2007). Hierarchical hardness
models for SAT. Paper presented at the Proceedings of the 13th international
conference on Principles and practice of constraint programming,
Providence, RI, USA.