Outcome-oriented Fitness Measurement of Personal Learning Environments
                                               Felix Mödritscher
                            Institute for Information Systems and New Media,
                              Vienna University of Economics and Business
                                    Augasse 2-6, 1090 Vienna, Austria
                                              +43-1-31336-5277
                                        felix.moedritscher@wu.ac.at

ABSTRACT
Personal learning environments (PLEs) are a new kind of learning technology that aims at putting learners at centre stage, i.e. by empowering them to design and use environments for their learning needs and purposes. While a lot of research and development is going into realizing and providing technical PLE solutions, less effort is spent on examining the 'fitness' of PLEs. By fitness we refer to the property of a PLE that it is successfully used to achieve a goal. In this paper we attempt to formalize PLE fitness by focusing on one specific aspect, namely the outcomes of PLE-based activities. For this purpose, we analyze a certain kind of PLE outcome, i.e. publications, by measuring their impact, and we use real-world data harvested from the Web to propose a mathematical fitness model. Furthermore, we address factors characterizing the fitness of a publication as well as preliminaries of our approach. The paper concludes by pointing out related findings from other fields and possible future work on outcome-oriented PLE fitness measurement.

Categories and Subject Descriptors
G.3 [Mathematics of Computing]: Probability and Statistics: Distribution functions, Time series analysis; H.2.8 [Information Systems]: Database Applications: Scientific databases; G.1.2 [Mathematics of Computing]: Approximation: Nonlinear approximation.

General Terms
Algorithms, Measurement, Experimentation.

Keywords
Personal Learning Environments, Scientific Publications, Citation History Analysis, Fitness Function, Gamma Distribution.

1. INTRODUCTION
According to Henri et al. [1], personal learning environments (PLEs) refer to "a set of learning tools, services, and artifacts gathered from various contexts to be used by the learners". Furthermore, Van Harmelen [2] states that PLEs aim at empowering learners to design (ICT-based) environments for their activities so that they can connect to learner networks in order to collaborate on shared outcomes and acquire necessary (professional and rich professional) competences. In recent years a lot of work has been invested in the development and application of new, PLE-related technologies (like apps, widgets or gadgets) and their underlying infrastructures (widget containers, personalized websites, mobile phones, etc.).

Considering the spread of these technologies in society and the rising profits of leading companies in this sector (e.g. Apple or Google), these technologies are highly successful. However, less attention is paid to their usage as personal learning environments and to their (positive and negative!) effects on lifelong learning. In order to formalize and examine the evolvability of PLEs, we build upon the notion of fitness, a concept taken from evolutionary theory. By comparing the development, spread, and utilization of PLEs (the technical infrastructures as well as their entities, e.g. tools and their features) to genetic evolution [3], a learning environment can be understood as a socio-technical system (organism) with its functionalities (traits). According to our initial definition, a PLE is a set of tools, services, artifacts, and peer actors; thus the fitness of a PLE refers to the specific situations in which it is used, and consequently to defined purposes (fit-for-purpose), as well as to the scope of a community and a context (local fitness).

Over time, PLEs can evolve, for instance specialize, according to situations in which certain features are used more frequently and others are ignored or even removed. As learners also demand new features, developers are part of this evolutionary process and implement these features so that a PLE solution continues to be used in the future. Such processes bear a resemblance to the concept of natural selection [4]. In the context of this paper, fitness refers to a property describing PLE functionalities. Fitter PLE features (genes) become more common, i.e. a certain form of a feature (allele; DNA sequence) is used more frequently, spreads faster, or can even substitute other forms of the same functionality.

We explain these definitions through two examples of the evolution of software artifacts in practice. The first example is a new way of providing recommendations. In the last few years many web applications have included recommendations which appear when typing a term into the search field. Restricting these recommendations to the user's context (e.g. Facebook.com) or auto-completing the query on the
basis of terms given by many other users (e.g. Google.com) are two manifestations of this feature which will become more important in the future. Thus, the generic function "recommendations" has been specialized over time. In the second example, a new researcher enters a scientific community on statistical mathematics. In this group of researchers a specific tool, namely the R software, is favored for teaching and research activities. Thus the new member is faced with a tool that has a high fitness within the community and can either work with this tool or try to establish some other software in this community, consequently opposing the R framework.

Overall, the idea of our approach is to consider PLEs as the outcomes of (collaborative, ICT-based) learning, as also stated e.g. by Wild et al. [5], and to formalize and examine their evolution over several generations. Unfortunately, this would require detailed data about PLE-based activities over a long period of time, which is not easy to get and which we do not have. Therefore we propose to focus on certain aspects of PLE activities, namely on PLE outcomes in the form of scientific papers. We use the information on publications to model and analyze their fitness with respect to their scientific impact.

The rest of the paper is structured as follows. The next section elaborates our approach towards outcome-oriented fitness measurement as well as preliminaries and related work. Then, section 3 describes the stepwise development of a fitness function for PLE outcomes and examines different characteristics of this model. Section 4 summarizes findings as well as similarities to other fields and discusses the relevance of the approach for PLE fitness, before an outlook on future work is given.

2. CONCEPTUAL APPROACH, PRELIMINARIES, AND RELATED WORK
As mentioned before, we consider scientific papers as typical PLE outcomes and use bibliographic data to examine and formalize their fitness. As a first step we have to clarify how publications and PLEs are related. In former research we have elaborated the notion and the most important concepts of PLE-based learning ecologies [6]. Figure 1 shows what PLE-based collaboration looks like. Learners are involved in different activities in which they try to achieve personal and group goals (e.g. publishing a paper in a journal). They use various tools to collaborate on shared artifacts. In the context of this paper, publications can be seen as typical outcomes of such activities, as they are created by one or more scientists using different tools; even single-authored papers normally involve other actors in the background.

Figure 1. Example scenario for PLE-based collaboration.

On a theoretical level, and putting the learner (actor) at centre stage, Klamma and Petrushyna [7] propose a model of learning ecologies which is based on Actor-Network Theory (ANT) and describes five important entities of a PLE:
• Processes: Activities carried out for educational reasons, at the workplace, or due to personal goals (e.g. a job task in a business process, attending a course for further education, or a spare-time activity requiring the acquisition of new competences)
• Media: Collection of learning resources required for or created in these activities (e.g. the Wikipedia platform, a learning object repository, or simply the Internet)
• Artifacts: Documents and other (digital or real-world) artifacts collaboratively created and accessed by learners (e.g. wiki articles or a joint paper)
• Agents: Actors, no matter whether humans or software (e.g. peer learners or functionality provided by software)
• Communities: People sharing the same environment, e.g. in terms of having common interests, working on the same artifacts, or being connected to the same actors (e.g. a group of learners trying to achieve a course goal or a special interest group for a specific topic)

In the scope of this paper, the PLE related to a publication can be described as follows. A scientific publication is an outcome of a PLE-based activity which involves several human agents in different roles (main author, co-authors, organizer/editor, reviewers, etc.) and uses different tools (MS Word, email, conference/journal submission system, etc.). The whole publication process consists of various activities, e.g. research, writing, and submission. Normally, a paper also addresses one or a few scientific communities, which can be determined by the targeted journal or conference.

Realistically, the PLE of a publication can no longer be fully reconstructed, as the tools used and the
interaction sequences were not tracked sufficiently. Thus, we examine the fitness (success) of papers with respect to their impact in scientific communities by analyzing the number of citations of different kinds of publications over time. The analysis of citations and the citation history of papers is a well-explored field (cf. [8]). Furthermore, shortcomings of citation analysis, like biased citing, secondary sources, variations in citation rates across disciplines or nationalities, and many more, have been elaborated extensively [8, 9]. Yet, we consider these problems of citation analysis (similarly to the learning environment itself) as part of the outcome of PLE-based activities, worth an in-depth analysis.

Compared to existing citation indices like CiteseerX (http://citeseerx.ist.psu.edu/), the ISI Web of Knowledge (http://www.isiwebofknowledge.com/), or the ACM Digital Library (http://portal.acm.org/), new tools such as Google Scholar (see http://scholar.google.com/) or community approaches like Mendeley (see http://www.mendeley.com/) provide new opportunities for citation analysis on the basis of large and topical data-sets (cf. upcoming section and [10]).

In the following we describe the development of an approach for formalizing the fitness (citation success) of papers and discuss characteristics of this fitness model.

3. MEASURING AND FORMALIZING THE FITNESS OF SCIENTIFIC PAPERS
First of all, we had to decide on the data source for the bibliographic data required for our approach. After inspecting possible platforms (CiteseerX, ISI Web of Knowledge, ACM Digital Library, Google Scholar, and Mendeley), we conducted a small evaluation study. For this brief evaluation we selected four prominent (i.e. highly cited) publications: a well-known book on data mining and papers on booming topics in the Web (Semantic Web and the PageRank algorithm).

Table 1. Comparison of different citation indices (CiteseerX [CX], ISI Web of Knowledge [WoK], ACM Digital Library [ACM], Google Scholar [GS], and Mendeley [M]) on the basis of four highly cited papers, retrieved on February 8, 2011. (*) number of citations given by Google Scholar vs. sum of yearly citations; (+) number of readers.

Publication on:   CX     WoK    ACM    GS (*)        M (+)
Data mining       n.a.   n.a.   n.a.   10700/6035    61
Semantic Web      n.a.   1159   n.a.   10709/8312    323
PageRank (1)      1301   n.a.   n.a.   3670/2949     44
PageRank (2)      2140   n.a.   1534   7245/5917     573

Table 1 shows the comparison of the different citation indices. Overall, these figures confirm the impressions from our inspection. For instance, the data quality of CiteseerX seems to be very poor, as it has no or faulty data on two of our selected publications. On the other hand, the ISI Web of Knowledge and the ACM Digital Library provide bibliographic data of good quality, but their coverage seems to be poor. Mendeley is not a real citation index, as it contains usage data (number of readers) rather than citations. Yet, this data is interesting and valuable for our evaluation. In sum, we decided to use Google Scholar, which contains significantly more, and more topical, data-sets. Moreover, the quality of this data is at a reasonable level, which is also backed up by other evaluation studies, e.g. one on citation mining [11].

According to [12], citing a research paper follows a Poisson process, a stochastic process in which citations occur continuously and independently of each other. More precisely, the citation curve of a publication can be formalized by the convolution of two Poisson distributions, one describing the initial phase of a paper's uptake and the other representing its continuous aging process. As a simplification, and to combine the two citation curves into one model, we propose to use the Gamma distribution to formalize the fitness of a paper according to its citations. The probability density function of a Gamma distribution with shape k and scale θ is defined as follows [13]:

$$ f(x; k, \theta) = \frac{x^{k-1}\, e^{-x/\theta}}{\theta^{k}\, \Gamma(k)}, \qquad x > 0;\ k, \theta > 0 $$

Unlike former research, which is based upon the Avramescu function [12] (a specialization of the Erlang distribution, which itself is a special case of the Gamma distribution), we use the Gamma distribution for formalizing the fitness of a paper, as it allows approximating the citation curve by means of two parameters, the shape (k) and the scale (θ). Given the number of citations per year retrieved from Google Scholar, we use the citation history of prominent papers to develop a method for estimating these two parameters.

Figure 2 displays the citation curves of the four papers analyzed in Table 1. All of these publications are well cited and have sufficient data, starting in the years 1998, 2001, and 2006. The book on data mining (green curve) is problematic, as it is the second edition and thus its citation history seems to be biased. However, the other three papers deal with important innovations in the field of computer science and are considered appropriate for developing a method for measuring the fitness of PLE outcomes.
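
As a minimal sketch (in Python, not the R function used in this work), the Gamma density defined above can be evaluated directly in its shape/scale form; setting its first derivative to zero places the peak of the fitted citation curve at x = (k − 1)θ for k ≥ 1. The helper `fitted_citations`, which reads the normalization factor as a multiplier on the pdf to obtain citations per year (year 1 = publication year), is a hypothetical illustration rather than part of the described method.

```python
import math


def gamma_pdf(x: float, k: float, theta: float) -> float:
    """Gamma probability density with shape k and scale theta (both > 0)."""
    if k <= 0 or theta <= 0:
        raise ValueError("shape k and scale theta must be positive")
    if x <= 0:
        return 0.0  # support is x > 0; density taken as 0 elsewhere
    return x ** (k - 1) * math.exp(-x / theta) / (theta ** k * math.gamma(k))


def gamma_mode(k: float, theta: float) -> float:
    """Peak position obtained by setting d/dx gamma_pdf = 0; valid for k >= 1."""
    if k < 1:
        raise ValueError("for k < 1 the density has no interior maximum")
    return (k - 1) * theta


def fitted_citations(year_index: int, k: float, theta: float, norm_factor: float) -> float:
    """Hypothetical helper (assumption): citations expected in a given year,
    with year 1 = publication year, if the citation history equals
    norm_factor times the Gamma pdf."""
    return norm_factor * gamma_pdf(year_index, k, theta)


if __name__ == "__main__":
    # Parameters reported for the PageRank (2) paper.
    k, theta = 5.042, 2.968827
    print(f"peak at x = {gamma_mode(k, theta):.3f}")  # prints: peak at x = 12.000

    # (k, theta) pairs from Table 2, papers 1 to 6.
    table_2 = [(3.105, 4.751), (2.993, 3.010), (3.347, 2.983),
               (3.724, 2.937), (6.648, 1.062), (3.372, 2.530)]
    for shape, scale in table_2:
        print(round(gamma_mode(shape, scale), 2))  # prints 10.0, 6.0, 7.0, 8.0, 6.0, 6.0
```

With the PageRank (2) parameters reported in this section (k = 5.042, θ = 2.968827), `gamma_mode` places the peak at x ≈ 12 on the time axis of Figure 3 (x = 1 being the publication year). Applied to the (k, θ) pairs in Table 2, it yields peaks close to whole years (about 10, 6, 7, 8, 6 and 6), which is consistent with the mode being fixed before k and θ are estimated.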

Figure 2. Citation curves of the four publications mentioned in Table 1 (data-sets taken from Google Scholar on February 8, 2011; green curve: data mining book, blue curve: Semantic Web paper, red and black curves: two papers on PageRank).

For developing our method to approximate the citation history by a Gamma distribution, we used the second paper on PageRank (S. Brin and L. Page, "The anatomy of a large-scale hypertextual Web search engine", 1998) because sufficient data is provided over a long period of time (see red curve in Figure 2). Basically, our fitness measurement method consists of three steps to approximate a given citation history: (1) determination of the mode, i.e. the value that occurs most frequently in the data-set (here, the year with the most citations); (2) estimation of the shape and scale parameters by minimizing the error rate of the given sample with respect to the probability density function (pdf) of the Gamma distribution; (3) visualization and evaluation of the approximated fitness curve.

The first step, the identification of the mode, is the trickiest one and strongly restricts our approach, but it is also necessary. As we only have data-sets covering the first years after publications appear, we decided to select the mode manually, for two reasons. On the one hand, distribution fitting algorithms are based on the premise that the values are distributed over the whole time span, which is not the case for our data. Existing software, like the open source framework for statistical computing and graphics (R Project, see http://cran.r-project.org/), provides packages for estimating the parameters of Gamma distributions (cf. [14]), but they do not lead to useful results for our data. On the other hand, we have to assume that the mode is already included in the available data-set, which is also a necessary condition for our approximation method.

However, knowing the mode of the distribution enables us to estimate the two parameters (shape k, scale θ) on the basis of the following mathematical relationship (obtained by setting the first derivative of the pdf to 0):

$$ \frac{d}{dx} f(x; k, \theta) = 0 \;\Longrightarrow\; x_{\text{mode}} = (k-1)\,\theta, \qquad k \ge 1 $$

In a second step, we used (n-2) values of our citation history for estimating the two parameters so that the error rate is minimal. We recommend not using the citation data of the last two years (here 2010 and 2011), because publication and indexing delays leave the number of citations for these years incomplete. Given the mode, we have written an R function which numerically calculates the best values for k and θ by minimizing the error rate of the first m values of the citation history (m being the number of values up to the mode) according to the following equation:

After calculating the parameters (e.g. k = 5.042 and θ = 2.968827 for the selected PageRank paper), the third step comprises evaluation (the relative error for these parameters is 7.85%) and a visualization of the approximated curve. Figure 3 shows the number of citations gathered from Google Scholar and the approximation according to the Gamma distribution.

Figure 3. Gamma approximation for the PageRank (2) paper from Figure 2 (x is the time axis, starting with 1 as the publication year; the red curve describes the Gamma pdf approximated according to the citation history).

In principle, we can now formalize the fitness of a PLE outcome by two numbers, the shape and the scale of the Gamma pdf. If based on sufficient data, this distribution of a publication's citation history seems reasonable: a paper starts to have impact after being published, reaches a peak some years later, and then its impact decreases again. The last phase can be explained by effects like more successful follow-up publications or aging of published knowledge. Overall, this fitness measurement enables comparing the success (impact) of publications with each other.

In the next step we analyzed the fitness of different publications: (a) the most frequently cited papers, i.e.
fundamental literature of a selected scientific community, (b) a successful follow-up paper by a lead researcher, (c, d) average (less successful) papers of the same author (single-authored and co-authored papers), and (e) the most cited paper of other researchers in a selected field. We used the bibliographic data of the adaptive hypermedia (AH) community, as this discipline is very young and most of its key publications are captured by the Google Scholar index.

Table 2. Comparison of selected papers according to our fitness estimation method (data retrieved from Google Scholar on February 23, 2011)

Publication [citations, values]                                 k      θ      norm. factor   rel. error (%)
1. Brusilovsky, "Methods and techniques of adaptive hypermedia",
   1996 [1373 citations, 16 values]                             3.105  4.751  2336.03        12.54
2. Brusilovsky, "Adaptive hypermedia", 2001
   [1274 citations, 11 values]                                  2.993  3.010  2043.25         9.68
3. Brusilovsky et al., "From adaptive hypermedia to the adaptive
   web", CACM, 2001 [303 citations, 11 values]                  3.347  2.983   486.46        15.82
4. De Bra, Brusilovsky, "Adaptive hypermedia: from systems to
   framework", 1999 [159 citations, 13 values]                  3.724  2.937   275.57        15.60
5. Brusilovsky, "Adaptive educational systems on the
   world-wide-web", 1998 [174 citations, 14 values]             6.648  1.062   141.29        24.92
6. De Bra et al., "AHAM: a Dexter-based reference model for
   adaptive hypermedia", 1999 [326 citations, 13 values]        3.372  2.530   394.38        22.73

Table 2 gives an overview of the comparison of the papers relevant for the scenarios (a-e). A first observation concerns the relative error of the approximation. Apparently, the error tends to be lower for papers with more citations per year. In particular, the last two publications are approximated only moderately well, as their relative error is above 20%. Yet, the approximation by a Gamma distribution works well overall, as also shown by the papers' fitness functions in Figure 4. As mentioned before, it is important not to consider the two latest years of the retrieved citation history due to publication and indexing delays. These values (2010, 2011) are also not visualized in the figure.

A second interesting observation concerns the shape parameter (k). A lower shape factor is an indicator of a fitter paper, i.e. a publication cited more often in a shorter period of time and reaching its citation peak earlier. The first two papers were both published by the same author and on the same topic. Yet, the second one is cited nearly as often as the first one, although it was published 5 years later. Most probably, the second paper will outpace the first one in the coming years, as can be concluded from the fitness functions shown in Figure 4. As we assume the fitness of a publication to be dependent on the community, we restrict the comparison of Gamma parameters to this scientific field. Thus, the shape calculated for the PageRank paper (Web research) cannot be directly related to the shape factors of the AH papers.

Besides the speed of a paper's uptake, success can also be determined by the overall number of citations. Here, both scaling factors, the Gamma parameter θ (θ column of Table 2) as well as the factor normalizing the citation history to the pdf of the Gamma distribution (norm. factor column), allow inferences on the quantity of citations. The first two papers are cited significantly more often than papers 3 and 6, which in turn are more successful than publications 4 and 5. However, both scaling factors depend on the shape k, which is why the fitness function of the first paper has a higher scale and a higher normalization factor but a lower peak.

Figure 4. Fitness functions and citation histories (from the publication years to 2009) of the papers listed in Table 2 (colors: 1. black, 2. red, 3. blue, 4. green, 5. cyan, 6. orange).

Overall, we have tackled a set of very diverse publications, whose fitness functions are visualized in Figure 4. The first two papers (scenario (a); black and red curves) are the most frequently cited papers of one of the lead researchers of the AH community. These two curves show that two very successful papers can behave differently in how they are cited within a community, i.e. that one publication can be fitter than another and that preferential attachment [15], a favored paradigm for emergent, networked structures, is not always valid.

The fitness of the third paper, a successful follow-up paper of the AH lead researcher (scenario (b)), is similar to that of the most cited paper of another (well-known) researcher in this scientific field (scenario (e)). The less successful papers (scenarios (c) and (d)) are problematic, as the approximation of the fitness curve does not work as well (high relative error). Evidently, they are characterized by a fitness curve that rises more slowly (a higher shape value). In particular, paper 5 has a shape of over 6, meaning that
the data could be faulty or that the uptake of this work was indeed that slow.

Regarding further issues that might influence our fitness estimation method, [8, 9] give a comprehensive overview of the problems of citation analysis. Due to a lack of space and time, we have not addressed the phenomenon of self-citations, which we assume to be necessary to successfully ‘initialize’ the fitness of a paper. Concerning such influential factors, we refer to future work, which could differentiate between self-citations and citations by other researchers and examine the resulting fitness functions.

Finally, it should be pointed out that our fitness estimation method also includes a model for predicting the future citation frequency. For the papers we examined, this prediction worked well for citation histories extending beyond the citation peak. However, the prediction also assumes that no unforeseeable event concerning a publication (e.g. its rediscovery after a couple of decades) occurs in the future. Hence, our approach is restricted to the condition that the citation peak is already contained in the data and that it is a global maximum.

4. CONCLUSIONS, RELATIONS TO OTHER FIELDS, AND FUTURE WORK
In this paper we have examined a very particular aspect of personal learning environments, namely publications as outcomes of distributed, collaborative, and technology-based activities. More precisely, we have proposed a method for formalizing the fitness of such scientific content artifacts, i.e. their success in being taken up, on the basis of usage data (the number of citations) retrieved from a large and up-to-date citation index. The method is restricted by some hard conditions: sufficient data must be available, the citation peak must be given and be a global maximum, and the results depend on the scientific community. Despite these restrictions, the fitness measurement method appears valid and reasonable for the following reasons.

First, the approximation works well for well-cited papers, as shown in the previous section. Second, citing scientific publications is a natural process for which the waiting times between Poisson-distributed events are relevant [16], and such processes can be characterized by a Gamma distribution. Similar processes can be observed in other areas, such as weather forecasting (estimating the likelihood of monthly rainfall for drought monitoring [17]), insurance (the effect of risk factors, like rainfall, on insurance claims [18]), medical treatment (time to treatment response in arthritis patients [19]), and, more generally, the modeling of the distribution of fitness effects in evolutionary biology [20, 21, 22].

Although the connection between scientific publications and the PLEs leading to such artifacts is very vague, we think that the fitness model proposed in this paper is generally relevant for PLE-based activities. Other aspects of personal learning processes (e.g. tool usage or communication behavior) might follow a similar lifecycle, with a curve described by a Gamma distribution. In particular, the results of our research are relevant for activities which aim at creating artifacts that are meant to be used extensively by others. By applying our approximation method, it is possible to compare the success of papers with each other and to predict their future performance.

However, we see the work presented in this paper only as a first step. Building on the fitness estimation method developed here, next steps could relate the fitness curves of publications to several factors:
different scientific communities (local fitness assumption),
the social networks of paper authors (co-author assumption),
self-citations (initialization assumption),
the novelty and quality of publications (fit-for-purpose assumption),
or other characteristics of such PLE outcomes.
Furthermore, future work could comprise a closer examination of the PLEs which led to high-impact papers, e.g. by interviewing the authors of such publications. Additionally, it would be valuable to develop a tool for (semi-)automatically calculating the fitness curve of user-selected papers. From the evaluation perspective, it is necessary to examine papers from different scientific fields (if sufficient data is available). It is also necessary to use data from other systems, such as real usage data on publications as captured by Mendeley (cf. the author readership analysis available at http://readermeter.org).

5. ACKNOWLEDGMENTS
The research leading to these results has received funding from the European Community's Seventh Framework Programme (FP7/2007-2013) under grant agreement no. 231396 (ROLE project).

6. REFERENCES
[1] Henri, F., Charlier, B., and Limpens, F. 2008. Understanding PLE as an Essential Component of the Learning Process. In Proc. of ED-Media (Vienna, Austria, Jun 30-Jul 4, 2008). AACE, Chesapeake, VA, 3766-3770.
[2] Van Harmelen, M. 2008. Design trajectories: Four experiments in PLE implementation. Interactive Learning Environments, 16, 1, 35-46.
[3] Futuyma, D.J. 2005. Evolution. Sinauer Associates, Sunderland, MA.
[4] Darwin, C. 1859. On the origin of species by means of natural selection, or the preservation of favoured races in the struggle for life. William Clowes and Sons, London.
[5] Wild, F., Mödritscher, F., and Sigurdarson, S.E. 2008. Designing for Change: Mash-Up Personal Learning Environments. eLearning Papers, 2008(9).
[6] Mödritscher, F., and Petrushyna, Z. 2009. Model and Methodology for PLE-Based Collaboration in Learning Ecologies. Deliverable D7.1/ID7.2, ROLE consortium.
[7] Klamma, R., and Petrushyna, Z. 2008. The Troll Under the Bridge: Data Management for Huge Web Science Mediabases. In Proc. of the 38.
    Jahrestagung der Gesellschaft für Informatik e.V. (GI), die INFORMATIK 2008 (München, Germany, Sept 8-13, 2008), Köllen Druck+Verlag GmbH, Bonn, 923-928.
[8] Smith, L.C. 1981. Citation analysis. Library Trends, 30, 83-106.
[9] MacRoberts, M.H., and MacRoberts, B.R. 1996. Problems of citation analysis. Scientometrics, 36, 435-444.
[10] Harzing, A.-W.K., and Van der Wal, R. 2008. Google Scholar as a new source for citation analysis. Ethics in Science and Environmental Politics, 8, 1, 62-71.
[11] Afzal, M.T., Maurer, H., Balke, W., and Kulathuramaiyer, N. 2010. Rule based Autonomous Citation Mining with TIERL. Journal of Digital Information Management (JDIM), 8, 3, 196-204.
[12] Egghe, L., and Rousseau, R. 2000. The influence of publication delays on the observed aging distribution of scientific literature. Journal of the American Society for Information Science, 51, 2, 158-165.
[13] Choi, S.C., and Wette, R. 1969. Maximum Likelihood Estimation of the Parameters of the Gamma Distribution and Their Bias. Technometrics, 11, 4, 683-690.
[14] Ricci, V. 2005. Fitting distributions with R. Technical report, retrieved from http://cran.r-project.org/doc/contrib/Ricci-distributions-en.pdf (2011-02-25).
[15] Barabási, A.-L., and Albert, R. 1999. Emergence of Scaling in Random Networks. Science, 286, 5439, 509-512.
[16] Weisstein, E.W. 2005. Gamma Distribution. MathWorld, Wolfram Research, retrieved from http://mathworld.wolfram.com/GammaDistribution.html (2011-02-25).
[17] Husak, G.J., Michaelsen, J., and Funk, C. 2007. Use of the gamma distribution to represent monthly rainfall in Africa for drought monitoring applications. International Journal of Climatology, 27, 7, 935-944.
[18] Yuen, K.C., Guo, J., and Wu, X. 2002. On a correlated aggregate claims model with Poisson and Erlang risk processes. Insurance: Mathematics and Economics, 31, 2, 205-214.
[19] Abrahamyan, L., Beyene, J., Feng, J.Y., Chon, Y., Willan, A.R., Schmeling, H., Horneff, G., Keystone, E.C., and Feldman, B.M. 2010. Response times follow lognormal or gamma distribution in arthritis patients. Journal of Clinical Epidemiology, 63, 12, 1363-1369.
[20] Nielsen, R., and Yang, Z. 2003. Estimating the distribution of selection coefficients from phylogenetic data with applications to mitochondrial and viral DNA. Molecular Biology and Evolution, 20, 8, 1231-1239.
[21] Loewe, L., and Charlesworth, B. 2006. Inferring the distribution of mutational effects on fitness in Drosophila. Biology Letters, 2, 3, 426-430.
[22] Gu, X. 2007. Stabilizing selection of protein function and distribution of selection coefficient among sites. Genetica, 130, 1, 93-97.