Outcome-oriented Fitness Measurement of Personal Learning Environments

Felix Mödritscher
Institute for Information Systems and New Media, Vienna University of Economics and Business
Augasse 2-6, 1090 Vienna, Austria
+43-1-31336-5277
felix.moedritscher@wu.ac.at

ABSTRACT
Personal learning environments (PLEs) comprise a new kind of learning technology which aims at putting learners centre stage, i.e. by empowering them to design and use environments for their learning needs and purposes. While a lot of research and development is going on in realizing and providing technical PLE solutions, less effort is spent on examining the ‘fitness’ of PLEs. By fitness we refer to the property of a PLE that it is successfully used to achieve a goal. In this paper we attempt to formalize PLE fitness by focusing on one specific aspect, namely the outcomes of PLE-based activities. For this purpose, we analyze a certain kind of PLE outcome, i.e. publications, by measuring their impact, and use real-world data harvested from the Web to propose a mathematical fitness model. Furthermore, we address factors characterizing the fitness of a publication as well as preliminaries of our approach. The paper concludes by pointing out related findings from other fields and possible future work on outcome-oriented PLE fitness measurement.

Categories and Subject Descriptors
G.3 [Mathematics of Computing]: Probability and Statistics: Distribution functions, Time series analysis, H.2.8 [Information Systems]: Database Applications: scientific databases, G.1.2 [Mathematics of Computing]: Approximation: Nonlinear approximation.

General Terms
Algorithms, Measurement, Experimentation.

Keywords
Personal Learning Environments, Scientific Publications, Citation History Analysis, Fitness Function, Gamma Distribution.

1. INTRODUCTION
According to Henri et al. [1], personal learning environments (PLEs) refer to “a set of learning tools, services, and artifacts gathered from various contexts to be used by the learners”. Furthermore, Van Harmelen [2] states that PLEs aim at empowering learners to design (ICT-based) environments for their activities so that they can connect to learner networks in order to collaborate on shared outcomes and acquire necessary (professional and rich professional) competences. In recent years a lot of work has been invested in the development and application of new, PLE-related technologies (like apps, widgets or gadgets) and their underlying infrastructures (widget containers, personalized websites, mobile phones etc.).

Considering the spread of these technologies in society and the rising profits of leading companies in this sector (e.g. Apple or Google), they are highly successful. However, less attention is paid to their usage as personal learning environments and their (positive and negative!) effects on lifelong learning. In order to formalize and examine the evolvability of PLEs, we build upon the notion of fitness, a concept taken from evolutionary theory. By comparing the development, spreading, and utilization of PLEs (the technical infrastructures as well as their entities, e.g. tools and their features) to genetic evolution [3], a learning environment can be understood as a socio-technical system (organism) with its functionalities (traits). According to our initial definition, a PLE is a set of tools, services, artifacts, and peer actors. Thus the fitness of a PLE refers to specific situations in which it is used and consequently to defined purposes (fit-for-purpose) as well as to the scope of a community and a context (local fitness).

Over time, PLEs can evolve, for instance specialize, according to situations in which certain features are used more frequently and others are ignored or even removed. As learners also demand new features, developers are part of this evolutionary process and implement them so that a PLE solution continues to be used in the future. Such processes bear a resemblance to the concept of natural selection [4]. In the context of this paper, fitness refers to a property describing PLE functionalities. Fitter PLE features (genes) become more common, i.e. a certain form of a feature (allele; DNA sequence) is used more frequently, spreads faster, or can even substitute other forms of the same functionality.

We explain these definitions through two examples of the evolution of software artifacts in practice. A first example comprises a new way of providing recommendations. In the last few years many web applications have included recommendations which appear when a term is typed into the search field. Restricting these recommendations to the user’s context (e.g. Facebook.com) or auto-completing the query on the basis of terms given by many other users (e.g. Google.com) seem to be two manifestations of this feature which will become more important in the future. So, the generic function “recommendations” has been specialized over time.

In a second example a new researcher enters a scientific community on statistical mathematics. In this group of researchers a specific tool, namely the R software, is favored for teaching and research activities. Thus the new member faces a tool with a high fitness factor within the community and can either work with this tool or try to establish some other software in this community, consequently opposing the R framework.

Figure 1. Example scenario for PLE-based collaboration.

Overall, the idea of our approach is to consider PLEs as the outcomes of (collaborative, ICT-based) learning (which is also stated e.g. by Wild et al. [5]) and to
formalize and examine their evolution over several generations. Unfortunately, this would require detailed data about PLE-based activities over a long period of time, which is not easy to get and which we do not have. Therefore we propose to focus on certain aspects of PLE activities, namely on PLE outcomes in the form of scientific papers. We use the information on publications to model and analyze their fitness with respect to their scientific impact.

The rest of the paper is structured as follows. The next section elaborates our approach towards outcome-oriented fitness measurement as well as preliminaries and related work. Then, section 3 describes the stepwise development of a fitness function for PLE outcomes and examines different characteristics of this model. Section 4 summarizes findings as well as similarities to other fields, and discusses the relevance of the approach for PLE fitness, before an outlook on future work is given.

2. CONCEPTUAL APPROACH, PRELIMINARIES, AND RELATED WORK
As mentioned before, we consider scientific papers as typical PLE outcomes and use bibliographic data to examine and formalize their fitness. In a first step we have to clarify how publications and PLEs are related. In former research we have elaborated the notion and the most important concepts of PLE-based learning ecologies [6]. Figure 1 shows what PLE-based collaboration looks like. Learners are involved in different activities in which they try to achieve personal and group goals (e.g. publishing a paper in a journal). They use various tools to collaborate on shared artifacts. In the context of this paper, publications can be seen as typical outcomes of such activities, as they are created by one or more scientists using different tools, and even single-authored papers normally involve other actors in the background.

On a theoretical level, and putting the learner (actor) centre stage, Klamma and Petrushyna [7] propose a model of learning ecologies which is based on the Actor-Network Theory (ANT) and describes five important entities of a PLE:

• Processes: Activities carried out for educational reasons, at the workplace, or due to personal goals (e.g. a job task in a business process, attending a course for further education, or a spare time activity requiring the acquisition of new competences)
• Media: Collection of learning resources required for or created in these activities (e.g. the Wikipedia platform, a learning objects repository, or simply the Internet)
• Artifacts: Documents and other (digital or real-world) artifacts collaboratively created and accessed by learners (e.g. Wiki articles or a joint paper)
• Agents: Actors, no matter if humans or software (e.g. peer learners or functionality provided by software)
• Communities: People sharing the same environment, e.g. in terms of having common interests, working on the same artifacts, or being connected to the same actors (e.g. a group of learners trying to achieve a course goal or a special interest group for a specific topic)

In the scope of this paper, the PLE related to a publication can be described as follows. A scientific publication is an outcome of a PLE-based activity which involves several human agents in different roles (main author, co-authors, organizer/editor, reviewers, etc.) using different tools (MS Word, email, conference/journal submission system, etc.). The whole publication process consists of various activities, e.g. research, writing, and submission activities. Normally, a paper also addresses one or a few scientific communities which can be determined by the targeted journal or conference.

Realistically, the PLE of a publication cannot be fully reconstructed any more, as the tools used and the interaction sequences were not tracked sufficiently. Thus, we examine the fitness (success) of papers in terms of their impact in scientific communities by analyzing the number of citations of different kinds of publications over time. The analysis of citations and the citation history of papers is a well-explored field (cf. [8]). Furthermore, shortcomings of citation analysis, like biased citing, secondary sources, variations in citation rates with disciplines or nationalities, and many more, are elaborated extensively [8, 9]. Yet, we consider these problems of citation analysis (similarly to the learning environment itself) as part of the outcome of PLE-based activities, being worth an in-depth analysis.

With respect to existing citation indices like CiteseerX (http://citeseerx.ist.psu.edu/), the ISI Web of Knowledge (http://www.isiwebofknowledge.com/), or the ACM Digital Library (http://portal.acm.org/), new tools such as Google Scholar (see http://scholar.google.com/) or community approaches like Mendeley (see http://www.mendeley.com/) provide new opportunities for citation analysis on the basis of large and topical data-sets (cf. upcoming section and [10]).

In the following we describe the development of an approach for formalizing the fitness (citation success) of papers and discuss characteristics of this fitness model.

3. MEASURING AND FORMALIZING THE FITNESS OF SCIENTIFIC PAPERS
First of all, we had to decide on the data source for the bibliographic data required for our approach. After inspecting possible platforms (CiteseerX, ISI Web of Knowledge, ACM Digital Library, Google Scholar, and Mendeley) we conducted a small evaluation study. For this brief evaluation we selected four prominent (i.e. highly cited) publications: a well-known book on data mining and papers on booming topics in the Web (Semantic Web and the PageRank algorithm).

Table 1. Comparison of different citation indices (CiteseerX [CX], ISI Web of Knowledge [WoK], ACM Digital Library [ACM], Google Scholar [GS], and Mendeley [M]) on the basis of four highly cited papers, retrieved on February 8, 2011 (*) no. of citations given by Scholar vs. sum of yearly citations; +) no. of readers)

Publication on:   CX     WoK    ACM    GS*)         M+)
Data mining       n.a.   n.a.   n.a.   10700/6035   61
Semantic Web      n.a.   1159   n.a.   10709/8312   323
PageRank (1)      1301   n.a.   n.a.   3670/2949    44
PageRank (2)      2140   n.a.   1534   7245/5917    573

Table 1 shows the comparison of the different citation indices. Overall, this statistic confirms the impressions of our inspection. For instance, the data quality of CiteseerX seems to be very poor, as it has no or faulty data on two of our selected publications. On the other hand, the ISI Web of Knowledge and the ACM Digital Library provide bibliographic data of good quality, but the coverage seems to be poor. Mendeley is not a real citation index, as it contains usage data (no. of readers) rather than citations. Yet, this data is interesting and valuable for our evaluation. In sum, we decided to use Google Scholar, which contains significantly more, and more topical, data. Moreover, the quality of this data is on a reasonable level, which is also backed up by other evaluation studies, e.g. one on citation mining [11].

According to [12], citing a research paper follows a Poisson process, a stochastic process in which citations occur continuously and independently of each other. More precisely, the citation curve of a publication can be formalized by the convolution of two Poisson distributions, one describing the initial phase of a paper’s uptake and another one representing its continuous aging process. As a simplification, and to combine the two citation curves into one model, we propose to use the Gamma distribution to formalize the fitness of a paper according to its citations. The probability density function of a Gamma distribution is defined as follows [13]:

\[ f(x; k, \theta) = \frac{x^{k-1}\, e^{-x/\theta}}{\theta^{k}\, \Gamma(k)}, \qquad x > 0,\; k > 0,\; \theta > 0 \]

In contrast to former research which is based upon the Avramescu function [12] (a specialization of the Erlang distribution, which itself is a special kind of Gamma distribution), we use the Gamma distribution for formalizing the fitness of a paper, as it allows approximating the citation curve according to two parameters, the shape (k) and the scale (θ). Given the number of citations per year retrieved from Google Scholar, we use the citation history of prominent papers to develop a method for estimating these two parameters.

Figure 2 displays the citation curves of the four papers analyzed in Table 1. All of these publications are well cited and have sufficient data starting in the years 1998, 2001, and 2006. The book on data mining (green curve) is problematic, as it is the second edition and thus the citation history seems to be biased. However, the other three papers deal with important innovations in the field of computer science and are considered to be appropriate for developing a method for measuring the fitness of PLE outcomes.

Figure 2. Citation curves of the four publications mentioned in Table 1 (data-sets taken from Google Scholar on February 8, 2011; green curve: data mining book, blue curve: Semantic Web paper, red and black curve: two papers on PageRank).

For developing our method to approximate the citation history according to a Gamma distribution, we used the second paper on PageRank (S. Brin and L. Page, “The anatomy of a large-scale hypertextual Web search engine”, 1998) because sufficient data is provided over a long period of time (see red curve in Figure 2). Basically, our fitness measurement method consists of three steps to approximate a given citation history: (1) determination of the mode, i.e. the value that occurs most frequently in the data-set; (2) parameter estimation of the shape and the scale with respect to minimizing the error rate of the given sample according to the probability density function (pdf) of the Gamma distribution; (3) visualization and evaluation of the approximated fitness curve.

The first step, the identification of the mode, is the trickiest one and highly restricts our approach, but it is also necessary. As we only have data-sets of the first years after publications appear, we decided to select the mode manually for two reasons. On the one hand, distribution fitting algorithms are based on the precondition that the values are distributed over time, which is not the case for our data. Existing software, like the open source framework for statistical computing and graphics (R Project, see http://cran.r-project.org/), provides packages for estimating the parameters of Gamma distributions (cf. [14]), but they do not lead to useful results for our data. On the other hand, we have to assume that the mode is already included within the available data-set, which is also a necessary condition for our approximation method.

However, having the mode of the distribution gives us the possibility to estimate the two parameters (shape k, scale θ) on the basis of the following mathematical relationship (setting the first derivative of the pdf to 0):

\[ x_{\mathrm{mode}} = (k - 1)\,\theta \qquad \text{for } k \ge 1 \]

In a second step, we used (n-2) values of our citation history for estimating the two parameters so that the error rate is minimal. It is recommended not to use the citation data of the last two years (here 2010 and 2011) because of publication and indexing delays, due to which the number of citations is incomplete. Given the mode, we have written an R function which numerically calculates the best values for k and θ by minimizing the error rate of the first m values of the citation history (with m being the number of values up to the mode) according to the following equation:

[Equation not recoverable from the source text.]

After calculating the parameters (e.g. k = 5.042 and θ = 2.968827 for the selected PageRank paper), the third step comprises evaluation (the relative error for these parameters is 7.85%) and a visualization of the approximated curve. Figure 3 shows the number of citations gathered from Google Scholar and the approximation according to the Gamma distribution.

Figure 3. Gamma approximation for the PageRank (2) paper from Table 1 (x is the time axis starting with 1 as the publication year; red curve describes the Gamma pdf approximated according to the citation history).

In principle, we now can formalize the fitness of a PLE outcome by two numbers, the shape and the scale of the Gamma pdf. If based on sufficient data, this distribution of a publication’s citation history seems to be reasonable, as a paper starts to have impact after being published, reaches a peak some years later, and then declines again. The last phase can be explained by effects like more successful follow-up publications or aging of published knowledge. Overall, this fitness measurement enables comparing the success (impact) of publications with each other.

In the next step we analyzed the fitness of different publications: (a) the most frequently cited papers, i.e. fundamental literature of a selected scientific community, (b) a successful follow-up paper by a lead researcher, (c, d) average (less successful) papers of the same author (single-authored and co-authored papers), and (e) the most cited paper of other researchers in a selected field. We used the bibliographic data of the adaptive hypermedia (AH) community, as this discipline is very young and most of the key publications are captured by the index of Google Scholar.

Table 2. Comparison of selected papers according to our fitness estimation method (data retrieved from Google Scholar on February 23, 2011)

Publication                                              k      θ      norm. factor  rel. error (%)
1. Brusilovsky, “Methods and techniques of adaptive
   hypermedia”, 1996 [1373 citations, 16 values]         3.105  4.751  2336.03       12.54
2. Brusilovsky, “Adaptive hypermedia”, 2001
   [1274 citations, 11 values]                           2.993  3.010  2043.25        9.68
3. Brusilovsky et al., “From adaptive hypermedia to
   the adaptive web”, CACM, 2001
   [303 citations, 11 values]                            3.347  2.983   486.46       15.82
4. De Bra, Brusilovsky, “Adaptive hypermedia: from
   systems to framework”, 1999
   [159 citations, 13 values]                            3.724  2.937   275.57       15.60
5. Brusilovsky, “Adaptive educational systems on the
   world-wide-web”, 1998 [174 citations, 14 values]      6.648  1.062   141.29       24.92
6. De Bra et al., “AHAM: a Dexter-based reference
   model for adaptive hypermedia”, 1999
   [326 citations, 13 values]                            3.372  2.530   394.38       22.73

Table 2 gives an overview of the comparison of papers relevant for the assumptions (a-e). A first observation deals with the relative error of the approximation. Obviously the error decreases if more values per year are given. In particular, the last two publications are approximated only moderately well, as the relative error is above 20%. Yet, the approximation according to the Gamma distribution works well, as also shown by the papers’ fitness functions in Figure 4. As mentioned before, it is important not to consider the two latest years of the retrieved citation history due to publication and indexing delays. These values (2010, 2011) are also not visualized in the figure.

A second interesting observation concerns the shape parameter (k). A lower shape factor is an indicator for a fitter paper, i.e. a publication cited more often in a shorter period of time and reaching the citation peak earlier. Comparing the first two papers, both were published by the same author and on the same topic. Yet, the second one is cited nearly as much as the first one although it was published 5 years later. Most probably, the second paper will outpace the first one in the next years, which can be concluded from the fitness functions shown in Figure 4. As we assume the fitness of a publication to be dependent on the community, we restrict the comparison of Gamma parameters to this scientific field. Thus, the shape calculated for the PageRank paper (Web research) cannot be set in direct relation with the shape factors of the AH papers.

Next to the speed of a paper’s uptake, success can also be determined by the number of citations in general. Here, both scaling factors, the Gamma parameter θ (column θ of Table 2) as well as the factor to normalize the citation history to the pdf of the Gamma distribution (column “norm. factor”), allow inferences on the quantity of citations. The first two papers are cited significantly more often than papers 3 and 6, which in turn are more successful than publications 4 and 5. However, both scaling factors depend on the shape k, which is why the fitness function of the first paper has a higher scale and a higher normalization factor but a lower peak.

Figure 4. Fitness functions and citation histories (from the publication years to 2009) of the papers depicted in Table 2 (colors: 1. black, 2. red, 3. blue, 4. green, 5. cyan, 6. orange).

Overall, we have tackled a set of very diverse publications for which the fitness functions are visualized in Figure 4. The first two papers (scenario (a); black and red curve) are the most frequently cited papers of one of the lead researchers of the AH community. These two curves are evidence that two very successful papers behave differently in being cited within a community, i.e. that one publication can be fitter than another one and that preferential attachment [15], a favored paradigm for emergent, networked structures, is not always valid.

The fitness of the third paper, a successful follow-up paper of the AH lead researcher (scenario (b)), is similar to the most cited paper of another (well-known) researcher in this scientific field (scenario (e)). The less successful papers (scenarios (c, d)) are problematic, as the approximation of the fitness curve does not work that well (high relative error). Most obviously, they are characterized by a curve which grows more slowly. In particular, paper 5 has a shape of over 6, meaning that the data could be faulty or that the uptake of this work was that slow.

Addressing further issues that might have an influence on our fitness estimation method, [8, 9] give a comprehensive overview of problematic issues of citation analysis. Due to a lack of space and time, we have not addressed the phenomenon of self-citations, which we assume to be necessary to successfully ‘initialize’ the fitness of a paper. Concerning such influential factors, we refer to future work which could aim at differentiating between self-citations and citations by other researchers and examining the different fitness functions.

Finally, it has to be pointed out that our fitness estimation method also includes a model for predicting the future citation frequency. Given the data of the papers we have examined, this prediction worked fine for those citation histories going beyond the citation peak. On the other hand, this prediction is also based on the assumption that no unforeseeable event concerning a publication (e.g. a rediscovery after a couple of decades) occurs in the future. Here, our approach is restricted to the condition that the citation peak is given and that it is a global maximum.

4. CONCLUSIONS, RELATIONS TO OTHER FIELDS, AND FUTURE WORK
In this paper we have examined a very particular aspect of personal learning environments, namely publications as outcomes of distributed, collaborative, and technology-based activities. Precisely, we have proposed a method for formalizing the fitness of such scientific content artifacts, i.e. the success in being taken up, on the basis of usage data (the number of citations) retrieved from a large and up-to-date citation index. Although restricted by some hard conditions (sufficient data available; citation peak given and global maximum; dependency on a scientific community), the fitness measurement method seems to be valid and reasonable for the following reasons.

On the one hand, the approximation works fine for well-cited papers, as shown in the last section. On the other hand, citing scientific publications is a natural process for which the waiting times between Poisson distributed events are relevant [16], which can be characterized by a Gamma distribution. Similar processes can be observed in other areas, like weather forecasting (estimating the likelihood of monthly rainfalls for drought monitoring [17]), insurance businesses (effect of risk factors, like rainfalls, on insurance claims [18]), medical treatment (time to treatment response in arthritis patients [19]), or modeling the distribution of fitness effects in evolutionary biology in general [20, 21, 22].

Although the connection between scientific publications and the PLEs leading to such artifacts is very vague, we think that the fitness model proposed in this paper is generally relevant for PLE-based activities, as other aspects of personal learning processes (e.g. tool usage or communication behavior) might underlie a similar lifecycle and a curve following a Gamma distribution. In particular, the results of our research are relevant for those activities which aim at creating artifacts that should be extensively used by others. By applying our approximation method it is possible to compare the success of papers with each other and to predict their future performance. However, we see the work tackled in this paper as a first step only. Based on the fitness estimation method developed, next steps could address the fitness curves of publications according to different scientific communities (local fitness assumption), to the social networks of paper authors (co-author assumption), to self-citations (initialization assumption), to the novelty and quality of publications (fit-for-purpose assumption), or to other characteristics of such PLE outcomes.

Furthermore, future work could comprise a closer examination of the PLEs which led to high impact papers, i.e. by interviewing the authors of such publications. Additionally, it would be valuable to develop a tool for (semi-)automatically calculating the fitness curve of user-selected papers. From the evaluation perspective it is necessary to examine papers of different scientific fields, if sufficient data is available, and to use data from other systems, i.e. real usage data on publications as captured e.g. by Mendeley (cf. author readership analysis available at http://readermeter.org).

5. ACKNOWLEDGMENTS
The research leading to these results has received funding from the European Community's Seventh Framework Programme (FP7/2007-2013) under grant agreement no 231396 (ROLE project).

6. REFERENCES
[1] Henri, F., Charlier, B., and Limpens, F. 2008. Understanding PLE as an Essential Component of the Learning Process. In Proc. of ED-Media (Vienna, Austria, Jun 30-Jul 4, 2008). AACE, Chesapeake, VA, 3766-3770.
[2] Van Harmelen, M. 2008. Design trajectories: Four experiments in PLE implementation. Interactive Learning Environments 16, 1, 35-46.
[3] Futuyma, D.J. 2005. Evolution. Sinauer Associates, Sunderland, MA.
[4] Darwin, C. 1859. On the origin of species by means of natural selection, or the preservation of favoured races in the struggle for life. William Clowes and Sons, London.
[5] Wild, F., Mödritscher, F., and Sigurdarson, S.E. 2008. Designing for Change: Mash-Up Personal Learning Environments. eLearning Papers, 2008(9).
[6] Mödritscher, F., and Petrushyna, Z. 2009. Model and Methodology for PLE-Based Collaboration in Learning Ecologies. Deliverable D7.1/ID7.2, ROLE consortium.
[7] Klamma, R., and Petrushyna, Z. 2008. The Troll Under the Bridge: Data Management for Huge Web Science Mediabases. In Proc. of the 38. Jahrestagung der Gesellschaft für Informatik e.V. (GI), die INFORMATIK 2008 (München, Germany, Sept 8-13, 2008), Köllen Druck+Verlag GmbH, Bonn, 923-928.
[8] Smith, L.C. 1981. Citation analysis. Library Trends, 30, 83-106.
[9] MacRoberts, M.H., and MacRoberts, B.R. 1996. Problems of citation analysis. Scientometrics, 36, 435-444.
[10] Harzing, A.-W.K., and Van der Wal, R. 2008. Google Scholar as a new source for citation analysis. Ethics in Science and Environmental Politics 8, 1, 62-71.
[11] Afzal, M.T., Maurer, H., Balke, W., and Kulathuramaiyer, N. 2010. Rule based Autonomous Citation Mining with TIERL. Journal of Digital Information Management (JDIM) 8, 3, 196-204.
[12] Egghe, L., and Rousseau, R. 2000. The influence of publication delays on the observed aging distribution of scientific literature. Journal of the American Society for Information Science, 51, 2, 158-165.
[13] Choi, S.C., and Wette, R. 1969. Maximum Likelihood Estimation of the Parameters of the Gamma Distribution and Their Bias. Technometrics, 11, 4, 683-690.
[14] Ricci, V. 2005. Fitting distributions with R. Technical report, retrieved from http://cran.r-project.org/doc/contrib/Ricci-distributions-en.pdf (2011-02-25).
[15] Barabási, A.-L., and Albert, R. 1999. Emergence of Scaling in Random Networks. Science, 286, 5439, 509-512.
[16] Weisstein, E.W. 2005. Gamma Distribution. MathWorld, Wolfram Research, retrieved from http://mathworld.wolfram.com/GammaDistribution.html (2011-02-25).
[17] Husak, G.J., Michaelsen, J., and Funk, C. 2007. Use of the gamma distribution to represent monthly rainfall in Africa for drought monitoring applications. International Journal of Climatology, 27, 7, 935-944.
[18] Yuen, K.C., Guo, J., and Wu, X. 2002. On a correlated aggregate claims model with Poisson and Erlang risk processes. Insurance: Mathematics and Economics, 31, 2, 205-214.
[19] Abrahamyan, L., Beyene, J., Feng, J.Y., Chon, Y., Willan, A.R., Schmeling, H., Horneff, G., Keystone, E.C., and Feldman, B.M. 2010. Response times follow lognormal or gamma distribution in arthritis patients. Journal of Clinical Epidemiology, 63, 12, 1363-1369.
[20] Nielsen, R., and Yang, Z. 2003. Estimating the distribution of selection coefficients from phylogenetic data with applications to mitochondrial and viral DNA. Molecular Biology and Evolution, 20, 8, 1231-1239.
[21] Loewe, L., and Charlesworth, B. 2006. Inferring the distribution of mutational effects on fitness in Drosophila. Biology Letters, 2, 3, 426-430.
[22] Gu, X. 2007. Stabilizing selection of protein function and distribution of selection coefficient among sites. Genetica, 130, 1, 93-97.
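
As an illustration of the Gamma density and the mode relationship used in Section 3, the following minimal sketch evaluates the pdf with shape k and scale θ and checks that the parameters reported for the PageRank (2) paper place the peak close to x = 12 on the time axis. The function names `gamma_pdf` and `gamma_mode` are hypothetical helpers introduced here; Python is used only for illustration and this is not the R function mentioned in the paper.

```python
import math


def gamma_pdf(x, k, theta):
    """Gamma probability density with shape k and scale theta (0 for x <= 0)."""
    if x <= 0:
        return 0.0
    # Evaluate in log space to stay numerically stable for larger k or x.
    log_pdf = ((k - 1) * math.log(x) - x / theta
               - k * math.log(theta) - math.lgamma(k))
    return math.exp(log_pdf)


def gamma_mode(k, theta):
    """Position of the pdf maximum, (k - 1) * theta, valid for k >= 1."""
    if k < 1:
        raise ValueError("the mode formula requires k >= 1")
    return (k - 1) * theta


# Shape and scale reported in Section 3 for the PageRank (2) paper.
k, theta = 5.042, 2.968827
mode = gamma_mode(k, theta)

# The density at the mode should exceed the density at nearby points.
peak = gamma_pdf(mode, k, theta)
left = gamma_pdf(mode - 0.5, k, theta)
right = gamma_pdf(mode + 0.5, k, theta)

print(f"mode = {mode:.3f}, pdf(mode) = {peak:.4f}")
```

With x = 1 as the publication year (1998), a mode near x = 12 corresponds to a citation peak around 2009, the last year kept in the citation history.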
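
The three-step estimation procedure of Section 3 can be sketched as follows. Because the error equation is not available in the text, this sketch makes several assumptions that are not taken from the paper: the scale is tied to the shape through mode = (k - 1)θ, the shape is searched on a simple grid, the normalization factor is obtained by least squares, and the error rate is taken to be the summed absolute deviation divided by the summed citations. The name `fit_fitness` and the citation history are hypothetical; the data is synthetic and noise-free so that the search can be checked against known parameters.

```python
import math


def gamma_pdf(x, k, theta):
    """Gamma probability density with shape k and scale theta (0 for x <= 0)."""
    if x <= 0:
        return 0.0
    return math.exp((k - 1) * math.log(x) - x / theta
                    - k * math.log(theta) - math.lgamma(k))


def fit_fitness(citations, mode_x, drop_last=2):
    """Return (k, theta, norm, rel_error) for a yearly citation history.

    citations[0] belongs to x = 1, the publication year.
    mode_x is the manually selected peak position on that axis (step 1).
    """
    # Ignore the most recent years, which are incomplete due to indexing delays.
    history = citations[:len(citations) - drop_last]
    if not 1 <= mode_x <= len(history):
        raise ValueError("the mode must lie inside the retained history")
    xs = range(1, mode_x + 1)          # only the values up to the mode
    ys = history[:mode_x]
    if sum(ys) <= 0:
        raise ValueError("the citation history must contain citations")

    best = None
    for i in range(1500, 20001):       # candidate shapes 1.500 ... 20.000
        k = i / 1000.0
        theta = mode_x / (k - 1)       # from mode = (k - 1) * theta
        p = [gamma_pdf(x, k, theta) for x in xs]
        # Least-squares factor that scales the pdf to citation counts.
        norm = sum(y * q for y, q in zip(ys, p)) / sum(q * q for q in p)
        err = sum(abs(y - norm * q) for y, q in zip(ys, p)) / sum(ys)
        if best is None or err < best[3]:
            best = (k, theta, norm, err)
    return best


# Synthetic, noise-free history generated from known parameters (not real data).
true_k, true_theta, true_norm = 3.0, 3.0, 2000.0
citations = [true_norm * gamma_pdf(x, true_k, true_theta) for x in range(1, 13)]

k_hat, theta_hat, norm_hat, rel_error = fit_fitness(citations, mode_x=6)
print(k_hat, theta_hat, round(norm_hat, 2), rel_error)
```

On real citation counts the minimum error will not be zero, and the choice of error measure and search strategy may change the estimates; the sketch only shows how a fixed mode reduces the two-parameter fit to a one-dimensional search.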
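
The remark in Section 3 that the first AH paper has a higher scale and a higher normalization factor but a lower peak than the second one can be checked directly from the values in Table 2. The sketch below assumes that the fitness curve is the Gamma pdf multiplied by the normalization factor, and evaluates it at the mode (k - 1)θ; `fitness_curve` and the dictionary layout are hypothetical names introduced for this illustration.

```python
import math


def gamma_pdf(x, k, theta):
    """Gamma probability density with shape k and scale theta (0 for x <= 0)."""
    if x <= 0:
        return 0.0
    return math.exp((k - 1) * math.log(x) - x / theta
                    - k * math.log(theta) - math.lgamma(k))


def fitness_curve(x, k, theta, norm):
    """Approximated citations in year x (x = 1 is the publication year)."""
    return norm * gamma_pdf(x, k, theta)


# (k, theta, normalization factor) for papers 1 and 2, copied from Table 2.
papers = {
    "1. Methods and techniques of adaptive hypermedia (1996)": (3.105, 4.751, 2336.03),
    "2. Adaptive hypermedia (2001)": (2.993, 3.010, 2043.25),
}

peaks = {}
for title, (k, theta, norm) in papers.items():
    peak_x = (k - 1) * theta                      # position of the citation peak
    peak_y = fitness_curve(peak_x, k, theta, norm)
    peaks[title] = (peak_x, peak_y)
    print(f"{title}: peak near x = {peak_x:.1f}, "
          f"about {peak_y:.0f} citations per year")
```

Under these parameters the fitted peak of paper 1 lies near x = 10 and that of paper 2 near x = 6, and the peak height of paper 1 comes out lower than that of paper 2, in line with the observation in the text.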