A Quantitative Analysis of Biographical Data from Ainm, the Irish-language
Biographical Database
Úna Bhreathnach, Cathal Burke, Jeaic Mag Fhinn, Gearóid Ó Cleircín, Brian Ó
Raghallaigh
Fiontar & Scoil na Gaeilge, Dublin City University
Dublin, Ireland
E-mail: Una.Bhreathnach@dcu.ie, Cathal.Burke@dcu.ie, Jeaic.MagFhinn@dcu.ie, Gearoid.OCleircin@dcu.ie,
Brian.ORaghallaigh@dcu.ie
Abstract
This paper looks at some trends identifiable in the biographical data contained in the Ainm collection of Irish-language related
biographies. The data structure is described and the reasons for its particular structure are outlined. The structured data is then
analysed to identify some notable patterns and significant gaps in the Ainm biographical collection. These features and omissions
are discussed in the context of the creation of both the original print biographical dictionary (the Beathaisnéis series) and the
more recent digital version (www.ainm.ie).
Keywords: Mining biographies for structured information; quantitative analysis; biographical dictionaries; digitizing
biographical data; Irish-language biography; Irish biography
1. The Ainm project: background

The Ainm (the Irish word for ‘name’) project is an online biographical database focused on people, mainly (although not exclusively) Irish, who had a connection to the Irish language. It is written in Irish and has been available online since 2011.

The database evolved from, and now significantly expands, the Beathaisnéis (‘biography’) series of published biographies (Breathnach & Ní Mhurchú 1986-2007). The authors of the Beathaisnéis series, Diarmuid Breathnach and Máire Ní Mhurchú, intended to create a dictionary of biography, using relevance to the Irish-language world as the main yardstick for inclusion, and with a strong focus on lives associated with the Gaelic Revival and the period 1882-1982, which are covered in five volumes (Breathnach & Ní Mhurchú 1986, 1990, 1992, 1994 & 1997). The scope was subsequently expanded, in three further volumes, to the previous periods, 1782-1881 and 1560-1881 (Breathnach & Ní Mhurchú 1999 & 2001) and to the subsequent period 1983-2002 (Breathnach & Ní Mhurchú 2002), with a further volume of supplements, amendments and indexes (Breathnach & Ní Mhurchú 2007).

The process by which the Beathaisnéis volumes were digitised, tagged and edited by Fiontar & Scoil na Gaeilge, DCU, is described in detail elsewhere (Ó Raghallaigh & Ó Cleircín 2015). Over 600 of the previously published biographies were updated to reflect corrections and additional information which came to light after publication. After the retirement of the Beathaisnéis authors from the project, and the digitisation of the print material, DCU and Cló Iar-Chonnacht (the publishers of the series) secured a small amount of funding to update and expand the biographies. A panel of contributors was established in 2013 to continue writing biographies, mainly of recently deceased subjects but also of overlooked lives, with between 10 and 15 new lives added annually.

In addition to new biographies, additional content has been added to the website. Thematic essays are added annually to provide an introduction to different categories of biography (e.g. participants in the 1916 rising; traditional singers; folklore collectors). Visualisation features have been developed too, in particular an interactive map displaying placenames tagged in the various lives. A feature for displaying the social networks of individuals is also in development.

The result to date is a collection of 1,756 biographies, with an average length of 1,223 words and 37 tags or cross-references in each. 1,652 of these biographies are from the original series and 104 have been added since 2010. These biographies overlap with the much larger English-language Dictionary of Irish Biography¹ (c.420 also feature there). The Ainm database is used widely, with an average of 1,143 searches per day (14/10/2018 - 05/03/19).

¹ dib.cambridge.org
Copyright © 2019 for this paper by its authors. Use permitted under Creative Commons License Attribution 4.0 International (CC BY 4.0).
2. The Ainm database: data structure

The biographical entries are stored as XML data (using the SQL Server XML Data Type) in a relational database. This allows us to store and modify the XML data in an efficient and transacted way. It also allows us to conveniently log changes and store versions of the biographical entries. Each entry in the biographies table comprises a unique ID, a text TITLE, and an XML document. The ID field is a permanent unique identifier within the database and can be used to access the biography over the web in HTML² or XML³ format. The HTML is generated from the underlying XML using an XSL transformation.

Figure 1: TEI elements used in the biography of Pádraig Mac Piarais

The original biographical entries digitised from Beathaisnéis Volumes 1–9 (Breathnach & Ní Mhurchú, 1986–2007) have ID numbers in the range 1–1999. Stubs (short entries) from the original volumes are in the range 2000–2999. New biographical entries written and added between 2010 and 2016 are in the range 3000–3999, and entries added since 2017 are numbered 5000+.

3. The Ainm database: metadata

The XML data follows the TEI: Text Encoding Initiative guidelines for the most part. In addition to the required […], […], […], […], and […] elements, […], […], and […] are included as metadata in the […] element, where known. These metadata are displayed in the biography title and infobox on the public website.⁴ The dates are also used on the timeline⁵ and thematic tag cloud⁶ tools.

In addition to (biographical) entry level metadata contained in the header, certain entity types have been tagged inline in the biography […] element. These include placenames ([…]), publications ([…]), Gaelic League branches ([…]), educational institutions ([…]) and political parties ([…]). This information is used to create the aforementioned tag clouds. Placename tags include a reference to the Placename Database of Ireland⁷ where the place is in Ireland and a reference to GeoNames⁸ where the place is outside of Ireland. People ([…]) are also tagged in the […] element. People tags include a cross-reference where the person is within the database. A […] element is included after the […] element in some of the newer biographies.

For the original collection of biographies, all inline tagging (i.e. markup of named entities in the body text of the biographical entry) was done automatically using a purpose-built tagger written in Python. The tagger searched for and tagged named entities (i.e. names, placenames, publications, institutions, and political parties). The tagger included a custom NLP function to deal with initial mutation (e.g. gCorcaigh) of entities in the Irish-language
text. This function tagged the mutated entity and inserted
the base form (e.g. Corcaigh) as an attribute of the entity.
All inline tagging was subsequently manually checked.
Newer biographies are tagged manually. Older biographies
are gradually being re-checked as they are prepared for use
as “biography of the week” on Twitter, Facebook and in the
project newsletter. User feedback is also considered.
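
The handling of initial mutation described above can be illustrated with a minimal Python sketch. It is not the project's tagger, which is not reproduced in this paper: the function names, the selection of mutation rules (eclipsis, lenition, and prefixed n, h and t) and the small gazetteer of known base forms are all assumptions made for illustration. The sketch generates candidate base forms for each token and accepts the first one found in the gazetteer, returning the surface form together with the base form that would be stored as an attribute.

```python
import re

# Eclipsis: lower-case prefix written before the capitalised initial it
# eclipses, e.g. "gCorcaigh" gives "Corcaigh". (Illustrative rule set.)
ECLIPSIS_PREFIXES = ("mB", "gC", "nD", "bhF", "nG", "bP", "dT")
# Initial consonants that can be lenited by a following "h", e.g. "Chorcaigh".
LENITABLE_INITIALS = "BCDFGMPST"
# Prefixed n, h or t before a capitalised vowel ("nÉirinn", "hÉireann"),
# and t before S ("tSráid").
PREFIXED_LETTER = re.compile(r"^(?:[nht]-?(?=[AEIOUÁÉÍÓÚ])|t(?=S))")


def candidate_base_forms(token):
    """Return the token itself followed by its possible unmutated forms."""
    candidates = [token]
    for prefix in ECLIPSIS_PREFIXES:
        if token.startswith(prefix):
            # Drop the eclipsing letters, keep the capitalised initial.
            candidates.append(token[len(prefix) - 1:])
    if token[1:2] == "h" and token[0] in LENITABLE_INITIALS:
        # Remove the "h" that marks lenition.
        candidates.append(token[0] + token[2:])
    match = PREFIXED_LETTER.match(token)
    if match:
        candidates.append(token[match.end():])
    return candidates


def find_base_form(token, gazetteer):
    """Return the first candidate found in the gazetteer, or None."""
    for candidate in candidate_base_forms(token):
        if candidate in gazetteer:
            return candidate
    return None


def tag_entities(text, gazetteer):
    """Return (surface form, base form) pairs for single-word entities."""
    pairs = []
    for token in re.findall(r"[\w-]+", text):
        base = find_base_form(token, gazetteer)
        if base is not None:
            pairs.append((token, base))
    return pairs


# Hypothetical gazetteer of base-form placenames.
PLACENAMES = {"Corcaigh", "Gaillimh"}
print(tag_entities("Rugadh i gCorcaigh é agus bhog sé go Gaillimh.", PLACENAMES))
```

Checking the unmutated token first means that a name which genuinely begins with a sequence such as "Ch" is matched as it stands when the gazetteer contains it. The sketch only handles single-word entities; multi-word names would need phrase matching on top of this.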
Figure 2: Web view of the biography of Pádraig Mac Piarais

The Beathaisnéis collection (on which the Ainm database is primarily based) is the result of the passionate work of two committed amateur biographers. One of the challenges this poses for the creation of standardised metadata is that the original authors did not complete profile sheets or index cards such as those commonly used by other dictionaries of national biography (Warren, 2018; Reinert et al, 2015) and which can make the creation of entry-level metadata (e.g. where the person was born) relatively straightforward. While Breathnach and Ní Mhurchú did follow a typical formula in constructing their biographies, they did not list key elements such as profession, religion, gender or place of birth/death independently of the text. In order to retrospectively register such information (i.e. the entry-level metadata as opposed to the named-entity recognition previously described) it has been necessary to manually extract the relevant details – a slow process that is still ongoing in the case of certain elements. The image below from a typical entry displays the typical categories for which data has been extracted to date. It includes date of birth, date of death, place of birth, gender, school, third-level education and occupation. Other common categories such as religion and place of death have yet to be extracted.

Figure 3: Typical categories used for data extraction (example: Tomás Ó Flannghaile)

² e.g. www.ainm.ie/Bio.aspx?ID=454
³ e.g. www.ainm.ie/Bio.aspx?ID=454&xml=true
⁴ e.g. www.ainm.ie/Bio.aspx?ID=454
⁵ www.ainm.ie/Timeline.aspx
⁶ www.ainm.ie/Tags.aspx
⁷ www.logainm.ie
⁸ www.geonames.org

4. Quantitative analysis

4.1 Background

The digitisation of biographical collections offers the opportunity to examine collections at a scale not previously possible when only utilizing biographical text. With the creation and linking of standardised datasets from unstructured text, the overall contents of the collection can be revealed and trends can be analysed. The Finnish BiographySampo⁹ (Tamper et al, 2018), for example, illustrates the potential for such examination, interrogating the biographies as a collection of linked data, as does the Netherlands’ BiographyNet¹⁰ (Fokkens et al, 2017). Not only do such datasets illustrate the overall makeup of a national biography, and therein the history of the nation, but also the history of the collection itself, as Warren (2018) suggests in his examination of the Oxford Dictionary of National Biography (ODNB); the digitisation of national biographies can be used, he says, to analyse dually the history of both the nation and of the dictionary itself. In doing so, Warren illustrates not only what the ODNB contains, but also how it came to be:

“...investigating the ODNB (1.) in its entirety and (2.) as an historically contingent digital artifact offers wider purchase on the historical knowledge it makes available and the historical knowledge-making it constrains...” (2018)

Searches by occupation, birthplace or year tell less about their individual importance in the history of the nation than they do about the imagination and biases of the various contributors. Warren notes that mothers of women in ODNB are quite often queens, and mothers in general tend to be actresses, teachers and noblewomen, while fathers are frequently landowners, army officers, clergymen or merchants. National biographies do not generally attempt to capture the typical member of the nation, but rather the atypical, the exceptional names deemed important by the biography’s contributors. It might seem odd that “naval officer” is the third most frequent profession in the ODNB and that Britain and the Defeat of Napoleon (1996) is the most referenced monograph in the entirety of the ODNB, but the context of the ODNB’s construction casts light on these seeming oddities: a prolific naval historian, Sir John Laughton (1830-1915), was responsible for 1,000 biographies of naval figures, roughly 1 out of every 38 Dictionary of National Biography entries, all of which, it was decided, would be added to the subsequent ODNB. This required additional research which often referenced Britain and the Defeat of Napoleon (1996).

Likewise, most common years of death in ODNB not only reflect periods of illness or bloodshed, but also the inherent biases contributors brought in with their selection of subjects:

“The local peaks in 1883 and 1908... once again remind us to attend to the data infrastructure. Rather than marking some hitherto unknown plague afflicting the Victorian aristocracy, 1883 marks the point at which contemporaneous deaths ceased to be meaningful to Stephen, his deputy Sidney Lee, and their collaborators.” (Warren, 2018)

⁹ https://seco.cs.aalto.fi/projects/biografiasampo/en/
¹⁰ http://www.biographynet.nl/

What biases, designed or unintended, can therefore be found within Ainm? Inspection of the most common years
of birth or death reveals the clearest influence that the selection criteria and original aim of the biographical dictionary had on the dictionary itself. The original impetus for the Beathaisnéis was to produce a biographical dictionary based on 100 years of the Gaelic Revival, covering those who had died between 1882 and 1982. The scope of the project gradually changed as the authors decided to include lives from both before and after that arbitrary period. Nonetheless, the fact that the first five volumes focused exclusively on that 100-year period and that only two of the nine volumes cover the period from 1560 to 1881 means that the collection is inevitably biased towards the period from the mid-19th century onwards. This is shown clearly in Figure 4.

The image of the nation captured in the pages of the ODNB is less a reflection of the nation’s history than of the making of the dictionary itself (Warren, 2018). Between 1450 and 2000, France, the Netherlands, and the United States of America all apparently supersede England, Scotland, Ireland, and Wales in importance in the ODNB, Warren claims, because of the presumed Englishness of each biographical subject. Relevant countries besides England were mentioned specifically, while any reference to the dictionary’s own nation was left assumed, and therefore left out. Likewise, there is a presumed continuity among the Ainm biographies: each person played a particular role in the Irish-language world. Their relevance to this world is fundamental to their inclusion in this collection and remains the principal criterion for evaluating suitability. The first volume of the Beathaisnéis series outlined criteria for inclusion:

“...Irish speakers who did something remarkable or who achieved a level of excellence in their lives. Undoubtedly there are also Irish speakers who wouldn’t earn a place in the national pantheon but who are still of importance or who are well-known in the context of the Revival period. Both types are included in this volume and will be included in future collections.”¹¹ (Breathnach & Ní Mhurchú, 1986, 11)

Although it might not seem noteworthy to specify the first official language of the nation in a national biography, the case of Irish is somewhat exceptional, in that it exists simultaneously as official and minoritised, essential to the establishment and imagination of the nation while only being spoken by a minority of the same nation. These biographies, therefore, aim specifically to capture, and write into being, the Irish-language nation not previously recorded in biography. In highlighting the important lives of the nation which are relevant to the Irish language, the writing of these biographies inherently asserts the importance of the Irish language itself in both the basis and imagination of the nation.

¹¹ Authors’ translation.

4.2 Timespan

Magnus Ó Domhnaill (c.1490-1563) is the earliest-born in the collection, with 2017 the most recent year of death. The collection spans seven different centuries and contains 306 different years of birth. An interesting demographic is revealed when we analyse the years of birth and death on a broader range. The most significant finding is that 815 people were born in the 19th century, 46% of the lives. If we add in those who were born in the 20th century, 514 people, we reach 1,329 lives, 75% of the total collection. Furthermore, 1,064 people died in the 20th century, 62% of the total. Therefore, having been born in the 19th century, and having died in the 20th century, the majority of lives lived through the revival period of Irish, which is in line with the understanding that the Beathaisnéis project initially centred around those most active in reviving the language during the late 19th and early 20th century. There are however 110 people with no year of birth or year of death stored as metadata. (In most cases, these lives were in the form of short ‘stub’ articles by the Beathaisnéis authors, where birth and death dates were not included in the title and therefore not automatically extracted. This is an area for future improvement. Having identified this gap, we will begin manually extracting available missing data in order to store it accordingly.)

Figure 4: The predominance of the 19th Century (year of birth) and the 20th Century (year of death) in the above graph helps to further illustrate the original aims of the Beathaisnéis project.

4.3 Gender

Men account for almost 90% (1,580) of the biographies. There are only 176 biographical accounts of women. These figures highlight a major gender imbalance in the collection, and surely represent the greatest area for future research opportunities, but they are not out of sync with
other international biographical databases (Farr, 2012). Of the women included in the collection 86% were born from the year 1847 onwards and 76% died in the twentieth century. Of the biographies written since 2013, 26% are of women, showing a significantly increased representation.

4.4 Birthplace: country and county

A connection to the Irish language is the primary condition for inclusion in the collection, yet 20 different countries are represented in the database. Ireland (including Northern Ireland) is the top represented country with 1,217 people. England is next on the list with 63. Germany, Scotland and the United States are next, each with 15. The other countries represented in the database are India, Norway, Switzerland, Sweden, Italy, Wales, France, the Netherlands, Australia, Denmark, Malta, Belgium, China, the Czech Republic and Japan. The total number of people born outside of Ireland is 140, or 8% of the collection.

399 people have no recorded place of birth stored as metadata. There are a number of very short biographies (134) which lack key biographical information and require further research. Filling this gap represents an area of future improvement for the project. It was not possible for the original authors to find records of a place of birth for some 103 lives from the 16th, 17th and 18th century.

Figure 5: The top 10 places of birth recorded in the Ainm database (where metadata is available)

While each province is widely represented in the collection, Munster is the highest represented province with 534 lives (44% of those born in Ireland), almost double that of Leinster (280), followed by Connacht (210) and Ulster (193). The noticeable difference is something which has been previously alluded to by the original authors of the biographies. Although they made an effort not to neglect other areas (referring to Connacht and northern Leinster particularly), it is clear that there were more people of interest to them in Munster, due mainly to the historic strength of Irish in the province, particularly around the time of the revival period: ‘…there is a sort of nucleus or kernel of literacy, as you’d say, in Munster, and especially in Cork, and maybe part of Kerry as well. But, I saw figures from the time of the revival… seventy or eighty percent of the people reading the language were in Cork.’¹² Diarmuid Breathnach also states his belief that ‘they had very good Irish, especially those from Cork and a lot of Munster people particularly.’¹³ Of the biographies published since 2013, 17% are from Munster. This reduced proportion may be attributed to the fact that the Irish-speaking community is no longer Munster-dominated, or to the bias of the Beathaisnéis authors.

Each county in Ireland is represented in the collection
(Figure 6), with Cork (the largest county by size) being the
highest represented county with 197 people, or 16% of
those born in Ireland, and Fermanagh and Leitrim (both
small counties) being the lowest represented counties with
3 lives each. The top six represented counties are Cork,
Dublin, Galway, Kerry, Donegal and Waterford; all except
Dublin contain an Irish language speaking area, or
‘Gaeltacht’. These counties represent 42% of the
collection. The number of people born in England, 63, is
higher than any of the other 26 counties.
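
The birthplace figures quoted in this section can be cross-checked with a few lines of arithmetic. The following Python sketch is an illustration only: the counts are those reported in the text, while the variable names, the `share` helper and the rounding to whole percentages are assumptions. The percentages for Leinster, Connacht and Ulster are not stated in the text and are derived here from the reported counts.

```python
# Counts reported in Section 4.4 (place-of-birth metadata).
TOTAL_LIVES = 1756
BORN_IN_IRELAND = 1217
BORN_ABROAD = 140
NO_BIRTHPLACE = 399

PROVINCES = {"Munster": 534, "Leinster": 280, "Connacht": 210, "Ulster": 193}
CORK = 197


def share(part, whole):
    """Return part/whole as a percentage rounded to the nearest integer."""
    return round(100 * part / whole)


# The three birthplace categories should account for every life, and the
# four provinces for everyone recorded as born in Ireland.
assert BORN_IN_IRELAND + BORN_ABROAD + NO_BIRTHPLACE == TOTAL_LIVES
assert sum(PROVINCES.values()) == BORN_IN_IRELAND

province_shares = {name: share(count, BORN_IN_IRELAND)
                   for name, count in PROVINCES.items()}
print("Share of Irish-born lives by province (%):", province_shares)
print("Cork share of Irish-born lives (%):", share(CORK, BORN_IN_IRELAND))
print("Share of collection born abroad (%):", share(BORN_ABROAD, TOTAL_LIVES))
```

Under these assumptions the reported totals are internally consistent: the province counts sum to the 1,217 Irish-born lives, and the three birthplace categories sum to the full collection of 1,756.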
Figure 6: Province of birth for those born in Ireland
(where metadata is available)
¹² Translation of extract from unpublished interview with Diarmuid Breathnach and Máire Ní Mhurchú, 2010.
¹³ Translation of extract from interview with Diarmuid Breathnach and Máire Ní Mhurchú, 2010, www.ainm.ie/Info.aspx?Topic=resources.en

4.5 Education and profession

A university education is recorded for 767 subjects; another 989 do not have metadata stored regarding university education and some of these were also university educated. Of those with available metadata, 40% of women (71) attained some form of university education, in comparison to 44% of men (696). University College Dublin (173) was the most commonly attended university in the database, followed by Trinity College Dublin (111), St. Patrick’s College, Drumcondra, Dublin (71), National University of Ireland, Galway (69), St. Patrick’s College, Maynooth (61), and National University of Ireland, Cork (56). 45 attended either Oxford University, Cambridge or Harvard, but only 6 of these were born in Ireland. There are accounts of people attending university all across Europe, most notably universities in England, Germany, France, Italy, Spain and Belgium. Of these, the Irish Colleges of Rome and Paris, and Leuven University, Belgium, appear more frequently than others. The preponderance of these religious institutions can be attributed to the large number of clergymen who travelled abroad for education during the period from the 16th to 18th centuries when this was prohibited to Catholics in Ireland.

Of the 1,756 lives, 1,690 have at least one occupation recorded, with ‘teacher’ being the most common profession, among both men and women, with a representation of 20% of men and 24% of women (21% total). This makes sense in the context of the central role the Irish language played, and continues to play, in the education system; however, the original aims of the collection certainly influence this propensity towards education, given the necessarily central role of teachers in the revival of any language. Many of those involved with the Gaelic Revival spent time teaching Irish to others.

There is a very high proportion of clergymen. There are 239 Catholic priests (bishops, archbishops, Christian brothers, Franciscans, Jesuits) and 42 Protestant ministers, which represents around 18% of men documented. In comparison, there are only two nuns, Mary Bonaventure Browne and Máire Treasa Ó Murchú, recorded in the collection.

Most of those teachers and clergymen had a second occupation for which they were more recognised; clergymen were often professors. For both men and women, writers, scholars and poets complete the top five professions: being a published writer was one of the suggested criteria for inclusion in the collection.¹⁴ This preponderance of writers, and initial suggestion for their inclusion, also corresponds with the original focus on the Gaelic Revival, in which the construction of a modern, written literature in Irish played an important role. Other occupations to feature highly on the list include civil servants, musicians, singers and folklore collectors, politicians, lecturers, translators and editors; there are also lawyers, doctors, astronomers, actors, journalists, artists, engineers, miners, broadcasters, soldiers, and publishers.

Figure 7: Professional demographics for subjects born pre and post the Great Famine (where metadata is available)

Priests, poets, and writers dominate the professions early on, in the 16th and 17th centuries. The 19th century (see Figure 7) sees a decline in poets, 70% of whom were born before the start of the Great Famine (1845); this can be attributed to the decline of the bardic poet tradition in Irish. The numbers of teachers, civil servants, politicians and translators begin to rise around the same time. The end of the 19th century and beginning of the 20th century also see a rise in folklore, music, and song collectors, no doubt due to the desire to recuperate all that was lost in the previous century of famine, emigration and political unrest.

¹⁴ Unpublished interview with Diarmuid Breathnach and Máire Ní Mhurchú, 2010.

5. Conclusion

The Ainm example highlights some issues which confront digitisers of biographical dictionaries: omissions or unstructured data in original material, and text which is not easily tagged. These issues are still being addressed by the editorial team.

The preponderance of 19th and 20th century lives in Ainm is a reflection of the original editorial aims, rather than of the most important era for the Irish language, which had begun to decline as a literary and administrative language long before then. Quantitative analysis can be used to confirm the authors’ acknowledged bias towards certain regions (Munster) and professions (writers), as well as the usual gender disparity. As Warren (2018) found for the ODNB, so too for Ainm: it tells the history of both the nation and of itself.

6. Acknowledgements
The Ainm project is a partnership between Cló Iar-
Chonnacht, an Irish-language specialist publisher that
holds the copyright to the material, and the Gaois research
group in Fiontar & Scoil na Gaeilge, Dublin City
University, who developed and maintain the database.
Funding for the project is provided by the Irish
Government.
7. References
Breathnach, D., Ní Mhurchú, M. (1986-2007).
Beathaisnéis (9 volumes). Dublin: An Clóchomhar.
Farr, M. (2012). Review of Online Dictionaries of
National Biography, (review no. 1259),
https://reviews.history.ac.uk/review/1259, (accessed
09.05.2019).
Fokkens, A., ter Braake, S., Ockeloen, N., Vossen, P.,
Legêne, S., Schreiber, G., de Boer, V. (2017).
BiographyNet: Extracting Relations Between People
and Events. In: Bernád, Á. Z., Gruber, C., & Kaiser, M.
eds., Europa baut auf Biographien: Aspekte, Bausteine,
Normen und Standards für eine europäische
Biographik. Wien: New Academic Press. pp. 193-224.
Muir, R. (1996). Britain and the Defeat of Napoleon,
1807-1815. New Haven: Yale University Press.
Ó Raghallaigh, B., Ó Cleircín, G. (2015). Ainm.ie:
Breathing new life into a canonical collection of Irish-
language biographies. In Biographical Data in a Digital
World (BD). Amsterdam: CEUR-WS.org, pp. 20-23,
http://ceur-ws.org/Vol-1399/paper4.pdf, (accessed
15.05.2019).
Reinert, M., Schrott, M. and Ebneth, B. (2015). From
Biographies to Data Curation - The Making of
www.deutsche-biographie.de. In Biographical Data in
a Digital World (BD). Amsterdam: CEUR-WS.org, pp.
13-19, http://ceur-ws.org/Vol-1399/paper3.pdf,
(accessed 15.05.2019).
Tamper M., Leskinen P., Apajalahti K., Hyvönen E.
(2018) Using Biographical Texts as Linked Data for
Prosopographical Research and Applications. In:
Ioannides M. et al. (eds) Digital Heritage. Progress in
Cultural Heritage: Documentation, Preservation, and
Protection. EuroMed 2018. Lecture Notes in Computer
Science, vol 11196. Springer, Cham.
Warren, C. (2018) Historiography's Two Voices: Data
Infrastructure and History at Scale in the Oxford
Dictionary of National Biography (ODNB). Journal of
Cultural Analytics. 22.11.2018, 10.31235/osf.io/rbkdh,
(accessed 08.05.2019).