<!DOCTYPE article PUBLIC "-//NLM//DTD JATS (Z39.96) Journal Archiving and Interchange DTD v1.0 20120330//EN" "JATS-archivearticle1.dtd">
<article xmlns:xlink="http://www.w3.org/1999/xlink">
  <front>
    <journal-meta />
    <article-meta>
      <title-group>
        <article-title>The discourse of the French method: making old knowledge on market gardening accessible to machines and humans.</article-title>
      </title-group>
      <contrib-group>
        <contrib contrib-type="author">
          <string-name>David Colliaux</string-name>
          <xref ref-type="aff" rid="aff0">0</xref>
        </contrib>
        <contrib contrib-type="author">
          <string-name>Remi van Trijp</string-name>
          <xref ref-type="aff" rid="aff0">0</xref>
        </contrib>
        <aff id="aff0">
          <label>0</label>
          <institution>Sony Computer Science Laboratories - Paris</institution>
          ,
          <addr-line>6 Rue Amyot, 75005 Paris</addr-line>
          ,
          <country country="FR">France</country>
        </aff>
      </contrib-group>
      <fpage>1117</fpage>
      <lpage>1127</lpage>
      <abstract>
        <p>A vast amount of our cultural heritage is at risk of getting lost because it resides in old books that are difficult to access. It is therefore important to make this information available not only to human readers but also to machine analysis, so that new representations and insights based on this knowledge can be constructed. In our case study, we use a host of digital tools to extract and analyze a corpus of 19th century French texts about the practices of market gardening in Paris, and to apply a variety of possible visualizations in an integrated interface. Our work includes a Named Entity Recognition and Linking procedure for creating maps of the locations mentioned in these texts as well as the social networks of people cited in the books. We also consider how the analysis of verbs can approximate and represent the know-how of market gardening: we analyze the statistics of those verbs compared to their usage in a general corpus for French, and map the verbs using word embeddings. Finally, we consider a semantic frame analysis to extract causal relations from texts to evaluate how well these relations support the biological knowledge embedded in those texts (such as how too much exposure to the sun may affect the quality of the garden's produce). Altogether, we show how the visualizations based on Natural Language Processing and Textual Statistics could support a convivial navigation through the corpus.</p>
      </abstract>
      <kwd-group>
        <kwd>digital humanities</kwd>
        <kwd>grounded language</kwd>
        <kwd>corpus linguistics</kwd>
      </kwd-group>
    </article-meta>
  </front>
  <body>
    <sec id="sec-1">
      <title>1. Introduction</title>
      <p>
        Digital libraries gather corpora of texts that are far too large for any person to read in full.
One of the tasks of digital humanities [
        <xref ref-type="bibr" rid="ref21">21</xref>
        ] is thus to organize and analyze those texts so that
they are easy to navigate. For instance, through distant reading [
        <xref ref-type="bibr" rid="ref16">16</xref>
        ], we may construct curves,
graphs and maps that make this large quantity of information graspable for the human mind.
Moreover, it is necessary that the information is accessible not only to humans but also to
machines, so that further processing may be applied to those texts.
      </p>
      <p>
        A large body of work has pursued this direction, applied to literary texts
[
        <xref ref-type="bibr" rid="ref16">16</xref>
        ] and the press [<xref ref-type="bibr" rid="ref6">6</xref>], showing the potential of text mining and natural language processing
for such corpora. However, less attention has been paid to manuals, even though such texts
are essential as they encapsulate the knowledge of a particular era about a certain topic. In our
case, we focused on 19th century manuals about market gardening. Those manuals are both a
record of the practices of the time and the beginning of the crystallization of this knowledge
into a science, namely agronomy.
      </p>
      <p>19th century texts are particularly interesting because shortly after that period, from the
second half of the 20th century onwards, agriculture went through radical changes with the
green revolution and the introduction of chemicals to control the growth and the environment
of plants. These changes, which were driven by the agronomical institutions, were so sweeping
that we can reasonably ask whether some part of the old knowledge was lost. To answer this
question, it is necessary to mine the older texts; and their analysis will also help visualize some
interesting aspects of the history of agriculture.</p>
      <p>We present here how we built the corpus, how we preprocessed the data and the analyses
we carried out on the texts. First, we performed Named Entity Recognition and Linking to gather
information on the places and people cited in those books. Then, we analyzed the verbs appearing
in the corpus through semantic embeddings. Finally, we collected sentences expressing
causal relations, as those are the most likely to contain agronomical knowledge. For each
of these analyses, we provide a visualization that helps navigate the corpus
interactively.</p>
    </sec>
    <sec id="sec-2">
      <title>2. The Good Old Manuals corpus</title>
      <p>
        The gardening manuals of the 19th century record the development of very efficient
methods for growing vegetables in an urban environment (many of these books focus
on practices in the Paris area). These methods of cultivating dense mixtures of crops
on small plots of land have inspired a movement in California and, more recently, in Europe,
commonly referred to as the Biointensive French Method [
        <xref ref-type="bibr" rid="ref14">14</xref>
        ], or French Method [
        <xref ref-type="bibr" rid="ref4">4</xref>
        ] for short.
The French method is related to more recent practices like agroecology [
        <xref ref-type="bibr" rid="ref23">23</xref>
        ] or permaculture
[
        <xref ref-type="bibr" rid="ref7">7</xref>
        ], although the French Method puts particular emphasis on forcing vegetables out of season
so that produce can be sold at a higher price early or late in the season. One book
in particular, Manuel pratique de la culture maraîchère de Paris by Moreau and Daverne, was
highly influential according to the actors of this revival [
        <xref ref-type="bibr" rid="ref12">12</xref>
        ], but the 19th century produced a rich body
of literature on the topic, from which we picked references to include
in our corpus. We describe below how the manuals were selected to compose the Good Old
Manuals corpus (GOM).
      </p>
      <sec id="sec-2-1">
        <title>2.1. Selection of the books</title>
        <p>The first selection of books was collated by looking at the recommended readings accessible
on an online platform about agroecological practices. The GOM1 corpus is thus composed of
seven books listed in the table below. Additionally, we included 14 more books in the full GOM
corpus after discussions with specialists of market gardening. All books are related to market
gardening and were published between 1802 and 1912. For the following textual analysis, we
only consider the GOM1 section of the corpus. The list of books included in the full GOM
corpus is available on the companion website<sup>1</sup>.</p>
      </sec>
      <sec id="sec-2-2">
        <title>2.2. Text extraction and preprocessing</title>
        <p>
          The first step in our analysis is to extract the layout of each page, identifying the regions
occupied by text paragraphs, titles, figures or tables using an image segmentation algorithm
based on Faster R-CNN trained on a large collection of publications [
          <xref ref-type="bibr" rid="ref24">24</xref>
          ]. This process
extracted 1269 figures and 120 tables. The regions of the images classified as text were then
fed to the Tesseract library [
          <xref ref-type="bibr" rid="ref20">20</xref>
          ] for optical character recognition (OCR).
        </p>
        <p>
          As expected, the resulting text still contains many mistakes, so a first preprocessing step
replaces characters that are unlikely to appear in the text with their most likely substitute (for
example ä → à). Next, to correct spelling mistakes from the OCR, we identified out-of-vocabulary
words (using the reference lexicon MORPHALOU3; [
          <xref ref-type="bibr" rid="ref19">19</xref>
          ]), for example “avans” instead of
“avons”. We then applied a Bayesian model [
          <xref ref-type="bibr" rid="ref17">17</xref>
          ] combining an estimate of the most likely character confusions (using
the confusion matrix of the characters<sup>2</sup>) with the closest neighbors under edit distance, using
different weights for words at edit distance 1 and at edit distance 2. For a string s, we select the
candidate valid word w that maximizes P(w)·P(s|w), where P(w) is the frequency of occurrence of w in
a base corpus (FRANTEXT [
          <xref ref-type="bibr" rid="ref2">2</xref>
          ] in our case) and P(s|w) is the probability of the substitutions leading
from s to w, as given by the confusion matrix. This process reduced the
number of out-of-vocabulary words from 80,000 to 8,000.
        </p>
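        <p>The correction step can be sketched as follows. This is a minimal illustration rather than the pipeline actually used: the word counts, the confusion probabilities and the fallback value are toy assumptions standing in for FRANTEXT, the published confusion matrix and MORPHALOU3, and only single-character substitutions (edit distance 1) are explored.</p>
        <preformat><![CDATA[
```python
from collections import Counter

# Toy stand-ins (assumptions): the paper uses MORPHALOU3 as lexicon, FRANTEXT
# for word frequencies and a published character confusion matrix.
WORD_COUNTS = Counter({"avons": 120, "avant": 80, "avis": 15})
# CONFUSION[(true_char, observed_char)] = probability that OCR reads true_char as observed_char
CONFUSION = {("o", "a"): 0.05, ("t", "s"): 0.01}
DEFAULT_CONFUSION = 0.001  # fallback for substitutions missing from the matrix (assumption)
ALPHABET = "abcdefghijklmnopqrstuvwxyzàâäçéèêëîïôöùûü"


def substitutions(word):
    """Yield (candidate, true_char, observed_char) for every single-character substitution."""
    for i, observed in enumerate(word):
        for true_char in ALPHABET:
            if true_char != observed:
                yield word[:i] + true_char + word[i + 1:], true_char, observed


def correct(s):
    """Return the in-vocabulary word w maximising P(w) * P(s | w), or s if none is found."""
    if s in WORD_COUNTS:
        return s
    total = sum(WORD_COUNTS.values())
    best, best_score = s, 0.0
    for candidate, true_char, observed in substitutions(s):
        if candidate in WORD_COUNTS:
            prior = WORD_COUNTS[candidate] / total
            likelihood = CONFUSION.get((true_char, observed), DEFAULT_CONFUSION)
            score = prior * likelihood
            if score > best_score:
                best, best_score = candidate, score
    return best
```
]]></preformat>
        <p>With these toy values, the string “avans” is mapped to “avons”, because the o/a confusion is likely and leads to a frequent word; a string with no valid neighbor is returned unchanged.</p>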
      </sec>
    </sec>
    <sec id="sec-3">
      <title>3. Named entity recognition and linking</title>
      <p>It is important to identify the places and people cited in the GOM corpus so that the texts
can be properly situated in their geography and history. For this, we took the
out-of-vocabulary words and selected those starting with a capital letter. We then
matched this list against a dictionary of geographical locations that includes their GPS
coordinates. Among the remaining words, we manually checked, through web search, whether the most
frequently cited ones correspond to people.</p>
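      <p>The selection and matching steps can be illustrated with a short sketch. The lexicon and the gazetteer below are hypothetical toy data (with approximate coordinates), standing in for MORPHALOU3 and for the dictionary of geographical locations; names that are not found in the gazetteer are simply set aside for manual checking.</p>
      <preformat><![CDATA[
```python
# Hypothetical toy data: a tiny lexicon and gazetteer stand in for
# MORPHALOU3 and the dictionary of geographical locations used in the paper.
LEXICON = {"les", "melons", "de", "sont", "cultivés", "à", "par", "et"}
GAZETTEER = {"Honfleur": (49.42, 0.23), "Paris": (48.86, 2.35)}  # approximate (lat, lon)


def candidate_entities(tokens):
    """Keep out-of-vocabulary tokens that start with a capital letter."""
    return [t for t in tokens if t.lower() not in LEXICON and t[:1].isupper()]


def link_places(tokens):
    """Split candidates into geolocated places and names left for manual checking."""
    places, unresolved = {}, []
    for token in candidate_entities(tokens):
        if token in GAZETTEER:
            places[token] = GAZETTEER[token]
        else:
            unresolved.append(token)  # possibly a person, to be verified by hand
    return places, unresolved
```
]]></preformat>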
      <p>In addition, place names in our corpus are often ambiguous: a name may refer to the
location itself or to a variety of plant originating from this location. To
resolve this ambiguity, we manually annotated every mention of a place name as referring
either to the location or to a variety of plant.</p>
      <p>Based on this recognition of places and people, we were able to visualize both aspects. First,
in the graph of Fig.2, we represented the authors and the most cited people (those cited more than twice).
We drew an edge between an author and a cited person if this person was cited by the author.
We see that some authors cite generously, while others mention only a few people. For
example, in Moreau &amp; Daverne, only Héricart de Thury and Mr Gontier are cited. The
book they wrote was a response to a call issued by the Royal Society of Horticulture, whose
director was Héricart de Thury; Mr Gontier was a market gardener in the region of Nantes
who was among the first to experiment with an innovative technology of the time, the
thermosiphon. For places, in Fig.3, we placed circles on a map of France, with the radius
denoting the frequency of occurrence of the place name in the GOM1 corpus. We notice that there
are many mentions of places in the Paris region, which is expected since many of the practices
we are interested in originate from the Paris region.</p>
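      <p>The construction of the citation graph can be sketched as below. The citation pairs are hypothetical placeholders, not data from the corpus, and we assume here that the threshold of more than two citations is counted over the whole corpus rather than per author.</p>
      <preformat><![CDATA[
```python
from collections import Counter

# Hypothetical (author, cited person) pairs, one pair per citation found in a book.
CITATIONS = [
    ("author_A", "person_X"), ("author_A", "person_X"), ("author_B", "person_X"),
    ("author_A", "person_Y"), ("author_B", "person_Z"), ("author_B", "person_Z"),
]


def citation_edges(citations, min_count=3):
    """Return author-person edges, keeping only people cited at least min_count times overall."""
    totals = Counter(person for _, person in citations)
    edges = {(author, person) for author, person in citations if totals[person] >= min_count}
    return sorted(edges)
```
]]></preformat>
      <p>The resulting edge list can be passed to any graph drawing tool; lowering min_count adds less frequently cited people to the graph.</p>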
    </sec>
    <sec id="sec-4">
      <title>4. Mapping the key verbs in the GOM corpus</title>
      <p>[Footnote 2: We used the confusion matrix available at https://github.com/shaneweisz/OCR-Character-Confusion/blob/master/confusion_matrix/confusion_matrix_base.pkl]</p>
      <p>[Figure 2, caption partially recovered: ... people are listed in the central columns. Names in purple refer to people mostly on the knowledge side (professors of agronomy or botany, for example) and names in yellow refer to people involved on the practical side (market gardeners, seed sellers, ...).]</p>
      <p>
        It is interesting to focus on the verbs mentioned in the GOM corpus as they reflect the actions
that are important to a market gardener on their farm. We are particularly interested in the
verbs that are specific to market gardening, which can be considered as a keyword identification
problem. For this, we first lemmatize and POS tag the texts using spaCy, a widely used tool for
various NLP tasks<sup>3</sup>. Then, similarly to the keyness commonly used in corpus linguistics [
        <xref ref-type="bibr" rid="ref18">18</xref>
        ],
we measure for each verb the logarithm of the ratio between <inline-formula><tex-math>f_{\mathrm{GOM1}}</tex-math></inline-formula>, the frequency of occurrences
of the verb in the GOM1 corpus, and <inline-formula><tex-math>f_{\mathrm{ref}}</tex-math></inline-formula>, the frequency of occurrences of the verb in a reference corpus,
FRANTEXT [
        <xref ref-type="bibr" rid="ref2">2</xref>
        ], which gathers 31 M words from periodicals of the 19th and 20th centuries:
        <disp-formula><tex-math>k = \log \frac{f_{\mathrm{GOM1}}}{f_{\mathrm{ref}}}</tex-math></disp-formula>
      </p>
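      <p>A minimal sketch of this log-ratio index is given below. It assumes that frequencies are relative frequencies computed over lemmatized verbs and that the natural logarithm is used; neither choice is specified in the text. Verbs absent from the reference corpus have no defined ratio and are returned separately.</p>
      <preformat><![CDATA[
```python
import math
from collections import Counter


def relative_frequencies(lemmas):
    """Map each lemma to its share of all tokens in the list."""
    counts = Counter(lemmas)
    total = sum(counts.values())
    return {lemma: count / total for lemma, count in counts.items()}


def keyness(target_lemmas, reference_lemmas):
    """Return (log-ratio per lemma found in both corpora, lemmas absent from the reference)."""
    f_target = relative_frequencies(target_lemmas)
    f_ref = relative_frequencies(reference_lemmas)
    scores = {w: math.log(f_target[w] / f_ref[w]) for w in f_target if w in f_ref}
    unseen = sorted(w for w in f_target if w not in f_ref)
    return scores, unseen
```
]]></preformat>
      <p>A positive score marks a verb that is relatively more frequent in the target corpus than in the reference corpus, and a negative score marks the opposite.</p>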
      <p>The word cloud in Fig.4 shows the verbs with a size proportional to this index in yellow
and the verbs not appearing in FRANTEXT in red with a size proportional to the log of the
frequency of occurrences in GOM1.</p>
      <p>
        In the previous representation, the position of a word carries no meaning. We therefore also want
to represent the words in a space where two words located close together have similar
meanings (in the distributional sense). Such a representation can be useful, for example, to reveal
clusters of words with similar meanings. We represented each verb using
its embedding in a word2vec model trained on a large French corpus [<xref ref-type="bibr" rid="ref1">1</xref>] and we visualize the
map of verbs after reducing the embedding to 2 dimensions using UMAP [
        <xref ref-type="bibr" rid="ref15">15</xref>
        ]
        in Fig.5. We can, for example, identify a cluster of verbs describing actions of the farmer in
the field (sarcler-palisser-semer) and a cluster of verbs related to biological processes of the crops
(pommer-tacheter-fleurir). Such a map is useful to navigate the content of the
manuals, and the embeddings may be useful to classify parts of the text.
      </p>
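      <p>The notion of proximity that underlies such a map can be illustrated with cosine similarity between word vectors. The sketch below does not run word2vec or UMAP: the three-dimensional vectors are hypothetical values chosen for illustration, whereas real embeddings have hundreds of dimensions.</p>
      <preformat><![CDATA[
```python
import math

# Hypothetical 3-dimensional vectors; these are NOT taken from a trained model.
EMBEDDINGS = {
    "sarcler": [0.9, 0.1, 0.0],
    "semer": [0.8, 0.2, 0.1],
    "fleurir": [0.1, 0.9, 0.2],
    "pommer": [0.0, 0.8, 0.3],
}


def cosine(u, v):
    """Cosine similarity between two vectors of equal length."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    return dot / (norm_u * norm_v)


def nearest(word, embeddings):
    """Return the other word whose vector is most similar to that of `word`."""
    others = (w for w in embeddings if w != word)
    return max(others, key=lambda w: cosine(embeddings[word], embeddings[w]))
```
]]></preformat>
      <p>With these toy vectors, the field action sarcler is closest to semer, while fleurir is closest to pommer, mirroring the kind of grouping described above.</p>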
      <p>The GOM corpus gathers a rich mixture of practical advice and practical knowledge. It is
interesting to study whether the discourse in those books reflects this dichotomy between
practices and knowledge. A key feature of the transition of discourse from practice to knowledge
is nominalization, a linguistic process where nouns are derived from verbs [<xref ref-type="bibr" rid="ref11">11</xref>]. Taking the
verb arroser (“to water”) as an example, we plot the usage statistics of the verb and of its noun in each of the 7
books of the GOM1 corpus. We see, in the top panel of Fig.6, that some authors use the verb much more
than the noun, denoting a more practical and less abstract discourse. It is also
interesting to note that the verb arroser actually had two forms for the
corresponding noun: arrosage and arrosement (both meaning “the watering (of crops)”).
Plotting the frequencies of occurrence of these two terms in large corpora (Gallica and Google
Books) shows that the 19th century is precisely the time during which the two terms
coexisted, arrosement being used more frequently before, and arrosage becoming dominant after
the 19th century. Some references (ATILF) mention a small difference in meaning between the
two terms, arrosement referring to a passive way for plants to receive water and
arrosage to a more active process in which a human provides the water.</p>
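      <p>The comparison between verb and noun usage can be computed per book along the following lines. This is only a sketch, assuming each book is available as a list of lemmas; the function name and the choice of reporting the share of verb uses are illustrative assumptions.</p>
      <preformat><![CDATA[
```python
from collections import Counter

VERB = "arroser"
NOUNS = ("arrosage", "arrosement")  # the two nominalizations discussed in the text


def verb_share(lemmas):
    """Share of verb uses among all uses of the verb and its nouns, or None if none occur."""
    counts = Counter(lemmas)
    verb_count = counts[VERB]
    noun_count = sum(counts[noun] for noun in NOUNS)
    total = verb_count + noun_count
    return verb_count / total if total else None
```
]]></preformat>
      <p>A share close to 1 indicates that an author prefers the verb, which the text associates with a more practical discourse; a share close to 0 indicates a preference for the nominalized forms.</p>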
    </sec>
    <sec id="sec-5">
      <title>5. Extracting causality frames</title>
      <p>We were also interested in capturing the parts of the discourse that express causal relations,
because sentences expressing causality may contain elements of biological knowledge. For
example, consider the following sentence from Moreau and Daverne:</p>
      <p>“Autre observation : la pratique nous a appris que, pendant l’été, si nous arrosons nos
romaines durant le grand soleil avec l’eau froide de nos puits, quand elles sont près de se coiffer
ou déjà coiffées, cela détermine dans leur intérieur des taches de pourriture; nous disons alors
que la romaine est mouchetée : dans cet état, elle n’est plus bonne pour la vente.” (“Another
observation: practice has taught us that, during the summer, if we water our romaine plants in
the hot sun with cold water from our wells, when they are about to be capped or have already
been capped, this causes spots of rot inside them; we then say that the romaine is speckled: in
this state, it is no longer fit for sale.”)</p>
      <p>Here, the authors draw a causal relation between, on the one hand, watering the crops
with cold water in hot weather at a specific growth stage and, on the other hand, the
rotting of their leaves. Even though knowledge at the time was too scarce to fully explain this
phenomenon, namely that these conditions favor the growth of fungi, the text clearly
encapsulates some knowledge about biology.</p>
      <p>
        To detect such causal relationships in a systematic manner, we are currently performing a
Frame-Semantic analysis [
        <xref ref-type="bibr" rid="ref8">8</xref>
        ] of the corpus. A Semantic Frame is a structured piece of
knowledge that can be considered as a template of a scene with several open slots (called Frame
Elements) that need to be filled in. One example is the Causality Frame, which comes with
‘core’ Frame Elements such as Cause and Effect, and ‘non-core’ elements that further qualify
the relation. The linguistic sister theory of Frame Semantics is called Construction Grammar
[<xref ref-type="bibr" rid="ref9">9</xref>], which explores how semantic frames get expressed in language through associations of
form and meaning called constructions. Two types of constructions are typically
involved. The first kind are frame-evoking constructions (usually lexical items or multiword
expressions), which activate a semantic frame. In French, numerous words and multiword
expressions evoke the Causality frame, such as à cause de “because of”, parce que “because”,
occasionner “to bring about”, suite à “due to”, and so on. The second type are grammatical
constructions (typically argument structure constructions; [<xref ref-type="bibr" rid="ref10">10</xref>]), which identify which phrases
of a sentence should be mapped onto which Frame Elements.
      </p>
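      <p>The first type of construction can be approximated by a simple lexical search, sketched below. This is not the FCG-based extractor: it only matches the four expressions quoted above as literal strings, so inflected verb forms are missed and no Frame Elements are identified.</p>
      <preformat><![CDATA[
```python
import re

# Frame-evoking expressions quoted in the text; a real extractor would rely on
# the full French FrameNet inventory rather than on this short list.
CAUSAL_MARKERS = ["à cause de", "parce que", "occasionner", "suite à"]
PATTERN = re.compile("|".join(re.escape(m) for m in CAUSAL_MARKERS), re.IGNORECASE)


def find_causal_markers(sentence):
    """Return (marker, start offset) pairs for each causal expression in the sentence."""
    return [(match.group(0).lower(), match.start()) for match in PATTERN.finditer(sentence)]
```
]]></preformat>
      <p>Applied to an invented sentence such as “Les feuilles jaunissent à cause de la sécheresse.”, the function returns the marker à cause de together with its position, which is enough to bring candidate sentences to the attention of a human reader.</p>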
      <p>
        Our Semantic Frame Extractor has been implemented in Fluid Construction Grammar (FCG;
[
        <xref ref-type="bibr" rid="ref22">22</xref>
        ]), an open-source computational grammar formalism for engineering Construction
Grammars, following the methodology described by [
        <xref ref-type="bibr" rid="ref3">3</xref>
        ], who developed a Causality Frame Extractor
for English. Our approach integrates several knowledge sources. Input sentences are
preprocessed using both a dependency parser and a constituency parser (such as the Berkeley Neural
Parser; [<xref ref-type="bibr" rid="ref13">13</xref>]). These different structures are integrated into a single syntactic representation
of a sentence using feature structures. During the training phase, annotations of semantic
frames are mapped onto the syntactic analysis to extract recurrent patterns of form-meaning
associations (constructions). Patterns that are not frequent enough are pruned because they
typically result from annotation errors. The semantic annotations were taken from the French
FrameNet, developed within the ASFALDA project [
        <xref ref-type="bibr" rid="ref5">5</xref>
        ]. The French FrameNet project has
explicitly focused on Causality as one of its main domains, and includes 11 distinct Causality
frames and 217 distinct frame-evoking elements. Fig.7 illustrates the kind of information that
can be extracted using this method. On the left is an input sentence, and on the right is a
Causality frame that was detected. As can be seen, the verb form détermine (here: “causes”) is
the frame-evoking element (FEE). It has designated its subject (cela “that”) as the Cause, and
its direct object (taches de pourriture “spots of rot”) as the Effect.
      </p>
      <p>In its current form, a Causal Frame extractor is already useful because it can search through
a text for instances of causal language, and then present the results to the human reader. We
are currently evaluating how well a frame extractor trained on contemporary French data can
be applied to the Good Old Manuals corpus. For this, we are annotating a test set of causal
expressions that can be found in the corpus. Moreover, as can be seen in Fig.7, the Frame
Extractor currently identifies Frame Elements through syntactic relations, so the syntactic
subject cela was assigned the role of Cause rather than the semantic subject (printed in italics),
which is what really matters for extracting knowledge. Future work will therefore have to
include anaphora resolution and the tracking of entities across longer spans of discourse.</p>
    </sec>
    <sec id="sec-6">
      <title>6. Conclusion</title>
      <p>Old texts are often treasure troves of past knowledge that has become almost inaccessible or
even forgotten as societies evolve. “Good old” manuals in particular, which have so far been
neglected, offer a great potential source of information about the knowledge and practices of a
given time and place. In this paper, we have illustrated how a suite of techniques from Digital
Humanities, natural language processing, statistical analysis and data visualization, can be
exploited to make such texts not only accessible, but also more meaningful to human readers.</p>
      <p>More specifically, we have introduced the Good Old Manuals corpus of 19th century texts
about French market gardening, particularly in the Paris region. The gardening techniques described in these texts have
recently attracted renewed interest because they offer insights into more efficient
farming on small plots of land, an approach known as the French Method. We have demonstrated how the
most prominent actors of the time can be situated in a social and geographic network through
named entity linking; how activities that are relevant and meaningful to specific topics such
as market gardening can be visualized through word clouds and word embedding spaces; and
how more fine-grained knowledge could potentially be mined through semantic parsing.</p>
    </sec>
  </body>
  <back>
    <ref-list>
      <ref id="ref1">
        <mixed-citation>
          [1]
          <string-name>
            <given-names>H.</given-names>
            <surname>Abdine</surname>
          </string-name>
          ,
          <string-name>
            <given-names>C.</given-names>
            <surname>Xypolopoulos</surname>
          </string-name>
          ,
          <string-name>
            <given-names>M. K.</given-names>
            <surname>Eddine</surname>
          </string-name>
          , and
          <string-name>
            <given-names>M.</given-names>
            <surname>Vazirgiannis</surname>
          </string-name>
          . “
          <article-title>Evaluation of word embeddings from large-scale French web content”</article-title>
          .
          <source>In: arXiv preprint arXiv:2105.01990</source>
          (
          <year>2021</year>
          ).
        </mixed-citation>
      </ref>
      <ref id="ref2">
        <mixed-citation>
          [2]
          <string-name>
            <given-names>P.</given-names>
            <surname>Bernard</surname>
          </string-name>
          ,
          <string-name>
            <given-names>J.</given-names>
            <surname>Lecomte</surname>
          </string-name>
          ,
          <string-name>
            <given-names>J.</given-names>
            <surname>Dendien</surname>
          </string-name>
          , and
          <string-name>
            <given-names>J.-M.</given-names>
            <surname>Pierrel</surname>
          </string-name>
          . “
          <article-title>Computerized linguistic resources of the research laboratory ATILF for lexical and textual analysis: Frantext, TLFi, and the software Stella</article-title>
          .” In: Lrec. Citeseer.
          <year>2002</year>
          .
        </mixed-citation>
      </ref>
      <ref id="ref3">
        <mixed-citation>
          [3]
          <string-name>
            <given-names>K.</given-names>
            <surname>Beuls</surname>
          </string-name>
          ,
          <string-name>
            <given-names>P.</given-names>
            <surname>Van Eecke</surname>
          </string-name>
          , and
          <string-name>
            <given-names>V. S.</given-names>
            <surname>Cangalovic</surname>
          </string-name>
          . “
          <article-title>A computational construction grammar approach to semantic frame extraction”</article-title>
          .
          <source>In: Linguistics Vanguard 7.1</source>
          (
          <year>2021</year>
          ), p.
          <fpage>20180015</fpage>
          .
        </mixed-citation>
      </ref>
      <ref id="ref4">
        <mixed-citation>
          [4]
          <string-name>
            <given-names>C.</given-names>
            <surname>de Carné-Carnavalet</surname>
          </string-name>
          .
          <article-title>Le maraîchage sur petite surface: La French Method: une agriculture urbaine ou périurbaine</article-title>
          . Editions de Terran,
          <year>2020</year>
          .
        </mixed-citation>
      </ref>
      <ref id="ref5">
        <mixed-citation>
          [5]
          <string-name>
            <given-names>M.</given-names>
            <surname>Djemaa</surname>
          </string-name>
          ,
          <string-name>
            <given-names>M.</given-names>
            <surname>Candito</surname>
          </string-name>
          ,
          <string-name>
            <given-names>P.</given-names>
            <surname>Muller</surname>
          </string-name>
          , and
          <string-name>
            <given-names>L.</given-names>
            <surname>Vieu</surname>
          </string-name>
          . “
          <article-title>Corpus annotation within the French FrameNet: a domain-by-domain methodology”</article-title>
          .
          <source>In: Tenth International Conference on Language Resources and Evaluation (LREC 2016)</source>
          .
          <year>2016</year>
          .
        </mixed-citation>
      </ref>
      <ref id="ref6">
        <mixed-citation>
          [6]
          <string-name>
            <given-names>M.</given-names>
            <surname>Ehrmann</surname>
          </string-name>
          ,
          <string-name>
            <given-names>M.</given-names>
            <surname>Düring</surname>
          </string-name>
          ,
          <string-name>
            <given-names>C.</given-names>
            <surname>Neudecker</surname>
          </string-name>
          , and
          <string-name>
            <given-names>A.</given-names>
            <surname>Doucet</surname>
          </string-name>
          . “
          <article-title>Computational Approaches to Digitised Historical Newspapers (Dagstuhl Seminar 22292)”</article-title>
          . In: (
          <year>2023</year>
          ).
        </mixed-citation>
      </ref>
      <ref id="ref7">
        <mixed-citation>
          [7]
          <string-name>
            <given-names>R. S.</given-names>
            <surname>Ferguson</surname>
          </string-name>
          and
          <string-name>
            <given-names>S. T.</given-names>
            <surname>Lovell</surname>
          </string-name>
          . “
          <article-title>Permaculture for agroecology: design, movement, practice, and worldview. A review”</article-title>
          .
          <source>In: Agronomy for sustainable development 34</source>
          (
          <year>2014</year>
          ), pp.
          <fpage>251</fpage>
          -
          <lpage>274</lpage>
          .
        </mixed-citation>
      </ref>
      <ref id="ref8">
        <mixed-citation>
          [8]
          <string-name>
            <given-names>C. J.</given-names>
            <surname>Fillmore</surname>
          </string-name>
          and
          <string-name>
            <given-names>C.</given-names>
            <surname>Baker</surname>
          </string-name>
          . “
          <article-title>A frames approach to semantic analysis”</article-title>
          . In: (
          <year>2009</year>
          ).
        </mixed-citation>
      </ref>
      <ref id="ref9">
        <mixed-citation>
          [9]
          <string-name>
            <given-names>M.</given-names>
            <surname>Fried</surname>
          </string-name>
          and
          <string-name>
            <given-names>J.-O.</given-names>
            <surname>Östman</surname>
          </string-name>
          . “
          <article-title>Construction Grammar: A thumbnail sketch”</article-title>
          . In:
          <source>Construction Grammar in a cross-language perspective</source> 1 (
          <year>2004</year>
          ), pp.
          <fpage>1</fpage>
          -
          <lpage>86</lpage>
          .
        </mixed-citation>
      </ref>
      <ref id="ref10">
        <mixed-citation>
          [10]
          University of Chicago Press,
          <year>1995</year>
          .
        </mixed-citation>
      </ref>
      <ref id="ref11">
        <mixed-citation>
          [11]
          <string-name>
            <given-names>M. A. K.</given-names>
            <surname>Halliday</surname>
          </string-name>
          and
          <string-name>
            <given-names>J. R.</given-names>
            <surname>Martin</surname>
          </string-name>
          .
          <article-title>Writing science: Literacy and discursive power</article-title>
          .
          <source>Routledge</source>
          ,
          <year>2003</year>
          .
        </mixed-citation>
      </ref>
      <ref id="ref12">
        <mixed-citation>
          [12]
          <string-name>
            <given-names>P.</given-names>
            <surname>Hervé-Gruyer</surname>
          </string-name>
          and
          <string-name>
            <given-names>C.</given-names>
            <surname>Hervé-Gruyer</surname>
          </string-name>
          .
          <article-title>Miraculous abundance: One quarter acre, two French farmers, and enough food to feed the world</article-title>
          .
          <source>Chelsea Green Publishing</source>
          ,
          <year>2016</year>
          .
        </mixed-citation>
      </ref>
      <ref id="ref13">
        <mixed-citation>
          [13]
          <string-name>
            <given-names>N.</given-names>
            <surname>Kitaev</surname>
          </string-name>
          and
          <string-name>
            <given-names>D.</given-names>
            <surname>Klein</surname>
          </string-name>
          . “
          <article-title>Constituency parsing with a self-attentive encoder”</article-title>
          .
          <source>In: arXiv preprint arXiv:1805.01052</source>
          (
          <year>2018</year>
          ).
        </mixed-citation>
      </ref>
      <ref id="ref14">
        <mixed-citation>
          [14]
          <string-name>
            <given-names>O.</given-names>
            <surname>Martin</surname>
          </string-name>
          .
          <article-title>“French Intensive Gardening: A Retrospective”</article-title>
          . In: (
          <year>2008</year>
          ).
        </mixed-citation>
      </ref>
      <ref id="ref15">
        <mixed-citation>
          [15]
          <string-name>
            <given-names>L.</given-names>
            <surname>McInnes</surname>
          </string-name>
          ,
          <string-name>
            <given-names>J.</given-names>
            <surname>Healy</surname>
          </string-name>
          , and
          <string-name>
            <given-names>J.</given-names>
            <surname>Melville</surname>
          </string-name>
          . “
          <article-title>UMAP: Uniform manifold approximation and projection for dimension reduction</article-title>
          ”. In:
          <source>arXiv preprint arXiv:1802.03426</source>
          (
          <year>2018</year>
          ).
        </mixed-citation>
      </ref>
      <ref id="ref16">
        <mixed-citation>
          [16]
          <string-name>
            <given-names>F.</given-names>
            <surname>Moretti</surname>
          </string-name>
          .
          <article-title>Graphs, maps, trees: abstract models for a literary history</article-title>
          .
          <source>Verso</source>
          ,
          <year>2005</year>
          .
        </mixed-citation>
      </ref>
      <ref id="ref17">
        <mixed-citation>
          [17]
          <string-name>
            <given-names>P.</given-names>
            <surname>Norvig</surname>
          </string-name>
          . “
          <article-title>How to write a spelling corrector</article-title>
          ”. In:
          <source>http://norvig.com/spell-correct.html</source>
          (
          <year>2007</year>
          ).
        </mixed-citation>
      </ref>
      <ref id="ref18">
        <mixed-citation>
          [18]
          <string-name>
            <given-names>P.</given-names>
            <surname>Rayson</surname>
          </string-name>
          . “
          <article-title>From key words to key semantic domains</article-title>
          ”. In:
          <source>International journal of corpus linguistics</source>
          <volume>13</volume>
          .
          <issue>4</issue>
          (
          <year>2008</year>
          ), pp.
          <fpage>519</fpage>
          -
          <lpage>549</lpage>
          .
        </mixed-citation>
      </ref>
      <ref id="ref19">
        <mixed-citation>
          [19]
          <string-name>
            <given-names>L.</given-names>
            <surname>Romary</surname>
          </string-name>
          ,
          <string-name>
            <given-names>S.</given-names>
            <surname>Salmon-Alt</surname>
          </string-name>
          , and
          <string-name>
            <given-names>G.</given-names>
            <surname>Francopoulo</surname>
          </string-name>
          . “
          <article-title>Standards going concrete: from LMF to Morphalou</article-title>
          ”. In:
          <source>The 20th International Conference on Computational Linguistics-COLING 2004</source>
          .
          <year>2004</year>
          .
        </mixed-citation>
      </ref>
      <ref id="ref20">
        <mixed-citation>
          [20]
          <string-name>
            <given-names>R.</given-names>
            <surname>Smith</surname>
          </string-name>
          . “
          <article-title>An overview of the Tesseract OCR engine</article-title>
          ”. In:
          <source>Ninth international conference on document analysis and recognition (ICDAR 2007)</source>
          . Vol.
          <volume>2</volume>
          .
          <publisher-name>IEEE</publisher-name>
          .
          <year>2007</year>
          , pp.
          <fpage>629</fpage>
          -
          <lpage>633</lpage>
          .
        </mixed-citation>
      </ref>
      <ref id="ref21">
        <mixed-citation>
          [21]
          <string-name>
            <given-names>M.</given-names>
            <surname>Terras</surname>
          </string-name>
          ,
          <string-name>
            <given-names>J.</given-names>
            <surname>Nyhan</surname>
          </string-name>
          , and
          <string-name>
            <given-names>E.</given-names>
            <surname>Vanhoutte</surname>
          </string-name>
          .
          <article-title>Defining digital humanities: a reader</article-title>
          .
          <source>Routledge</source>
          ,
          <year>2016</year>
          .
        </mixed-citation>
      </ref>
      <ref id="ref22">
        <mixed-citation>
          [22]
          <string-name>
            <given-names>R.</given-names>
            <surname>van Trijp</surname>
          </string-name>
          ,
          <string-name>
            <given-names>K.</given-names>
            <surname>Beuls</surname>
          </string-name>
          , and
          <string-name>
            <given-names>P.</given-names>
            <surname>Van Eecke</surname>
          </string-name>
          . “
          <article-title>The FCG Editor: An innovative environment for engineering computational construction grammars</article-title>
          ”. In:
          <source>PLOS ONE</source>
          <volume>17</volume>
          .
          <issue>6</issue>
          (
          <year>2022</year>
          ),
          <elocation-id>e0269708</elocation-id>
          .
        </mixed-citation>
      </ref>
      <ref id="ref23">
        <mixed-citation>
          [23]
          <string-name>
            <given-names>A.</given-names>
            <surname>Wezel</surname>
          </string-name>
          ,
          <string-name>
            <given-names>S.</given-names>
            <surname>Bellon</surname>
          </string-name>
          ,
          <string-name>
            <given-names>T.</given-names>
            <surname>Doré</surname>
          </string-name>
          ,
          <string-name>
            <given-names>C.</given-names>
            <surname>Francis</surname>
          </string-name>
          ,
          <string-name>
            <given-names>D.</given-names>
            <surname>Vallod</surname>
          </string-name>
          , and
          <string-name>
            <given-names>C.</given-names>
            <surname>David</surname>
          </string-name>
          . “
          <article-title>Agroecology as a science, a movement and a practice. A review</article-title>
          ”. In:
          <source>Agronomy for sustainable development</source>
          <volume>29</volume>
          (
          <year>2009</year>
          ), pp.
          <fpage>503</fpage>
          -
          <lpage>515</lpage>
          .
        </mixed-citation>
      </ref>
      <ref id="ref24">
        <mixed-citation>
          [24]
          <string-name>
            <given-names>X.</given-names>
            <surname>Zhong</surname>
          </string-name>
          ,
          <string-name>
            <given-names>J.</given-names>
            <surname>Tang</surname>
          </string-name>
          , and
          <string-name>
            <given-names>A. J.</given-names>
            <surname>Yepes</surname>
          </string-name>
          . “
          <article-title>PubLayNet: largest dataset ever for document layout analysis</article-title>
          ”. In:
          <source>2019 International conference on document analysis and recognition (ICDAR)</source>
          .
          <publisher-name>IEEE</publisher-name>
          .
          <year>2019</year>
          , pp.
          <fpage>1015</fpage>
          -
          <lpage>1022</lpage>
          .
        </mixed-citation>
      </ref>
    </ref-list>
  </back>
</article>