<!DOCTYPE article PUBLIC "-//NLM//DTD JATS (Z39.96) Journal Archiving and Interchange DTD v1.0 20120330//EN" "JATS-archivearticle1.dtd">
<article xmlns:xlink="http://www.w3.org/1999/xlink">
  <front>
    <journal-meta>
    </journal-meta>
    <article-meta>
      <title-group>
        <article-title>Layout- and Activity-based Textbook Modeling for Automatic PDF Textbook Extraction</article-title>
      </title-group>
      <contrib-group>
        <contrib contrib-type="author">
          <string-name>Élise Lincker</string-name>
          <email>elise.lincker@lecnam.net</email>
          <xref ref-type="aff" rid="aff0">0</xref>
          <xref ref-type="aff" rid="aff4">4</xref>
        </contrib>
        <contrib contrib-type="author">
          <string-name>Olivier Pons</string-name>
          <email>olivier.pons@lecnam.net</email>
          <xref ref-type="aff" rid="aff0">0</xref>
          <xref ref-type="aff" rid="aff4">4</xref>
        </contrib>
        <contrib contrib-type="author">
          <string-name>Camille Guinaudeau</string-name>
          <email>guinaudeau@nii.ac.jp</email>
          <xref ref-type="aff" rid="aff1">1</xref>
          <xref ref-type="aff" rid="aff4">4</xref>
          <xref ref-type="aff" rid="aff6">6</xref>
        </contrib>
        <contrib contrib-type="author">
          <string-name>Isabelle Barbet</string-name>
          <email>isabelle.barbet@lecnam.net</email>
          <xref ref-type="aff" rid="aff0">0</xref>
          <xref ref-type="aff" rid="aff4">4</xref>
        </contrib>
        <contrib contrib-type="author">
          <string-name>Jérôme Dupire</string-name>
          <email>jerome.dupire@lecnam.net</email>
          <xref ref-type="aff" rid="aff0">0</xref>
          <xref ref-type="aff" rid="aff4">4</xref>
        </contrib>
        <contrib contrib-type="author">
          <string-name>Céline Hudelot</string-name>
          <email>celine.hudelot@centralesupelec.fr</email>
          <xref ref-type="aff" rid="aff3">3</xref>
          <xref ref-type="aff" rid="aff4">4</xref>
        </contrib>
        <contrib contrib-type="author">
          <string-name>Vincent Mousseau</string-name>
          <email>vincent.mousseau@centralesupelec.fr</email>
          <xref ref-type="aff" rid="aff3">3</xref>
          <xref ref-type="aff" rid="aff4">4</xref>
        </contrib>
        <contrib contrib-type="author">
          <string-name>Caroline Huron</string-name>
          <email>caroline.huron@cri-paris.org</email>
          <xref ref-type="aff" rid="aff2">2</xref>
          <xref ref-type="aff" rid="aff4">4</xref>
          <xref ref-type="aff" rid="aff5">5</xref>
        </contrib>
        <aff id="aff0">
          <label>0</label>
          <institution>Cedric, CNAM</institution>
          ,
          <addr-line>Paris</addr-line>
          ,
          <country country="FR">France</country>
        </aff>
        <aff id="aff1">
          <label>1</label>
          <institution>JFLI</institution>
          ,
          <addr-line>CNRS, NII, Tokyo</addr-line>
          ,
          <country country="JP">Japan</country>
        </aff>
        <aff id="aff2">
          <label>2</label>
          <institution>Learning Planet Institute</institution>
          ,
          <addr-line>Paris</addr-line>
          ,
          <country country="FR">France</country>
        </aff>
        <aff id="aff3">
          <label>3</label>
          <institution>MICS, CentraleSupélec, University Paris-Saclay</institution>
          ,
          <addr-line>Gif-sur-Yvette</addr-line>
          ,
          <country country="FR">France</country>
        </aff>
        <aff id="aff4">
          <label>4</label>
        </aff>
        <aff id="aff5">
          <label>5</label>
          <institution>SEED, Inserm, University Paris Cité</institution>
          ,
          <addr-line>Paris</addr-line>
          ,
          <country country="FR">France</country>
        </aff>
        <aff id="aff6">
          <label>6</label>
          <institution>University Paris-Saclay</institution>
          ,
          <addr-line>Gif-sur-Yvette</addr-line>
          ,
          <country country="FR">France</country>
        </aff>
      </contrib-group>
      <abstract>
        <p>Ensuring accessible textbooks for children with disabilities is essential for inclusive education. However, providing native accessibility for educational content remains a challenge. In the meantime, existing educational materials need to be adapted, for example by providing interactive versions to overcome difficulties caused by disabilities. In this context, our project aims to automatically adapt PDF textbooks to make them accessible to children with disabilities. The first step towards this adaptation involves extracting and structuring the content of textbooks. In this paper, we introduce textbook models, propose an automated extraction pipeline, and conduct preliminary experiments. Our textbook models are based on the various activities involved and provide layout and semantic information. They enable normalized and structured representations of educational content at both document and page levels, facilitating the automatic extraction process and the conversion to popular formats such as TEI and DocBook. Our experiments on automatically extracting PDF textbook structure, using a state-of-the-art multimodal transformer for a token classification task, demonstrate promising results. However, these experiments also highlight the difficulty of the task, especially cross-textbook collection generalization. Finally, we discuss the extraction pipeline and directions for future work.</p>
      </abstract>
      <kwd-group>
        <kwd>textbook adaptation</kwd>
        <kwd>inclusive education</kwd>
        <kwd>modeling textbooks</kwd>
        <kwd>modeling textbook pages</kwd>
        <kwd>textbook extraction</kwd>
        <kwd>interactive textbooks</kwd>
        <kwd>digital textbooks</kwd>
        <kwd>PDF processing</kwd>
      </kwd-group>
    </article-meta>
  </front>
  <body>
    <sec id="sec-1">
      <title>1. Introduction</title>
      <p>The use of e-learning environments and e-textbooks is growing in higher education, yet
paper textbooks remain prevalent in elementary and secondary schools in France. Despite the
availability of digital textbooks, the majority of them are not natively accessible due to their
fixed layout and lack of reflowability, which prevents the adjustment of the page layout (font
size, line spacing, word spacing, letter spacing, etc.). To ensure inclusive education, there is a
pressing need to create accessible textbooks that allow children with disabilities to participate
in classroom activities. Inclusive textbooks should take into account students’ difficulties, while
preserving the content of the activities and their instructional intent.</p>
      <p>Some non-profit organizations have started to produce adapted digital textbooks by doing all
the transformations manually. For example, the association Le Cartable Fantastique1 provides
the Fantastiques Exercices, a collection of French exercises and their interactive version adapted
for children with Developmental Coordination Disorder (DCD). This neurodevelopmental
disorder is defined as an impairment in motor coordination which interferes with academic
achievement and daily life activities. At school, children with DCD struggle with handwriting.
More specifically, they do not automate the handwriting process and continue to pay attention
to letter tracing throughout their school life. In addition, their eye movement disorders may impede
their ability to read text that is not presented in an accessible format. Hence, for children with
DCD to succeed at school, textbooks must address their difficulties with handwriting and gaze
organization. Figure 1 shows an example of a “fill-in-the-blank” exercise and its adaptation,
allowing children with DCD to complete the sentence by clicking on the correct answer, avoiding
the use of handwriting.</p>
      <p>[Figure 1: (a) Original exercise: “6 ** Complète les phrases avec on ou ont.
a. Si … allait au cinéma ?
b. Ils … vu ce film dix fois.
c. … s’installe dans les fauteuils moelleux.
d. Mes parents … pris du pop-corn.
e. Les enfants … sursauté devant une scène du film.”
(b) Adapted exercise.]</p>
      <p>Unfortunately, the large variety of textbooks and frequent renewal due to changes in the
curriculum make it challenging to adapt them manually. In this context, the MALIN project
(MAnuels scoLaires INclusifs, French for Inclusive textbooks) aims to automatically adapt PDF
textbooks for children with DCD or visual impairment, and in the long term, for other disabilities.
Adapted textbooks facilitate inclusive participation in class, providing students with disabilities
with the same educational content as their classmates. We do not aim to enrich textbooks with
additional content or provide personalized assistance to students, but only modify the mode
of interaction and outputs, or enable connection to external tools (e.g. by generating alternative
text, speech synthesis or braille displays for blind and low vision readers). Since there are no
structured versions of textbooks that contain sufficient semantic information, we must start
from textbooks in PDF format. Hence, the first and fundamental step towards our goal is to
extract the textbooks’ structure and content. This work presents our proposed approach for
this extraction pipeline and concentrates on textbook modeling. We examined a large set of
French language study and mathematics textbooks used in elementary classrooms to create a
template of a textbook, and introduce two complementary textbook models: one that represents
the textbook as a whole, and another that operates at the page level, which is essential for
the extraction process. This paper also comes as a position paper on the automatic extraction
task ahead. In particular, we have already conducted preliminary experiments on the token
classification task, using a hybrid transformer architecture.</p>
      <p>Our main contributions are: (i) a model for structuring textbooks and textbook pages; (ii) a
general approach for PDF textbook extraction; (iii) preliminary token classification experiments.</p>
      <p>1 https://www.cartablefantastique.fr/</p>
    </sec>
    <sec id="sec-2">
      <title>2. Related work</title>
      <sec id="sec-2-1">
        <title>2.1. Digital textbook envisioning</title>
        <p>
          There are numerous ways to interact with digital textbooks. The simplest digital textbook
includes the content of the paper textbook and can be flipped through like a traditional
ebook. More advanced versions are enriched with additional multimedia material or internal
functionalities such as hyperlinks. Researchers envision the future of textbooks as interactive
learning environments rather than traditional books and promote adaptive, personalized and
collaborative learning. Ou et al. [
          <xref ref-type="bibr" rid="ref1">1</xref>
          ] propose a pedagogical framework for designing and
developing intelligent textbooks. Their framework is based on 5 key components: learners, text
content, visual content, assessment and AI technologies, and integrates 4 learning strategies:
multimedia learning, adaptive learning, personalized learning and collaborative learning. A
similar vision [
          <xref ref-type="bibr" rid="ref2 ref43">2</xref>
          ] takes advantage of an Adaptive Classroom Environment (ACE) and an Adaptive
Learning Recommendation System (ALRS) to encourage active dialog centered on structured
activities. These learning environments could be natively accessible, or at least more easily
adaptable. However, the implementation of such initiatives appears unlikely to happen soon in
France, since the production process of publishers relies on paper textbooks. One issue arises
from the fact that their digital versions are not accessible to children who have difficulties
accessing visual information (blind, visually impaired, visual and motor coordination disorders)
or writing (DCD, motor disabilities, autism specific disorders, attention disorders) because their
formats are incompatible with assistive technology tools [
          <xref ref-type="bibr" rid="ref3 ref4">3, 4</xref>
          ]. These formats do not allow
students with disabilities to access information, process content, or perform educational tasks
effectively, efficiently and satisfactorily [
          <xref ref-type="bibr" rid="ref5">5</xref>
          ].
        </p>
      </sec>
      <sec id="sec-2-2">
        <title>2.2. Textbook modeling</title>
        <p>
          Textbook modeling is a fundamental step common to all textbook segmentation research.
Three widely accessible markup schemes cover a range of applications, including textbooks:
HTML, DocBook (https://docbook.org/), and the Text Encoding Initiative Guidelines (TEI, https://tei-c.org/). Publishers and authors
customize or combine existing schemes or create their own, tailoring them to their specific
needs and objectives [
          <xref ref-type="bibr" rid="ref6">6</xref>
          ]. Hence, a basic but comprehensive textbook model [
          <xref ref-type="bibr" rid="ref7">7</xref>
          ] consistent with
the TEI standards has been formalized in order to develop an ontology for the textbook research
discipline. Many papers in the field of textbook digitization and extraction tend to adopt a
generic structure (headings, sections, body text, paragraphs, etc.), whereas we aim to describe
all the types of instructional activities present in textbooks. In this direction, conceptual guides
for the elaboration of textbooks, such as [
          <xref ref-type="bibr" rid="ref8">8</xref>
          ], propose a standard structure and provide a range of
elements to be considered to develop “a good textbook”. The first pages of some textbooks also
supply information on how to successfully use them, including details on structure, different
types of activities, and helpful notes for the pupil or the supervising adult. All these various
aspects should be taken into account when modeling textbooks.
        </p>
      </sec>
      <sec id="sec-2-3">
        <title>2.3. Automatic PDF document structure extraction</title>
        <p>
          2.3.1. Visually-rich document understanding
Emerging deep learning approaches have demonstrated significant potential in addressing
natural language processing (NLP) tasks, particularly those related to visually-rich document
understanding (VrDU). Most rely on the Transformer architecture, with its self-attention
mechanism [
          <xref ref-type="bibr" rid="ref9">9</xref>
          ], extended to multimodal data. VrDU models involve combining textual, spatial and
visual features to interpret scanned documents, PDFs and web pages. Thus, LayoutLM [
          <xref ref-type="bibr" rid="ref10 ref11 ref12">10, 11, 12</xref>
          ]
is built upon BERT’s architecture [
          <xref ref-type="bibr" rid="ref13">13</xref>
          ] and incorporates additional 2-D position and visual
embeddings along with text embeddings. BROS [
          <xref ref-type="bibr" rid="ref14">14</xref>
          ] uses relative instead of absolute positions between
blocks, and DocFormer [
          <xref ref-type="bibr" rid="ref15">15</xref>
          ] introduces a multimodal cross-attention mechanism enabling the
exchange of information across modalities. TILT [
          <xref ref-type="bibr" rid="ref16">16</xref>
          ] relies on the T5 [
          <xref ref-type="bibr" rid="ref17">17</xref>
          ] architecture and
provides additional contextualized image embeddings at the input. However, most NLP models
are pre-trained and fine-tuned on English documents. While French pre-trained large language
models CamemBERT [
          <xref ref-type="bibr" rid="ref18">18</xref>
          ] and FlauBERT [
          <xref ref-type="bibr" rid="ref19">19</xref>
          ] have been very efficient in many NLP tasks, no
VrDU French model has been released. To tackle this issue, LiLT [
          <xref ref-type="bibr" rid="ref20">20</xref>
          ] makes it possible to plug and play
any pre-trained RoBERTa-like model with a layout module and thus leverage layout features for
languages other than English. Besides, those models obtain state-of-the-art results on several
downstream VrDU tasks, such as form and receipt understanding, respectively on FUNSD [
          <xref ref-type="bibr" rid="ref21">21</xref>
          ]
and SROIE [
          <xref ref-type="bibr" rid="ref12">12</xref>
          ] benchmark datasets. Some studies focus on more complex document layout.
For example, Najem-Meyer et al. [
          <xref ref-type="bibr" rid="ref22">22</xref>
          ] compare text-only, visual and multimodal models as
well as 3 annotation standards for historical commentaries layout analysis. To the best of our
knowledge, there has been no specific research conducted on the application of deep learning
models, such as VrDU, to the analysis of textbook pages.
2.3.2. Automatic textbook processing
Research on textbook digitization, extraction and automatic analysis has been limited. Similar
to MALIN, the Intextbooks [
          <xref ref-type="bibr" rid="ref23">23</xref>
          ] system transforms a PDF textbook into an interactive digital
version, based on formal structure and hierarchy modeling. However, it targets university
textbooks, which are very different from elementary and secondary school textbooks. They
differ both in their format (linear, sober, no double columns, no illustrative images) and in
their content (more conceptual knowledge than training exercises). Moreover, the ultimate
objective of Intextbooks is to integrate smart interactive content by building enriched knowledge
graphs [
          <xref ref-type="bibr" rid="ref24 ref25 ref26 ref27">24, 25, 26, 27</xref>
          ], while we aim at improving accessibility. Another method, relying on the
identification and cutting of target areas followed by OCR [
          <xref ref-type="bibr" rid="ref28">28</xref>
          ], has been proposed for
non-PDF electronic textbooks, where the text may not be easily extractable. This approach was
specifically developed for textbooks, as OCR is effective for standard texts but not for documents
with complex layout. Both this work and Intextbooks’ extraction step rely on rules, using layout
and table of content analysis, font styles, coordinates and distances, for example. Currently,
rule-based approaches prevail over machine learning approaches for textbook extraction.
2.3.3. Analogous approaches: PDF processing
Despite limited research on textbook extraction, multiple techniques have been proposed for
document segmentation or specific content identification in various PDF document types.
This includes books and scientific papers. Many tools also employ rule-based methods. For
instance, SEB [
          <xref ref-type="bibr" rid="ref29">29</xref>
          ] extracts books at both page level and document level, and [
          <xref ref-type="bibr" rid="ref30">30</xref>
          ] performs
PDF-to-ePUB conversion. Both approaches aim to provide a more comfortable reading experience
by enabling a reflowed reading mode. More specifically applied to scholarly papers, hybrid
methods were proposed to extract and organize documents, using layout and style rules as well
as statistical machine learning algorithms [
          <xref ref-type="bibr" rid="ref31">31</xref>
          ]. Part of Semantic Scholar4, the Semantic Scholar
Open Data Platform [
          <xref ref-type="bibr" rid="ref32">32</xref>
          ] provides resources for scientific literature and the Semantic Reader
Project [
          <xref ref-type="bibr" rid="ref33">33</xref>
          ] intends to create an intelligent, interactive and accessible reading interface. Their
pipeline combines multiple PDF parsing tools, VILA [
          <xref ref-type="bibr" rid="ref34">34</xref>
          ], LayoutParser [
          <xref ref-type="bibr" rid="ref35">35</xref>
          ], as well as their
own libraries5. Other works aim to identify specific objects contained in scientific papers, such
as metadata of algorithms [
          <xref ref-type="bibr" rid="ref36">36</xref>
          ] or mathematical statements and results [
          <xref ref-type="bibr" rid="ref37">37</xref>
          ]. Experiments with
the extraction leverage style-based rules, computer vision and NLP techniques.
        </p>
      </sec>
    </sec>
    <sec id="sec-3">
      <title>3. Approach: using layout and conceptual textbook models for automatic textbook extraction</title>
      <p>We propose activity-based textbook models with a mix of layout and conceptual features,
and a PDF textbook extraction pipeline built upon these models. Our models result from
the observation of dozens of elementary school textbooks widely used in French educational
institutions, and were inspired by existing models and guidelines for textbook creation.</p>
      <sec id="sec-3-1">
        <title>3.1. Textbook modeling: section and activity inventory</title>
        <p>Most French language study and mathematics textbooks share a basic structure. Textbooks are
divided into sub-disciplines (respectively grammar, conjugation, etc., and arithmetic, geometry,
etc.), then into learning themes containing several chapters. The bulk of the textbook to be
adapted for MALIN is found in the chapter pages. In the majority of textbooks, chapters are
structured around a double-page spread and include the following content blocks: the chapter
title, an introductory activity to the chapter, a lesson, and a series of exercises. An exercise
is then divided into several parts: it always has an instruction, often a statement, and it may
contain: a number, a title, examples, hints and illustrations, as well as additional indicators such
as the level of difficulty. The introductory activity can assume diverse variations (exploratory
activity, revision activity, etc.), typically comprising a single statement and several sub-exercises.
In some books, opening activities introduce each theme. Depending on the publisher and the
collection, other information may be included, such as the skills that are involved in completing
an exercise, indicative activity headings, chapter numbers, a reminder of the discipline or the
theme in which the chapter is located, various indicators of the modality or interdisciplinarity
of an activity, etc. Revision pages are often added at the end of a theme or sub-discipline, with a
summary of the learning content from a set of chapters as well as application and integration
exercises. Finally, additional tool pages may be found at the beginning and end of the textbook:
foreword, directions for use, pedagogical approach, preface, table of contents, index, glossary,
bibliography, acknowledgements, pedagogical resources, or other various appendices.</p>
        <p>4 https://semanticscholar.org; 5 ScienceParse https://github.com/allenai/science-parse, PaperMage https://github.com/allenai/papermage</p>
      </sec>
      <sec id="sec-3-2">
        <title>3.2. Textbook modeling: XML model formalization</title>
        <p>From this inventory of sections, we have developed two models: a whole textbook model and a
textbook page model. We created our own Document Type Descriptors (DTDs) to formalize the
structure in XML format. This new model precisely aligns with the abstract syntax tree of the
document, encompassing all the necessary information for our project.</p>
        <p>The first model captures the encapsulation of blocks as described in Section 3.1. Any additional
element that is not part of the main content of a block is represented using an “indicator” tag.</p>
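        <p>As an illustration only (the element and attribute names below are hypothetical, chosen for exposition rather than taken from our actual DTDs), a fill-in-the-blank exercise such as the one in Figure 1 could be encoded as follows:</p>

```xml
&lt;exercise&gt;
  &lt;number&gt;6&lt;/number&gt;
  &lt;indicator type="difficulty"&gt;**&lt;/indicator&gt;
  &lt;instruction&gt;
    &lt;segment&gt;Complète les phrases avec on ou ont.&lt;/segment&gt;
  &lt;/instruction&gt;
  &lt;statement&gt;
    &lt;list rend="lettered"&gt;
      &lt;item n="a."&gt;Si … allait au cinéma ?&lt;/item&gt;
      &lt;item n="b."&gt;Ils … vu ce film dix fois.&lt;/item&gt;
    &lt;/list&gt;
  &lt;/statement&gt;
&lt;/exercise&gt;
```

        <p>The “indicator” tag here carries the dificulty stars, which are not part of the main content of the exercise block.</p>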
        <p>We use tokens as the smallest linguistic unit and group them into text segments. In fact,
textbooks have a distinct language compared to regular texts. In language study textbooks,
tokens are sometimes bound morphemes; in mathematics textbooks, they can be operators,
symbols or isolated letters. Besides, text segments usually correspond to grammatical sentences
(starting with a capital letter and ending with a strong punctuation mark), block titles or labels.
However, according to the nature of the activities, we may find ungrammatical and asemantic
sequences. Some examples are fill-in-the-blank words ( “c…bat”), sentences (“Manon a perdu
… chat.”) or operations (“4 × … = 8”), multiple-choice choices (“(son/sont)”), concatenated
words (“cirageâgégéantenfant”), scattered blocks of text (“est une fleur”, “la tulipe” ), list numbers
(“a.”,“b.”), etc. As a result, referring to segments instead of sentences is more appropriate to
describe such units of text. For the automation process, the use of text segments also allows
us to easily infer roles through typography, whereas this is not sufficiently discriminating at the
token level. Moreover, due to the short length of the text blocks in the analyzed grade-level
textbooks, the concept of paragraph is not relevant to our project.</p>
        <p>The scheme is also extended to lists and tables. In addition, two or more lists can be linked,
for instance in an exercise where the instruction asks to match items from diferent lists to
each other (e.g., “Match each subject with its predicate”). As needed, we can further refine our
model by incorporating sub-elements (e.g., choices in multiple-choice questions) and additional
semantic and morpho-syntactic attributes. This refinement is possible despite the fixed general
XML structure and layout attributes.</p>
        <p>To ensure consistency with previous research on textbook modeling and to guarantee
long-term usability, our pivot format can be converted to conform to popular formats such as DocBook
and TEI. Table 1 shows the correspondence between our elements and their equivalents
according to the TEI Guidelines. The documents in the appendix depict the exercise featured in
Figure 1, in our format (Listing 1), converted to TEI (Listing 2) and to DocBook (Listing 3).</p>
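        <p>Because the pivot format is close to TEI for most elements, the conversion can be little more than a recursive tag-renaming pass. A minimal sketch (the mapping below covers only a few illustrative pairs; the full correspondence, including attributes, is given in Table 1):</p>

```python
import xml.etree.ElementTree as ET

# Illustrative subset of the pivot-to-TEI element mapping; the real
# correspondence (including attribute handling) is given in Table 1.
PIVOT_TO_TEI = {
    "exercise": "div",
    "title": "head",
    "instruction": "seg",
    "token": "w",
}

def to_tei(element):
    """Recursively rename pivot-format elements to their TEI equivalents."""
    out = ET.Element(PIVOT_TO_TEI.get(element.tag, element.tag), element.attrib)
    out.text, out.tail = element.text, element.tail
    for child in element:
        out.append(to_tei(child))
    return out

# Hypothetical pivot-format exercise, built programmatically.
pivot = ET.Element("exercise")
ET.SubElement(pivot, "title").text = "6"
instruction = ET.SubElement(pivot, "instruction")
ET.SubElement(instruction, "token").text = "Complète"

tei = to_tei(pivot)
print(tei.tag, tei[0].tag, tei[1][0].tag)  # div head w
```

        <p>Elements without a TEI equivalent would instead be folded into attributes of their parent, as shown in Table 1.</p>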
        <p>While modeling the entire textbook seems to be sufficient, the content extraction task is
performed at the page level. Therefore, we define a different scheme to model each textbook
page separately. In this second model, each token and segment tag will also be assigned
position and style attributes, reflecting the layout and formatting of the textbook page. In
addition, nesting the activity blocks within the discipline &gt; learning theme &gt; chapter hierarchy
is not straightforward. If chapter, theme or discipline titles appear on the page, these titles
(along with potential indicators) constitute a block on their own, separate from the activity
blocks on the page. As shown in Figure 2, this method enhances the visual representation of
the blocks on the page and each text segment can easily be associated with a role (chapter title,
lesson heading, lesson content, exercise number, exercise instruction, etc.) for the successful
completion of the automatic extraction task.</p>
        <p>[Figure 2: a textbook page with labeled blocks; roles include chapter title, discipline,
indicators, exploratory activity heading and statement, illustration, lesson heading and lesson,
exercise section heading, and, for each exercise, its number, title, instruction, statement,
example and hint.]</p>
        <p>6 Note that some text segments (e.g. the heading “J’écris” above the last exercise in Figure 2) are integrated into the
background images and will require OCR along with our PDF extraction process.</p>
        <p>[Table 1: correspondence between our model elements and TEI elements, e.g. &lt;pb/&gt;, &lt;lb/&gt;,
&lt;div&gt;, &lt;head&gt;, &lt;figure&gt;, &lt;graphic/&gt;, &lt;seg&gt;, &lt;w&gt;, &lt;pc&gt;, &lt;number&gt;, &lt;table&gt;, &lt;row&gt;, &lt;cell&gt;, &lt;list&gt; and
&lt;item&gt;, together with attributes such as @type, @join, @sep, @cols, @rows, @rend and @n; elements
without a TEI equivalent are converted to attributes (e.g. &lt;cell role=“label”&gt;, &lt;item n=“a.”&gt;,
&lt;list rend=“inline bulleted”&gt;).]</p>
      </sec>
      <sec id="sec-3-3">
        <title>3.3. Automatic textbook extraction</title>
        <p>Due to the large quantity and diversity of textbook collections, an approach relying solely on
rules would not be appropriate for our purposes because it would require extensive manual
annotation. Our approach builds upon the model described in Section 3.2 and integrates both
rule-based and deep learning methods. Figure 3 depicts the pipeline we are developing to
convert a PDF textbook to its structured version.</p>
        <p>Each textbook is first parsed to an XML file in ALTO format by pdfalto7 coupled with MuPDF8.
This combination of open-source tools enables the extraction of words along with their font style
and spatial coordinates, as well as images, in a well-organized structure. The extracted words
are tokenized and grouped into text segments using rules on font sizes and styles, character
types (numbers, symbols, punctuation marks) and spacing between tokens and characters.</p>
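        <p>As a sketch of this grouping step (the field names and gap threshold below are illustrative assumptions; our actual rules also use character types and font styles):</p>

```python
# Group ALTO-extracted words into text segments: start a new segment when
# the font size changes or the horizontal gap exceeds a threshold.
# Field names and the gap threshold are illustrative assumptions.
def group_segments(words, max_gap=20.0):
    segments, current, prev = [], [], None
    for w in words:  # w: dict with "text", "x", "width", "font_size"
        new_segment = (
            prev is not None
            and (w["font_size"] != prev["font_size"]
                 or w["x"] - (prev["x"] + prev["width"]) > max_gap)
        )
        if new_segment:
            segments.append(current)
            current = []
        current.append(w["text"])
        prev = w
    if current:
        segments.append(current)
    return [" ".join(s) for s in segments]

words = [
    {"text": "6", "x": 0, "width": 8, "font_size": 11},
    {"text": "Complète", "x": 40, "width": 60, "font_size": 9},
    {"text": "les", "x": 104, "width": 20, "font_size": 9},
]
print(group_segments(words))  # ['6', 'Complète les']
```

        <p>Here the exercise number is split from the instruction because of its larger font, matching the segment inventory of Section 3.2.</p>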
        <p>To reconstruct the textbook structure, segments must be labeled according to their role. In
order to reduce the manual annotation workload, we utilize an annotation interface designed
for MALIN. This web-based interface is supported by a TypeScript and Node.js back-end. The
core idea is to map the XML ALTO file to an HTML format, enabling visual representation
and annotation. Firstly, the annotator manually tags text segments with roles with just a few
clicks. Segments are then semi-automatically labeled based on their dominant font. Secondly,
the labeled segments are organized into higher-level categories that reflect the document’s
structure (e.g. lesson, exercise, etc.). This organization process leverages geometric features,
font types, font sizes, spacing and text patterns. At both stages, the results of the automatic
annotation are visually presented in HTML format, allowing for easy examination and potential
corrections. This ensures the accuracy of the annotation process.</p>
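        <p>A minimal sketch of the dominant-font propagation step (data shapes and role labels are illustrative assumptions, not the interface's actual implementation):</p>

```python
from collections import Counter

# Propagate roles from a few manually tagged segments to the rest,
# based on each segment's dominant font. Data shapes are illustrative.
def propagate_roles(segments, tagged):
    # tagged: {segment_index: role}; build a font -> most frequent role table
    votes = {}
    for i, role in tagged.items():
        votes.setdefault(segments[i]["font"], Counter())[role] += 1
    font_to_role = {f: c.most_common(1)[0][0] for f, c in votes.items()}
    return [
        tagged.get(i, font_to_role.get(s["font"]))
        for i, s in enumerate(segments)
    ]

segments = [{"font": "Bold12"}, {"font": "Reg9"}, {"font": "Reg9"}]
print(propagate_roles(segments, {0: "exercise number", 1: "instruction"}))
# ['exercise number', 'instruction', 'instruction']
```

        <p>Segments whose font has no tagged example remain unlabeled and fall back to manual annotation.</p>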
        <p>Once we have collected and processed enough textbook pages, we constitute a dataset to
train and evaluate deep learning models to achieve this annotation task automatically. Textbook
page structure extraction can be achieved through a token classification task. Preliminary
experiments conducted on a few French textbook pages are described in Section 3.4. New
pre-processed pages will then directly pass through the extractor.</p>
        <p>Upon token classification, textbook pages are formatted into our desired structure. Sections
are built using geometric features and logical sequencing. For example, an exercise number
following a statement indicates the beginning of a new exercise section. The process also
involves identifying lists and tables, and filtering images in the PDF file. Activity illustrations
are matched with the corresponding sections using geometric features, while images used for
aesthetic purposes are removed.</p>
        <p>7https://github.com/kermitt2/pdfalto
8https://github.com/ArtifexSoftware/mupdf</p>
        <sec id="sec-3-3-1">
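Matching an illustration to a section from geometric features can be approximated by bounding-box overlap. The following is a hypothetical sketch of that idea, not the pipeline's actual code (which also exploits logical sequencing):

```python
def overlap_area(a, b):
    """Intersection area of two (x0, y0, x1, y1) boxes."""
    w = min(a[2], b[2]) - max(a[0], b[0])
    h = min(a[3], b[3]) - max(a[1], b[1])
    return max(w, 0) * max(h, 0)

def match_image_to_section(image_box, section_boxes):
    """Return the index of the section whose box overlaps the image
    the most, or None when there is no overlap at all (such images
    can then be treated as decorative and removed)."""
    best, best_area = None, 0
    for i, sec in enumerate(section_boxes):
        area = overlap_area(image_box, sec)
        if area > best_area:
            best, best_area = i, area
    return best
```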
          <p>After textbook pages are structured, and possibly merged to our document-scale model or
its TEI or DocBook equivalent, the resulting data can be used for various artificial intelligence
applications in education. For our adaptation purposes and depending on the disability, we
would then be able to produce digital textbooks with a custom layout and interactive adaptations.</p>
        </sec>
      </sec>
      <sec id="sec-3-4">
        <title>3.4. Preliminary token classification experiments</title>
        <p>These preliminary experiments correspond to the training phase of the extraction pipeline
depicted in Figure 3. This step involves training a deep-learning model on a token classification
task to automatically predict the role of each token in the document, enabling the reconstruction
of the document structure.</p>
        <p>3.4.1. Experimental setup</p>
        <p>We constructed a dataset of textbook pages extracted from one elementary-grade French textbook
in PDF format. This amounts to a total of 167 pages, which are split into three subsets:
training (70%), validation (10%) and test (20%). For evaluation purposes only, we selected
an additional 30 pages from a second textbook of the same collection, and 30 pages from a
third textbook of a different collection. Each token is annotated with a coarse-grained page
region label among: discipline, chapter, heading, introductory activity, lesson, exercise, page
number. Future experiments will go further by introducing fine-grained labels. Table 2 lists the
equivalences between coarse- and fine-grained page region classes.</p>
        <p>
          In our first research on the classification of French textbook exercises according to their
adaptation to DCD with multimodal transformers [
          <xref ref-type="bibr" rid="ref39 ref40">39, 40</xref>
          ], we demonstrated the importance of
layout and vision modalities along with French educational language in textbook understanding.
We therefore take advantage of the recently introduced LiLT, combined with a CamemBERT model previously
fine-tuned on textbooks and reading materials9, to obtain a LayoutLM-like model for educational
French. We use the BASE architecture for both pre-trained models. Fine-tuning on the token
classification task stops after 10-15 epochs due to early stopping, with a batch size of 8.
The initial learning rate is set to 1e-5. We use the Adam optimizer and cross-entropy loss. Results
on the test set are obtained with the fine-tuned model that performs best on the validation set.
        </p>
        <p>
          Considering the input length limits of the models, we set the maximum input length to 512 tokens.
However, 60% of the pages are longer. These pages are encoded as 2 overlapping segments: once
by truncating the end of the document to the maximum length, and a second time by truncating
the beginning. Inspired by the sliding-window approach [
          <xref ref-type="bibr" rid="ref42">42</xref>
          ], this solution allows us to cover the
entire document while maximizing the window size, which also maximizes the context. During
evaluation, predictions are generated for each segment and aligned to the entire textbook page.
If the overlapping section between segments results in different predictions10, the section is
re-encoded with additional context tokens on both the left and right sides and passed through
the model. The three predictions for the overlapping part are merged using a majority vote to
obtain a single prediction. This approach ensures that the model’s predictions are accurately
consolidated for the entire page, even when there are variations in the overlapping segment.</p>
        <p>3.4.2. Results and discussion</p>
        <p>We report the accuracy and macro-F-measure of the token classification task in Table 3. The
evaluation was performed on three textbooks, from both familiar and unfamiliar collections.
        </p>
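The two-window encoding and majority-vote merging described above can be sketched as follows; this is a simplified reconstruction under assumed data shapes, not the pipeline's actual code:

```python
def make_windows(n_tokens, max_len=512):
    """Encode a page as at most two overlapping windows: one that
    truncates the end of the document, one that truncates the start."""
    if n_tokens <= max_len:
        return [(0, n_tokens)]
    return [(0, max_len), (n_tokens - max_len, n_tokens)]

def merge_predictions(n_tokens, windows, preds, tie_break):
    """Align window-level predictions to the full page.  Where the two
    windows disagree on a token, a third prediction (e.g. obtained by
    re-encoding the overlap with extra context on both sides) breaks
    the tie by majority vote."""
    merged = [None] * n_tokens
    for (start, end), p in zip(windows, preds):
        for i in range(start, end):
            if merged[i] is None:
                merged[i] = p[i - start]
            elif merged[i] != p[i - start]:
                votes = [merged[i], p[i - start], tie_break[i]]
                merged[i] = max(set(votes), key=votes.count)
    return merged
```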
        <p>Our model outperforms the majority class baseline on all test sets. The intra-collection
performance is very high. Since layout is the same for textbooks of the same collection, these
results highlight the significance of layout features for the model.</p>
        <p>
          The performance scores are lower for the 3rd textbook. Upon closer examination of the
predictions and comparing the pages with those from the textbook used for training, it becomes
evident that the errors are primarily due to layout diferences. Specifically, chapters in the
training collection are typically structured as shown in the example in Figure 2. However,
in the new textbook, exercises can be located before the corresponding lesson, which is
consistently at the bottom of the page. Besides, the distinction between introductory activity
exercises and actual exercises within a chapter is not clearly defined. As a result, some parts
of lessons are wrongly predicted as exercises, and exercises within introductory activities are
occasionally misclassified as lessons or exercises. With the aim of developing a system with
high generalizability, the training and evaluation sets must comprise more books from various
collections.</p>
        <p>
          9CamemBERT-BASE’s masked language model is fine-tuned on the following educational texts: pages from 4
French textbooks (apart from the pages of the validation and test subsets), 1293 Fantastiques Exercices, and the 79
original reading texts from the parallel corpus Alector [
          <xref ref-type="bibr" rid="ref41">41</xref>
          ].
        </p>
        <p>10Among the overlapping segments, prediction differences occur in 1/3 of the pages and impact an average of 5%
of the tokens within those segments.</p>
        <p>These first experiments do not fully reflect the complexity of the task, since we use
coarse-grained labels. For adaptation purposes, it will be necessary to provide a more detailed structure
for each activity. Nevertheless, the results obtained with coarse-grained labels already point to
a range of adaptation levels, depending on the nature of the blocks identified (e.g. lesson vs.
exercise). On one hand, lesson adaptation for children with DCD or any other dyslexia-related
disorder is already achievable, as it involves implementing standard accessibility modifications
such as adjusting font, size, spacing and colors to enhance readability. On the other hand,
activities that necessitate a shift in the mode of interaction require more in-depth extraction,
and further processing to accurately identify11 and apply this shift. Regarding the extraction
task and given the results obtained in this paper, we can consider a 2-step token classification.
Another limitation is that some components do not appear in all textbook collections. For
example, pages comprising our dataset do not explicitly indicate the learning theme (this
information is exclusive to the index), whereas some textbooks mention it on each chapter page.
Finally, our experiments cover only French language study textbooks. Textbooks of different
subjects may present certain specificities not only in layout but also in content. When applying
our models to numerous existing collections, it is imperative to account for these variations in
layout and semantics and ensure sufficient generalizability.</p>
      </sec>
    </sec>
    <sec id="sec-4">
      <title>4. Conclusion</title>
      <p>In this paper, we introduce our approach for automatic textbook structure and content extraction
using activity-based textbook models. Our models not only provide layout and conceptual
information, but also support the representation of an entire textbook according to widespread
standards. We also report preliminary results on the token classification task to automatically
identify all the components of a textbook page. These results are promising but reflect the
difficulty of the task: generalizing to various collections whose content and layout differ widely.
Future work will address the progression of this automatic extraction task, using new
multimodal transformer-based methods and going deeper into fine-grained labeling. Eventually,
we will cover the implementation of the whole pipeline to convert a PDF textbook into a
structured version according to our models.</p>
    </sec>
    <sec id="sec-5">
      <title>Acknowledgments</title>
      <p>This work was supported by the ANR-21-CE38-0014 MALIN project.</p>
      <p>
        11Our previous work introduces a classification task that aims to classify French language study exercises based
on their adaptation type for children with DCD [
        <xref ref-type="bibr" rid="ref39 ref40">39, 40</xref>
        ].
      </p>
      <p>The following documents depict the exercise featured in Figure 1 in our format, in TEI and in DocBook.
The representation has been simplified in Listing 1 by omitting position (@xmin, @ymin,
@xmax, @ymax), spacing (@space, @spnext) and font style (@font) attributes. Unique id
(@id/@ID) attributes are omitted in all documents.</p>
      <p>Listing 1: Our format
&lt;div type="exercise" n="6" difficulty="**"&gt;
&lt;div type="instruction"&gt;
&lt;seg&gt;
&lt;w&gt;Complète&lt;/w&gt;
&lt;w&gt;les&lt;/w&gt;
&lt;w&gt;phrases&lt;/w&gt;
&lt;w&gt;avec&lt;/w&gt;
&lt;w&gt;on&lt;/w&gt;
&lt;w&gt;ou&lt;/w&gt;
&lt;w&gt;ont&lt;/w&gt;
&lt;pc join="left"&gt;.&lt;/pc&gt;
&lt;/seg&gt;
&lt;/div&gt;
&lt;lb/&gt;
&lt;div type="statement"&gt;
&lt;list rend="lettered"&gt;
&lt;item n="a."&gt;
&lt;seg&gt;
&lt;w&gt;Si&lt;/w&gt;
&lt;pc&gt;…&lt;/pc&gt;
&lt;w&gt;allait&lt;/w&gt;
&lt;w&gt;au&lt;/w&gt;
&lt;w&gt;cinéma&lt;/w&gt;
&lt;pc&gt;?&lt;/pc&gt;
&lt;/seg&gt;
&lt;/item&gt;
&lt;lb/&gt;
&lt;item n="b."&gt;
&lt;seg&gt;
&lt;w&gt;Ils&lt;/w&gt;
&lt;pc&gt;…&lt;/pc&gt;
&lt;w&gt;vu&lt;/w&gt;
&lt;w&gt;ce&lt;/w&gt;
&lt;w&gt;film&lt;/w&gt;
&lt;w&gt;dix&lt;/w&gt;
&lt;w&gt;fois&lt;/w&gt;
&lt;pc join="left"&gt;.&lt;/pc&gt;
&lt;/seg&gt;
&lt;/item&gt;
&lt;lb/&gt;
&lt;item n="c."&gt;
&lt;seg&gt;
&lt;pc&gt;…&lt;/pc&gt;
&lt;w&gt;s’&lt;/w&gt;
&lt;w&gt;installe&lt;/w&gt;
&lt;w&gt;dans&lt;/w&gt;
&lt;w&gt;les&lt;/w&gt;
&lt;w&gt;fauteuils&lt;/w&gt;
&lt;w&gt;moelleux&lt;/w&gt;
&lt;pc join="left"&gt;.&lt;/pc&gt;
&lt;/seg&gt;
&lt;/item&gt;
&lt;lb/&gt;
&lt;item n="d."&gt;
&lt;seg&gt;
&lt;w&gt;Mes&lt;/w&gt;
&lt;w&gt;parents&lt;/w&gt;
&lt;pc&gt;…&lt;/pc&gt;
&lt;w&gt;pris&lt;/w&gt;
&lt;w&gt;du&lt;/w&gt;
&lt;w&gt;pop-corn&lt;/w&gt;
&lt;pc join="left"&gt;.&lt;/pc&gt;
&lt;/seg&gt;
&lt;/item&gt;
&lt;lb/&gt;
&lt;item n="e."&gt;
&lt;seg&gt;
&lt;w&gt;Les&lt;/w&gt;
&lt;w&gt;enfants&lt;/w&gt;
&lt;pc&gt;…&lt;/pc&gt;
&lt;w&gt;sursauté&lt;/w&gt;
&lt;w&gt;devant&lt;/w&gt;
&lt;w&gt;une&lt;/w&gt;
&lt;w&gt;scène&lt;/w&gt;
&lt;w&gt;de&lt;/w&gt;
&lt;w&gt;film&lt;/w&gt;
&lt;pc join="left"&gt;.&lt;/pc&gt;
&lt;/seg&gt;
&lt;/item&gt;
&lt;/list&gt;
&lt;/div&gt;
&lt;/div&gt;</p>
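Listing 1 is ordinary XML, so downstream adaptation tools can rebuild readable text from the `<w>` (word) and `<pc>` (punctuation) elements; a minimal sketch using Python's standard library (the `join="left"` attribute means the punctuation attaches to the preceding word without a space):

```python
import xml.etree.ElementTree as ET

def seg_to_text(seg):
    """Rebuild plain text from a <seg> of <w> and <pc> children,
    honoring join="left" on punctuation."""
    parts = []
    for el in seg:
        if el.tag not in ("w", "pc"):
            continue
        attach_left = el.tag == "pc" and el.get("join") == "left"
        if parts and not attach_left:
            parts.append(" ")
        parts.append(el.text or "")
    return "".join(parts)

# The instruction of the exercise shown in Listing 1:
instruction = ET.fromstring(
    '<div type="instruction"><seg>'
    "<w>Complète</w><w>les</w><w>phrases</w><w>avec</w>"
    "<w>on</w><w>ou</w><w>ont</w>"
    '<pc join="left">.</pc>'
    "</seg></div>"
)
```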
      <p>Listing 3: DocBook
&lt;section role="exercise"&gt;
&lt;section role="instruction"&gt;</p>
      <p>&lt;para&gt;Complète les phrases avec on ou ont.&lt;/para&gt;
&lt;/section&gt;
&lt;section role="statement"&gt;
&lt;orderedlist numeration="loweralpha"&gt;
&lt;listitem&gt;</p>
      <p>&lt;para&gt;Si … allait au cinéma ?&lt;/para&gt;
&lt;/listitem&gt;
&lt;listitem&gt;</p>
      <p>&lt;para&gt;Ils … vu ce film dix fois.&lt;/para&gt;
&lt;/listitem&gt;</p>
    </sec>
  </body>
  <back>
    <ref-list>
      <ref id="ref1">
        <mixed-citation>
          [1]
          <string-name>
            <given-names>C.</given-names>
            <surname>Ou</surname>
          </string-name>
          ,
          <string-name>
            <given-names>A.</given-names>
            <surname>Goel</surname>
          </string-name>
          ,
          <string-name>
            <given-names>D.</given-names>
            <surname>Joyner</surname>
          </string-name>
          ,
          <article-title>Towards a Pedagogical Framework for Designing and Developing iTextbooks</article-title>
          ,
          <source>in: Proceedings of the 4th International Workshop on Intelligent Textbooks, 23rd International Conference on Artificial Intelligence in Education</source>
          ,
          <year>2022</year>
          .
        </mixed-citation>
      </ref>
      <ref id="ref2">
        <mixed-citation>
          [2]
          <string-name>
            <given-names>S.</given-names>
            <surname>Ritter</surname>
          </string-name>
          , J. Fisher,
          <string-name>
            <given-names>A.</given-names>
            <surname>Lewis</surname>
          </string-name>
          ,
          <string-name>
            <given-names>S. B.</given-names>
            <surname>Finocchi</surname>
          </string-name>
          ,
          <string-name>
            <given-names>B.</given-names>
            <surname>Hausmann</surname>
          </string-name>
          ,
          <string-name>
            <given-names>S.</given-names>
            <surname>Fancsali</surname>
          </string-name>
          ,
          <article-title>What's a Textbook? Envisioning the 21st Century K-12 Text.</article-title>
          ,
          <source>in: Proceedings of the 1st Workshop on Intelligent Textbooks, 20th International Conference on Artificial Intelligence in Education</source>
          ,
          <year>2019</year>
          .
        </mixed-citation>
      </ref>
      <ref id="ref3">
        <mixed-citation>
          [3]
          <string-name>
            <given-names>L.</given-names>
            <surname>Castillan</surname>
          </string-name>
          ,
          <string-name>
            <given-names>J.</given-names>
            <surname>Lemarié</surname>
          </string-name>
          ,
          <string-name>
            <given-names>M.</given-names>
            <surname>Mojahid</surname>
          </string-name>
          ,
          <article-title>Numérique, handicap visuel et accessibilité des apprentissages. Contenus pédagogiques numériques: quelle accessibilité pour les élèves présentant une déficience visuelle?</article-title>
          ,
          <source>Éducation &amp; Formation</source>
          (
          <year>2018</year>
          ).
        </mixed-citation>
      </ref>
      <ref id="ref4">
        <mixed-citation>
          [4]
          <string-name>
            <given-names>L.</given-names>
            <surname>Castillan</surname>
          </string-name>
          ,
          <string-name>
            <given-names>J.</given-names>
            <surname>Lemarié</surname>
          </string-name>
          ,
          <string-name>
            <given-names>M.</given-names>
            <surname>Mojahid</surname>
          </string-name>
          ,
          <article-title>L'accessibilité des manuels scolaires numériques : l'exemple suédois, entre édition adaptée et édition inclusive</article-title>
          ,
          <source>La nouvelle revue - Éducation et société inclusives</source>
          (
          <year>2019</year>
          ).
        </mixed-citation>
      </ref>
      <ref id="ref5">
        <mixed-citation>
          [5]
          <string-name>
            <given-names>L. R.</given-names>
            <surname>Ketterlin-Geller</surname>
          </string-name>
          , G. Tindal,
          <article-title>Embedded technology: Current and future practices for increasing accessibility for all students</article-title>
          ,
          <source>Journal of special education technology 22</source>
          (
          <year>2007</year>
          ).
        </mixed-citation>
      </ref>
      <ref id="ref6">
        <mixed-citation>
          [6]
          <string-name>
            <given-names>S.</given-names>
            <surname>Rahtz</surname>
          </string-name>
          ,
          <string-name>
            <given-names>N.</given-names>
            <surname>Walsh</surname>
          </string-name>
          ,
          <string-name>
            <given-names>L.</given-names>
            <surname>Burnard</surname>
          </string-name>
          ,
          <article-title>A unified model for text markup: TEI, DocBook, and beyond</article-title>
          ,
          <source>Proceedings of XML Europe</source>
          (
          <year>2004</year>
          ).
        </mixed-citation>
      </ref>
      <ref id="ref7">
        <mixed-citation>
          [7]
          <string-name>
            <given-names>L.-L.</given-names>
            <surname>Stahn</surname>
          </string-name>
          ,
          <string-name>
            <given-names>S.</given-names>
            <surname>Hennicke</surname>
          </string-name>
          , E. W. De Luca,
          <article-title>Using TEI for textbook research</article-title>
          ,
          <source>in: Proceedings of the Workshop on Language Technology Resources and Tools for Digital Humanities (LT4DH)</source>
          ,
          <year>2016</year>
          .
        </mixed-citation>
      </ref>
      <ref id="ref8">
        <mixed-citation>
          [8]
          <string-name>
            <given-names>F.-M.</given-names>
            <surname>Gérard</surname>
          </string-name>
          ,
          <string-name>
            <given-names>X.</given-names>
            <surname>Roegiers</surname>
          </string-name>
          ,
          <article-title>Des manuels scolaires pour apprendre : concevoir, évaluer, utiliser</article-title>
          , De Boeck Supérieur,
          <year>2009</year>
          .
        </mixed-citation>
      </ref>
      <ref id="ref9">
        <mixed-citation>
          [9]
          <string-name>
            <given-names>A.</given-names>
            <surname>Vaswani</surname>
          </string-name>
          ,
          <string-name>
            <given-names>N.</given-names>
            <surname>Shazeer</surname>
          </string-name>
          ,
          <string-name>
            <given-names>N.</given-names>
            <surname>Parmar</surname>
          </string-name>
          ,
          <string-name>
            <given-names>J.</given-names>
            <surname>Uszkoreit</surname>
          </string-name>
          ,
          <string-name>
            <given-names>L.</given-names>
            <surname>Jones</surname>
          </string-name>
          ,
          <string-name>
            <given-names>A. N.</given-names>
            <surname>Gomez</surname>
          </string-name>
          ,
          <string-name>
            <given-names>L.</given-names>
            <surname>Kaiser</surname>
          </string-name>
          ,
          <string-name>
            <given-names>I.</given-names>
            <surname>Polosukhin</surname>
          </string-name>
          ,
          <article-title>Attention is All you Need</article-title>
          ,
          <source>in: Proceedings of the 31st Conference on Neural Information Processing Systems</source>
          ,
          <year>2017</year>
          .
        </mixed-citation>
      </ref>
      <ref id="ref10">
        <mixed-citation>
          [10]
          <string-name>
            <given-names>Y.</given-names>
            <surname>Xu</surname>
          </string-name>
          ,
          <string-name>
            <given-names>M.</given-names>
            <surname>Li</surname>
          </string-name>
          ,
          <string-name>
            <given-names>L.</given-names>
            <surname>Cui</surname>
          </string-name>
          ,
          <string-name>
            <given-names>S.</given-names>
            <surname>Huang</surname>
          </string-name>
          ,
          <string-name>
            <given-names>F.</given-names>
            <surname>Wei</surname>
          </string-name>
          ,
          <string-name>
            <given-names>M.</given-names>
            <surname>Zhou</surname>
          </string-name>
          ,
          <article-title>LayoutLM: Pre-training of Text and Layout for Document Image Understanding</article-title>
          ,
          <source>in: Proceedings of the 26th ACM SIGKDD International Conference on Knowledge Discovery &amp; Data Mining</source>
          ,
          <year>2020</year>
          .
        </mixed-citation>
      </ref>
      <ref id="ref11">
        <mixed-citation>
          [11]
          <string-name>
            <given-names>Y.</given-names>
            <surname>Xu</surname>
          </string-name>
          ,
          <string-name>
            <given-names>Y.</given-names>
            <surname>Xu</surname>
          </string-name>
          ,
          <string-name>
            <given-names>T.</given-names>
            <surname>Lv</surname>
          </string-name>
          ,
          <string-name>
            <given-names>L.</given-names>
            <surname>Cui</surname>
          </string-name>
          ,
          <string-name>
            <given-names>F.</given-names>
            <surname>Wei</surname>
          </string-name>
          ,
          <string-name>
            <given-names>G.</given-names>
            <surname>Wang</surname>
          </string-name>
          ,
          <string-name>
            <given-names>Y.</given-names>
            <surname>Lu</surname>
          </string-name>
          ,
          <string-name>
            <given-names>D.</given-names>
            <surname>Florencio</surname>
          </string-name>
          ,
          <string-name>
            <given-names>C.</given-names>
            <surname>Zhang</surname>
          </string-name>
          , W. Che,
          <string-name>
            <given-names>M.</given-names>
            <surname>Zhang</surname>
          </string-name>
          , L. Zhou,
          <article-title>LayoutLMv2: Multi-modal Pre-training for Visually-rich Document Understanding</article-title>
          ,
          <source>in: Proceedings of the 59th Annual Meeting of the Association for Computational Linguistics and the 11th International Joint Conference on Natural Language Processing</source>
          ,
          <year>2021</year>
          .
        </mixed-citation>
      </ref>
      <ref id="ref12">
        <mixed-citation>
          [12]
          <string-name>
            <given-names>Y.</given-names>
            <surname>Huang</surname>
          </string-name>
          ,
          <string-name>
            <given-names>T.</given-names>
            <surname>Lv</surname>
          </string-name>
          ,
          <string-name>
            <given-names>L.</given-names>
            <surname>Cui</surname>
          </string-name>
          ,
          <string-name>
            <given-names>Y.</given-names>
            <surname>Lu</surname>
          </string-name>
          ,
          <string-name>
            <given-names>F.</given-names>
            <surname>Wei</surname>
          </string-name>
          ,
          <article-title>LayoutLMv3: Pre-training for document ai with unified text and image masking</article-title>
          ,
          <source>in: Proceedings of the 30th ACM International Conference on Multimedia</source>
          ,
          <year>2022</year>
          .
        </mixed-citation>
      </ref>
      <ref id="ref13">
        <mixed-citation>
          [13]
          <string-name>
            <given-names>J.</given-names>
            <surname>Devlin</surname>
          </string-name>
          ,
          <string-name>
            <given-names>M.-W.</given-names>
            <surname>Chang</surname>
          </string-name>
          ,
          <string-name>
            <given-names>K.</given-names>
            <surname>Lee</surname>
          </string-name>
          ,
          <string-name>
            <given-names>K.</given-names>
            <surname>Toutanova</surname>
          </string-name>
          ,
          <article-title>BERT: Pre-training of Deep Bidirectional Transformers for Language Understanding</article-title>
          ,
          <source>in: Proceedings of the 2019 Conference of the North American Chapter of the Association for Computational Linguistics: Human Language Technologies</source>
          ,
          <year>2019</year>
          .
        </mixed-citation>
      </ref>
      <ref id="ref14">
        <mixed-citation>
          [14]
          <string-name>
            <given-names>T.</given-names>
            <surname>Hong</surname>
          </string-name>
          ,
          <string-name>
            <given-names>D.</given-names>
            <surname>Kim</surname>
          </string-name>
          ,
          <string-name>
            <given-names>M.</given-names>
            <surname>Ji</surname>
          </string-name>
          ,
          <string-name>
            <given-names>W.</given-names>
            <surname>Hwang</surname>
          </string-name>
          ,
          <string-name>
            <given-names>D.</given-names>
            <surname>Nam</surname>
          </string-name>
          , S. Park,
          <article-title>Bros: A pre-trained language model focusing on text and layout for better key information extraction from documents</article-title>
          ,
          <source>in: Proceedings of the 36th AAAI Conference on Artificial Intelligence</source>
          ,
          <year>2022</year>
          .
        </mixed-citation>
      </ref>
      <ref id="ref15">
        <mixed-citation>
          [15]
          <string-name>
            <given-names>S.</given-names>
            <surname>Appalaraju</surname>
          </string-name>
          ,
          <string-name>
            <given-names>B.</given-names>
            <surname>Jasani</surname>
          </string-name>
          ,
          <string-name>
            <given-names>B. U.</given-names>
            <surname>Kota</surname>
          </string-name>
          ,
          <string-name>
            <given-names>Y.</given-names>
            <surname>Xie</surname>
          </string-name>
          ,
          <string-name>
            <given-names>R.</given-names>
            <surname>Manmatha</surname>
          </string-name>
          ,
          <article-title>DocFormer: End-to-end transformer for document understanding</article-title>
          ,
          <source>in: Proceedings of the 18th IEEE International Conference on Computer Vision</source>
          ,
          <year>2021</year>
          .
        </mixed-citation>
      </ref>
      <ref id="ref16">
        <mixed-citation>
          [16]
          <string-name>
            <given-names>R.</given-names>
            <surname>Powalski</surname>
          </string-name>
          ,
          <string-name>
            <given-names>L.</given-names>
            <surname>Borchmann</surname>
          </string-name>
          ,
          <string-name>
            <given-names>D.</given-names>
            <surname>Jurkiewicz</surname>
          </string-name>
          ,
          <string-name>
            <given-names>T.</given-names>
            <surname>Dwojak</surname>
          </string-name>
          ,
          <string-name>
            <given-names>M.</given-names>
            <surname>Pietruszka</surname>
          </string-name>
          , G. Palka,
          <article-title>Going Full-TILT Boogie on Document Understanding with Text-Image-Layout Transformer</article-title>
          ,
          <source>in: Proceedings of 16th International Conference on Document Analysis and Recognition</source>
          ,
          <year>2021</year>
          .
        </mixed-citation>
      </ref>
      <ref id="ref17">
        <mixed-citation>
          [17]
          <string-name>
            <given-names>C.</given-names>
            <surname>Raffel</surname>
          </string-name>
          ,
          <string-name>
            <given-names>N.</given-names>
            <surname>Shazeer</surname>
          </string-name>
          ,
          <string-name>
            <given-names>A.</given-names>
            <surname>Roberts</surname>
          </string-name>
          ,
          <string-name>
            <given-names>K.</given-names>
            <surname>Lee</surname>
          </string-name>
          ,
          <string-name>
            <given-names>S.</given-names>
            <surname>Narang</surname>
          </string-name>
          ,
          <string-name>
            <given-names>M.</given-names>
            <surname>Matena</surname>
          </string-name>
          ,
          <string-name>
            <given-names>Y.</given-names>
            <surname>Zhou</surname>
          </string-name>
          ,
          <string-name>
            <given-names>W.</given-names>
            <surname>Li</surname>
          </string-name>
          ,
          <string-name>
            <given-names>P. J.</given-names>
            <surname>Liu</surname>
          </string-name>
          ,
          <article-title>Exploring the limits of transfer learning with a unified text-to-text transformer</article-title>
          ,
          <source>The Journal of Machine Learning Research</source>
          (
          <year>2020</year>
          ).
        </mixed-citation>
      </ref>
      <ref id="ref18">
        <mixed-citation>
          [18]
          <string-name>
            <given-names>L.</given-names>
            <surname>Martin</surname>
          </string-name>
          ,
          <string-name>
            <given-names>B.</given-names>
            <surname>Muller</surname>
          </string-name>
          ,
          <string-name>
            <given-names>P. J. Ortiz</given-names>
            <surname>Suárez</surname>
          </string-name>
          ,
          <string-name>
            <given-names>Y.</given-names>
            <surname>Dupont</surname>
          </string-name>
          ,
          <string-name>
            <given-names>L.</given-names>
            <surname>Romary</surname>
          </string-name>
          , E. de la Clergerie,
          <string-name>
            <given-names>D.</given-names>
            <surname>Seddah</surname>
          </string-name>
          ,
          <string-name>
            <given-names>B.</given-names>
            <surname>Sagot</surname>
          </string-name>
          ,
          <article-title>CamemBERT: a Tasty French Language Model</article-title>
          ,
          <source>in: Proceedings of the 58th Annual Meeting of the Association for Computational Linguistics</source>
          ,
          <year>2020</year>
          .
        </mixed-citation>
      </ref>
      <ref id="ref19">
        <mixed-citation>
          [19]
          <string-name>
            <given-names>H.</given-names>
            <surname>Le</surname>
          </string-name>
          ,
          <string-name>
            <given-names>L.</given-names>
            <surname>Vial</surname>
          </string-name>
          ,
          <string-name>
            <given-names>J.</given-names>
            <surname>Frej</surname>
          </string-name>
          ,
          <string-name>
            <given-names>V.</given-names>
            <surname>Segonne</surname>
          </string-name>
          ,
          <string-name>
            <given-names>M.</given-names>
            <surname>Coavoux</surname>
          </string-name>
          ,
          <string-name>
            <given-names>B.</given-names>
            <surname>Lecouteux</surname>
          </string-name>
          ,
          <string-name>
            <given-names>A.</given-names>
            <surname>Allauzen</surname>
          </string-name>
          ,
          <string-name>
            <given-names>B.</given-names>
            <surname>Crabbé</surname>
          </string-name>
          ,
          <string-name>
            <given-names>L.</given-names>
            <surname>Besacier</surname>
          </string-name>
          , D. Schwab,
          <article-title>FlauBERT: Unsupervised Language Model Pre-training for French</article-title>
          ,
          <source>in: Proceedings of the 12th Language Resources and Evaluation Conference</source>
          ,
          <year>2020</year>
          .
        </mixed-citation>
      </ref>
      <ref id="ref20">
        <mixed-citation>
          [20]
          <string-name>
            <given-names>J.</given-names>
            <surname>Wang</surname>
          </string-name>
          ,
          <string-name>
            <given-names>L.</given-names>
            <surname>Jin</surname>
          </string-name>
          ,
          <string-name>
            <given-names>K.</given-names>
            <surname>Ding</surname>
          </string-name>
          ,
          <article-title>LiLT: A Simple yet Effective Language-Independent Layout Transformer for Structured Document Understanding</article-title>
          ,
          <source>in: Proceedings of the 60th Annual Meeting of the Association for Computational Linguistics</source>
          ,
          <year>2022</year>
          .
        </mixed-citation>
      </ref>
      <ref id="ref21">
        <mixed-citation>
          [21]
          <string-name>
            <given-names>G.</given-names>
            <surname>Jaume</surname>
          </string-name>
          ,
          <string-name>
            <given-names>H. K.</given-names>
            <surname>Ekenel</surname>
          </string-name>
          ,
          <string-name>
            <given-names>J.-P.</given-names>
            <surname>Thiran</surname>
          </string-name>
          ,
          <article-title>FUNSD: A dataset for form understanding in noisy scanned documents</article-title>
          ,
          <source>in: Proceedings of the 15th International Conference on Document Analysis and Recognition Workshops</source>
          ,
          <year>2019</year>
          .
        </mixed-citation>
      </ref>
      <ref id="ref22">
        <mixed-citation>
          [22]
          <string-name>
            <given-names>S.</given-names>
            <surname>Najem-Meyer</surname>
          </string-name>
          , M. Romanello,
          <article-title>Page Layout Analysis of Text-heavy Historical Documents: a Comparison of Textual and Visual Approaches</article-title>
          ,
          <source>in: Proceedings of the Computational Humanities Research Conference</source>
          ,
          <year>2022</year>
          .
        </mixed-citation>
      </ref>
      <ref id="ref23">
        <mixed-citation>
          [23]
          <string-name>
            <given-names>I.</given-names>
            <surname>Alpizar-Chacon</surname>
          </string-name>
          ,
          <string-name>
            <given-names>M. van der</given-names>
            <surname>Hart</surname>
          </string-name>
          ,
          <string-name>
            <given-names>Z. S.</given-names>
            <surname>Wiersma</surname>
          </string-name>
          ,
          <string-name>
            <given-names>L. S.</given-names>
            <surname>Theunissen</surname>
          </string-name>
          ,
          <string-name>
            <given-names>S.</given-names>
            <surname>Sosnovsky</surname>
          </string-name>
          ,
          <string-name>
            <given-names>P.</given-names>
            <surname>Brusilovsky</surname>
          </string-name>
          ,
          <string-name>
            <given-names>R.</given-names>
            <surname>Baraniuk</surname>
          </string-name>
          ,
          <string-name>
            <given-names>A.</given-names>
            <surname>Lan</surname>
          </string-name>
          ,
          <article-title>Transformation of PDF textbooks into intelligent educational resources</article-title>
          ,
          <source>in: Proceedings of the 2nd International Workshop on Intelligent Textbooks, 21st International Conference on Artificial Intelligence in Education</source>
          ,
          <year>2020</year>
          .
        </mixed-citation>
      </ref>
      <ref id="ref24">
        <mixed-citation>
          [24]
          <string-name>
            <given-names>I.</given-names>
            <surname>Alpizar-Chacon</surname>
          </string-name>
          ,
          <string-name>
            <given-names>S.</given-names>
            <surname>Sosnovsky</surname>
          </string-name>
          ,
          <article-title>Order out of chaos: Construction of knowledge models from PDF textbooks</article-title>
          ,
          <source>in: Proceedings of the ACM Symposium on Document Engineering</source>
          ,
          <year>2020</year>
          .
        </mixed-citation>
      </ref>
      <ref id="ref25">
        <mixed-citation>
          [25]
          <string-name>
            <given-names>I.</given-names>
            <surname>Alpizar-Chacon</surname>
          </string-name>
          ,
          <string-name>
            <given-names>S.</given-names>
            <surname>Sosnovsky</surname>
          </string-name>
          ,
          <article-title>Knowledge models from PDF textbooks</article-title>
          ,
          <source>New Review of Hypermedia and Multimedia</source>
          <volume>27</volume>
          (
          <year>2021</year>
          ).
        </mixed-citation>
      </ref>
      <ref id="ref26">
        <mixed-citation>
          [26]
          <string-name>
            <given-names>I.</given-names>
            <surname>Alpizar-Chacon</surname>
          </string-name>
          ,
          <string-name>
            <given-names>J.</given-names>
            <surname>Barria-Pineda</surname>
          </string-name>
          ,
          <string-name>
            <given-names>K.</given-names>
            <surname>Akhuseyinoglu</surname>
          </string-name>
          ,
          <string-name>
            <given-names>S.</given-names>
            <surname>Sosnovsky</surname>
          </string-name>
          ,
          <string-name>
            <given-names>P.</given-names>
            <surname>Brusilovsky</surname>
          </string-name>
          ,
          <article-title>Integrating textbooks with smart interactive content for learning programming</article-title>
          ,
          <source>in: Proceedings of the 3rd International Workshop on Intelligent Textbooks, 22nd International Conference on Artificial Intelligence in Education</source>
          ,
          <year>2021</year>
          .
        </mixed-citation>
      </ref>
      <ref id="ref27">
        <mixed-citation>
          [27]
          <string-name>
            <given-names>I.</given-names>
            <surname>Alpizar-Chacon</surname>
          </string-name>
          ,
          <string-name>
            <given-names>S.</given-names>
            <surname>Sosnovsky</surname>
          </string-name>
          ,
          <article-title>What's in an index: Extracting domain-specific knowledge graphs from textbooks</article-title>
          ,
          <source>in: Proceedings of the ACM Web Conference</source>
          ,
          <year>2022</year>
          .
        </mixed-citation>
      </ref>
      <ref id="ref28">
        <mixed-citation>
          [28]
          <string-name>
            <given-names>Z.-M.</given-names>
            <surname>Deng</surname>
          </string-name>
          ,
          <string-name>
            <given-names>M.-Y.</given-names>
            <surname>Shi</surname>
          </string-name>
          ,
          <string-name>
            <given-names>C.-F.</given-names>
            <surname>Li</surname>
          </string-name>
          ,
          <article-title>Digitalization of Electronic Textbook Based on OPENCV</article-title>
          ,
          <source>in: Proceedings of the International Conference on Machine Learning and Cybernetics</source>
          ,
          <year>2020</year>
          .
        </mixed-citation>
      </ref>
      <ref id="ref29">
        <mixed-citation>
          [29]
          <string-name>
            <given-names>L.</given-names>
            <surname>Gao</surname>
          </string-name>
          ,
          <string-name>
            <given-names>Z.</given-names>
            <surname>Tang</surname>
          </string-name>
          ,
          <string-name>
            <given-names>X.</given-names>
            <surname>Lin</surname>
          </string-name>
          ,
          <string-name>
            <given-names>Y.</given-names>
            <surname>Liu</surname>
          </string-name>
          ,
          <string-name>
            <given-names>R.</given-names>
            <surname>Qiu</surname>
          </string-name>
          ,
          <string-name>
            <given-names>Y.</given-names>
            <surname>Wang</surname>
          </string-name>
          ,
          <article-title>Structure extraction from PDF-based book documents</article-title>
          ,
          <source>in: Proceedings of the 11th Annual International ACM/IEEE Joint Conference on Digital Libraries</source>
          ,
          <year>2011</year>
          .
        </mixed-citation>
      </ref>
      <ref id="ref30">
        <mixed-citation>
          [30]
          <string-name>
            <given-names>S.</given-names>
            <surname>Marinai</surname>
          </string-name>
          ,
          <string-name>
            <given-names>E.</given-names>
            <surname>Marino</surname>
          </string-name>
          ,
          <string-name>
            <given-names>G.</given-names>
            <surname>Soda</surname>
          </string-name>
          ,
          <article-title>Conversion of PDF books in ePub format</article-title>
          ,
          <source>in: Proceedings of the 11th International Conference on Document Analysis and Recognition</source>
          ,
          <year>2011</year>
          .
        </mixed-citation>
      </ref>
      <ref id="ref31">
        <mixed-citation>
          [31]
          <string-name>
            <given-names>S.</given-names>
            <surname>Tuarob</surname>
          </string-name>
          ,
          <string-name>
            <given-names>P.</given-names>
            <surname>Mitra</surname>
          </string-name>
          ,
          <string-name>
            <given-names>C. L.</given-names>
            <surname>Giles</surname>
          </string-name>
          ,
          <article-title>A hybrid approach to discover semantic hierarchical sections in scholarly documents</article-title>
          ,
          <source>in: Proceedings of the 13th International Conference on Document Analysis and Recognition</source>
          ,
          <year>2015</year>
          .
        </mixed-citation>
      </ref>
      <ref id="ref32">
        <mixed-citation>
          [32]
          <string-name>
            <given-names>R.</given-names>
            <surname>Kinney</surname>
          </string-name>
          ,
          <string-name>
            <given-names>C.</given-names>
            <surname>Anastasiades</surname>
          </string-name>
          ,
          <string-name>
            <given-names>R.</given-names>
            <surname>Authur</surname>
          </string-name>
          ,
          <string-name>
            <given-names>I.</given-names>
            <surname>Beltagy</surname>
          </string-name>
          ,
          <string-name>
            <given-names>J.</given-names>
            <surname>Bragg</surname>
          </string-name>
          , et al.,
          <source>The Semantic Scholar Open Data Platform</source>
          ,
          <source>arXiv preprint arXiv:2301.10140</source>
          (
          <year>2023</year>
          ).
        </mixed-citation>
      </ref>
      <ref id="ref33">
        <mixed-citation>
          [33]
          <string-name>
            <given-names>K.</given-names>
            <surname>Lo</surname>
          </string-name>
          ,
          <string-name>
            <given-names>J. C.</given-names>
            <surname>Chang</surname>
          </string-name>
          ,
          <string-name>
            <given-names>A.</given-names>
            <surname>Head</surname>
          </string-name>
          ,
          <string-name>
            <given-names>J.</given-names>
            <surname>Bragg</surname>
          </string-name>
          ,
          <string-name>
            <given-names>A. X.</given-names>
            <surname>Zhang</surname>
          </string-name>
          ,
          <string-name>
            <given-names>C.</given-names>
            <surname>Trier</surname>
          </string-name>
          , et al.,
          <source>The Semantic Reader Project: Augmenting Scholarly Documents through AI-Powered Interactive Reading Interfaces</source>
          ,
          <source>arXiv preprint arXiv:2303.14334</source>
          (
          <year>2023</year>
          ).
        </mixed-citation>
      </ref>
      <ref id="ref34">
        <mixed-citation>
          [34]
          <string-name>
            <given-names>Z.</given-names>
            <surname>Shen</surname>
          </string-name>
          ,
          <string-name>
            <given-names>K.</given-names>
            <surname>Lo</surname>
          </string-name>
          ,
          <string-name>
            <given-names>L. L.</given-names>
            <surname>Wang</surname>
          </string-name>
          ,
          <string-name>
            <given-names>B.</given-names>
            <surname>Kuehl</surname>
          </string-name>
          ,
          <string-name>
            <given-names>D. S.</given-names>
            <surname>Weld</surname>
          </string-name>
          ,
          <string-name>
            <given-names>D.</given-names>
            <surname>Downey</surname>
          </string-name>
          ,
          <article-title>VILA: Improving Structured Content Extraction from Scientific PDFs Using Visual Layout Groups</article-title>
          ,
          <source>Transactions of the Association for Computational Linguistics</source>
          (
          <year>2022</year>
          ).
        </mixed-citation>
      </ref>
      <ref id="ref35">
        <mixed-citation>
          [35]
          <string-name>
            <given-names>Z.</given-names>
            <surname>Shen</surname>
          </string-name>
          ,
          <string-name>
            <given-names>R.</given-names>
            <surname>Zhang</surname>
          </string-name>
          ,
          <string-name>
            <given-names>M.</given-names>
            <surname>Dell</surname>
          </string-name>
          ,
          <string-name>
            <given-names>B. C. G.</given-names>
            <surname>Lee</surname>
          </string-name>
          ,
          <string-name>
            <given-names>J.</given-names>
            <surname>Carlson</surname>
          </string-name>
          ,
          <string-name>
            <given-names>W.</given-names>
            <surname>Li</surname>
          </string-name>
          ,
          <article-title>LayoutParser: A unified toolkit for deep learning based document image analysis</article-title>
          ,
          <source>in: Proceedings of the 16th International Conference on Document Analysis and Recognition</source>
          ,
          <year>2021</year>
          .
        </mixed-citation>
      </ref>
      <ref id="ref36">
        <mixed-citation>
          [36]
          <string-name>
            <given-names>I.</given-names>
            <surname>Safder</surname>
          </string-name>
          ,
          <string-name>
            <given-names>S.-U.</given-names>
            <surname>Hassan</surname>
          </string-name>
          ,
          <string-name>
            <given-names>A.</given-names>
            <surname>Visvizi</surname>
          </string-name>
          ,
          <string-name>
            <given-names>T.</given-names>
            <surname>Noraset</surname>
          </string-name>
          ,
          <string-name>
            <given-names>R.</given-names>
            <surname>Nawaz</surname>
          </string-name>
          ,
          <string-name>
            <given-names>S.</given-names>
            <surname>Tuarob</surname>
          </string-name>
          ,
          <article-title>Deep learning-based extraction of algorithmic metadata in full-text scholarly documents</article-title>
          ,
          <source>Information Processing &amp; Management</source>
          (
          <year>2020</year>
          ).
        </mixed-citation>
      </ref>
      <ref id="ref37">
        <mixed-citation>
          [37]
          <string-name>
            <given-names>S.</given-names>
            <surname>Mishra</surname>
          </string-name>
          ,
          <string-name>
            <given-names>L.</given-names>
            <surname>Pluvinage</surname>
          </string-name>
          ,
          <string-name>
            <given-names>P.</given-names>
            <surname>Senellart</surname>
          </string-name>
          ,
          <article-title>Towards extraction of theorems and proofs in scholarly articles</article-title>
          ,
          <source>in: Proceedings of the 21st ACM Symposium on Document Engineering</source>
          ,
          <year>2021</year>
          .
        </mixed-citation>
      </ref>
      <ref id="ref38">
        <mixed-citation>
          [38]
          <string-name>
            <given-names>S.</given-names>
            <surname>Aminta</surname>
          </string-name>
          ,
          <string-name>
            <given-names>A.</given-names>
            <surname>Helbling</surname>
          </string-name>
          ,
          <source>Outils pour le français CE1</source>
          , Magnard,
          <year>2019</year>
          . URL: https://www.magnard.fr/livre/9782210505377-outils-pour-le-francais-ce1-2019-manuel-eleve.
        </mixed-citation>
      </ref>
      <ref id="ref39">
        <mixed-citation>
          [39]
          <string-name>
            <given-names>E.</given-names>
            <surname>Lincker</surname>
          </string-name>
          ,
          <string-name>
            <given-names>C.</given-names>
            <surname>Guinaudeau</surname>
          </string-name>
          ,
          <string-name>
            <given-names>O.</given-names>
            <surname>Pons</surname>
          </string-name>
          ,
          <string-name>
            <given-names>J.</given-names>
            <surname>Dupire</surname>
          </string-name>
          ,
          <string-name>
            <given-names>C.</given-names>
            <surname>Hudelot</surname>
          </string-name>
          ,
          <string-name>
            <given-names>V.</given-names>
            <surname>Mousseau</surname>
          </string-name>
          ,
          <string-name>
            <given-names>I.</given-names>
            <surname>Barbet</surname>
          </string-name>
          ,
          <string-name>
            <given-names>C.</given-names>
            <surname>Huron</surname>
          </string-name>
          ,
          <article-title>Classification automatique de données déséquilibrées et bruitées : application aux exercices de manuels scolaires</article-title>
          ,
          <source>in: Actes de la 30e Conférence sur le Traitement Automatique des Langues Naturelles</source>
          ,
          <year>2023</year>
          .
        </mixed-citation>
      </ref>
      <ref id="ref40">
        <mixed-citation>
          [40]
          <string-name>
            <given-names>E.</given-names>
            <surname>Lincker</surname>
          </string-name>
          ,
          <string-name>
            <given-names>C.</given-names>
            <surname>Guinaudeau</surname>
          </string-name>
          ,
          <string-name>
            <given-names>O.</given-names>
            <surname>Pons</surname>
          </string-name>
          ,
          <string-name>
            <given-names>J.</given-names>
            <surname>Dupire</surname>
          </string-name>
          ,
          <string-name>
            <given-names>C.</given-names>
            <surname>Hudelot</surname>
          </string-name>
          ,
          <string-name>
            <given-names>V.</given-names>
            <surname>Mousseau</surname>
          </string-name>
          ,
          <string-name>
            <given-names>I.</given-names>
            <surname>Barbet</surname>
          </string-name>
          ,
          <string-name>
            <given-names>C.</given-names>
            <surname>Huron</surname>
          </string-name>
          ,
          <article-title>Noisy and Unbalanced Multimodal Document Classification: Textbook Exercises as a Use Case</article-title>
          ,
          <source>in: Proceedings of the 20th International Conference on Content-based Multimedia Indexing</source>
          (to appear),
          <year>2023</year>
          .
        </mixed-citation>
      </ref>
      <ref id="ref41">
        <mixed-citation>
          [41]
          <string-name>
            <given-names>N.</given-names>
            <surname>Gala</surname>
          </string-name>
          ,
          <string-name>
            <given-names>A.</given-names>
            <surname>Tack</surname>
          </string-name>
          ,
          <string-name>
            <given-names>L.</given-names>
            <surname>Javourey-Drevet</surname>
          </string-name>
          ,
          <string-name>
            <given-names>T.</given-names>
            <surname>François</surname>
          </string-name>
          ,
          <string-name>
            <given-names>J. C.</given-names>
            <surname>Ziegler</surname>
          </string-name>
          ,
          <article-title>Alector: A Parallel Corpus of Simplified French Texts with Alignments of Misreadings by Poor and Dyslexic Readers</article-title>
          ,
          <source>in: Proceedings of the 12th Language Resources and Evaluation Conference</source>
          ,
          <year>2020</year>
          .
        </mixed-citation>
      </ref>
      <ref id="ref42">
        <mixed-citation>
          [42]
          <string-name>
            <given-names>Z.</given-names>
            <surname>Wang</surname>
          </string-name>
          ,
          <string-name>
            <given-names>P.</given-names>
            <surname>Ng</surname>
          </string-name>
          ,
          <string-name>
            <given-names>X.</given-names>
            <surname>Ma</surname>
          </string-name>
          ,
          <string-name>
            <given-names>R.</given-names>
            <surname>Nallapati</surname>
          </string-name>
          ,
          <string-name>
            <given-names>B.</given-names>
            <surname>Xiang</surname>
          </string-name>
          ,
          <article-title>Multi-passage BERT: A globally normalized BERT model for open-domain question answering</article-title>
          ,
          <source>arXiv preprint arXiv:1908.08167</source>
          (
          <year>2019</year>
          ).
        </mixed-citation>
      </ref>
    </ref-list>
  </back>
</article>