Transformation of PDF Textbooks into Intelligent Educational Resources

Isaac Alpizar-Chacon [0000-0002-6931-9787], Max van der Hart, Zef S. Wiersma, Lorenzo S.J. Theunissen, and Sergey Sosnovsky [0000-0001-8023-1770]

Utrecht University, Utrecht, The Netherlands
i.alpizarchacon@uu.nl, m.vanderhart@students.uu.nl, z.s.wiersma@students.uu.nl, l.s.j.theunissen@students.uu.nl, s.a.sosnovsky@uu.nl

Abstract. This paper presents Intextbooks, a system for the automated conversion of PDF-based textbooks into intelligent educational Web resources. The paper focuses on the new component of Intextbooks responsible for the transformation of PDF-based content into semantically-annotated HTML/CSS. We present the architecture of the system, the design of the client application that renders the resulting textbooks, and a short validation experiment demonstrating the quality of the transformation workflow.

Keywords: Digital textbooks · Interactive textbooks · Intelligent textbooks · Adaptive textbooks · Modeling textbooks · Educational resources · PDF to HTML

Copyright © 2020 for this paper by its authors. Use permitted under Creative Commons License Attribution 4.0 International (CC BY 4.0).

1 Introduction

Digital textbooks have become a standard medium for distributing educational content, especially online. The popularity of digital textbooks has been on the rise. For example, according to DeNoyelles and Raible's report, the number of students that used digital textbooks at least once in their college studies increased from 44% in 2012 to 66% in 2016 [11]. Some of these textbooks come equipped with additional intelligent services built around them to support enhanced search [24], easier navigation [5], interactive content [14], and, ultimately, better learning [23]. However, most digital textbooks exist as mere digital copies of their printed counterparts. One of the reasons for the insufficient number of intelligent textbooks is the amount of effort and expertise necessary to create them. Linking relevant parts of a textbook's content to each other, to external interactive resources, and to the elements of domain knowledge are tasks that traditionally require manual input from domain and pedagogy experts, thus preventing the development and deployment of intelligent textbooks at scale.

In our recent work, we have made initial steps to address this problem by developing an approach for the automated extraction of machine-readable domain models from PDF textbooks. This approach takes into account common patterns of textbook formatting and organization and formalizes them as an extensible set of rules that gradually process the raw content of a textbook, elicit its underlying structural and formatting elements, and, finally, harvest the semantic model of the textbook [3]. We focus mainly on PDF as the most common (and most challenging) format for distributing digital textbooks. Additionally, we have used such models to create semantic bridges allowing us to connect the textbooks to one another and to DBpedia [1]. This opens up a range of interesting possibilities for supporting the development of intelligent and adaptive textbooks at scale. One potential application of this approach has been presented in [2], where we implemented the Interlingua application, capable of linking sections of relevant textbooks across multiple languages to support students studying in a foreign language.
However, while our approach can extract a semantic model of a textbook, link it to other models, and essentially provide a backbone for implementing intelligent services, it has been missing an important component that performs the actual conversion of static PDF files into interactive resources that can be published on the Web. In this paper, we fill this gap and present Intextbooks (Intelligent textbooks), a system that can perform the complete transformation of PDF textbooks into online intelligent educational resources. After extracting a semantic model from a PDF textbook, it converts the textbook into an HTML/CSS representation with an enriched, fine-grained DOM (Document Object Model, https://www.w3.org/DOM/DOMTR). Every content/layout/structure element of the textbook (words, lines, paragraphs, pages, chapters, index entries, etc.) is uniquely identifiable within its DOM. As a result, this implementation is highly flexible in terms of potential interactivity, as virtually any object of a textbook (from a chapter to a keyword) can become an object of targeted interaction. Moreover, every element of a textbook's DOM is also identifiable within the semantic model extracted from the textbook. As a result, the entire textbook becomes an integrated resource where content elements and pieces of domain knowledge are interlinked on both the presentation and the knowledge levels. Such an organization enables various kinds of semantically- and adaptively-enhanced interaction with the textbook content.

The rest of this paper is structured as follows. Section 2 gives an overview of related work. Section 3 provides the details of the Intextbooks system. Section 4 presents a validation experiment conducted to test the PDF to HTML conversion component of the system. Finally, Section 5 concludes the paper with a discussion and a description of future work.

2 Related work

2.1 PDF to HTML conversion

Basic conversion from PDF to HTML documents is a well-known task. In 2003, Rahman and Alam [22] discussed the use of document image analysis techniques to identify the layout and basic components (graphics, text, tables) of PDF documents as the first step in producing HTML output. Marinai, Marino, and Soda [20] presented a system for converting PDF books to the ePub format, which produces XHTML (https://www.w3.org/TR/xhtml1/) files. They focused on the identification and analysis of the table of contents (TOC), as well as the identification of notes and illustrations. These two analyses allow for generating more structured XHTML files. Currently, there are multiple online tools (e.g., https://www.pdf2html.org/, https://pdf.io/pdf2html/) and libraries (see Section 3.2) that allow easy and quick PDF to HTML conversion. The main limitation of existing tools is that they focus only on producing HTML with an accurate visual representation of the original PDF document, without any mechanism that preserves semantic information about its content or structure.

2.2 Interaction with textbooks

A good interaction design is an important factor affecting user acceptance of digital textbooks [8]. There are multiple ways in which a digital textbook can support a reader's interaction with its content. They range from more standard tools, such as textual search and internal hyperlinks, to social interactive interfaces, such as bookmarking, tagging, and commenting [12, 15, 16, 19], to intelligent services, such as semantic search [13] and adaptive navigation support [4].
Semantically-enriched textbooks can support meaningful linking and rearrangement of content [17], semantic search and retrieval of relevant learning objects [21], and even targeted inquiry, exploration, and comparison of important notions in the domain [7]. Adaptive textbooks trace the progress of their readers and maintain composite models of their knowledge to provide personalized content or interaction. For example, in [5], an online textbook with adaptive navigation support annotates links to its resources with indicators that inform students about the individual educational value of the linked resources. Other examples of adaptation can be found in [18, 9, 25].

3 The Intextbooks system

In this section, we describe Intextbooks, a system that transforms PDF textbooks into interactive intelligent educational resources.

3.1 Architecture

The Intextbooks system consists of two main groups of components. The offline components perform the tasks of textbook modeling and conversion to HTML, while the online components support students' interaction with the textbooks. Figure 1 presents the overall architecture of the system.

Fig. 1. Intextbooks architecture

The offline components take a PDF textbook and, first, extract its semantic model. The resulting model is represented as an RDF-enriched TEI (Text Encoding Initiative, https://tei-c.org/) document. Each word, line, text fragment, page, (sub)section, index term, TOC entry, etc. is recognized individually in this model. Then, the PDF textbook is converted into an HTML representation. As the last step, the TEI and HTML representations are synchronized, meaning that all elements of the TEI model are connected to the DOM elements of the HTML version of the textbook. After that, both the TEI model and the synchronized HTML representation of the textbook are stored in the central repository. Together, they play the roles of domain and content models, enabling semantic and adaptive access to the textbook content.

The online Web-reader presents processed textbooks to students. Every time a student requests a textbook, the reader displays the synchronized HTML representation of the textbook and supports various kinds of interaction with it. Additionally, the adaptation engine uses a student model and activity logs to generate specific content and interactions for each student (e.g., tailored navigational aid). Thanks to the extracted model of the textbook, the web interface is aware of various elements of the textbook's semantics: the precise beginning and end of each (sub)section, the relevant terms on every page (based on the index terms) and the additional information associated with them (definitions, links to external resources), etc.

3.2 Main components

Textbook model extractor
Our textbook model extractor is a rule-based component that first extracts a semantic model of a textbook using its symbolic, formatting, and structural elements. Then, the model is enriched with additional semantic information using DBpedia and the index terms identified in the textbook. Finally, the enriched semantic model is serialized as a TEI/XML file. Altogether, 55 unique rules have been defined to handle the extraction process. The implementation details of this approach are described in our previous work [3, 1].

The resulting TEI textbook model groups all the information extracted from the textbook into three categories: structure, content, and domain knowledge. The structure section contains the hierarchical structure of the (sub)sections of the textbook, according to the TOC. The content section provides, for every section, its textual content organized according to basic textual elements: words, lines, fragments, pages, and subsections. All these elements have unique IDs. Finally, the domain knowledge section provides the important terms in the textbook according to its index. Each term is linked to the (sub)sections and pages where it appears, and it can have additional semantic information such as a definition, external resources, classification categories, and related terms. This semantic information is incorporated into the TEI model using RDFa (http://rdfa.info/) attributes, which create connections to the Linked Open Data cloud (https://lod-cloud.net/).
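To make the model concrete, the following minimal sketch (in Python) shows how a client component could read index terms and their semantic links from such a TEI file. The paper does not show the exact TEI serialization, so the element and attribute names below (term, resource, corresp) are hypothetical choices from standard TEI/RDFa vocabulary.

from lxml import etree

TEI_NS = {"tei": "http://www.tei-c.org/ns/1.0"}

def load_index_terms(tei_path):
    """Collect index terms and their semantic links from a TEI model."""
    tree = etree.parse(tei_path)
    terms = []
    # Hypothetical encoding: one <term> element per index entry, with an
    # RDFa 'resource' attribute pointing to DBpedia and a TEI 'corresp'
    # attribute listing the (sub)sections where the term appears.
    for el in tree.iterfind(".//tei:term", namespaces=TEI_NS):
        terms.append({
            "label": (el.text or "").strip(),
            "dbpedia": el.get("resource"),
            "sections": el.get("corresp", "").split(),
        })
    return terms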
PDF to HTML converter
This component converts a PDF textbook into an HTML representation that preserves its layout and content. Preserving the layout of a PDF document is a complex task: a PDF document contains thousands of low-level objects grouped into three layers (text, bitmap images, and vector graphics) [6]. Several open libraries have been developed to perform the low-level processing of these PDF primitives and convert them into an HTML representation. We considered four such libraries: pdf2htmlEX (https://github.com/coolwanglu/pdf2htmlEX), PDFMiner (https://pypi.org/project/pdfminer/), pdf2html (https://github.com/mgedmin/pdf2html), and Xpdf (https://www.xpdfreader.com/pdftohtml-man.html).

While comparing the candidate libraries, we looked at several properties. The most important was the ability of the conversion library to preserve the look of the PDF in the resulting HTML. Therefore, our search was mostly limited to geometrically-based conversion libraries. Another critical factor was the ability to parse the HTML in a structured order, so that the HTML could be synchronized with the textbook's semantic model. Other factors that were considered were Linux support, performance, and scalability.

After analyzing the candidate libraries, we decided to use pdf2htmlEX. This library is no longer under active development, but it has extensive documentation that allows us to extend it. It preserves the layout of the PDF textbook faithfully across different types of documents. It can be run as a standalone process and is quite fast compared to other tools. Furthermore, the HTML output is structured: each page has its own ID and is divided into lines. One downside of the library is that tables, graphs, figures, and vector lines are all grouped into one static background image that is loaded for an entire page, which makes it harder to recognize individual elements.

TEI-HTML synchronizer
This component modifies the HTML representation of a textbook to create a fine-grained DOM structure, which is synchronized with the textual elements from the semantic model of the textbook. This process is not straightforward, since the HTML's structure exists to preserve the layout and does not correspond to textual or semantic elements. For example, in the HTML produced by the conversion library, multiple words may share a single span element.
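To make the goal of the synchronization concrete, the following minimal sketch shows the kind of DOM refinement the synchronizer performs: a coarse span covering several words is rewritten into word-level spans carrying the IDs of the matched TEI elements. The sketch uses BeautifulSoup for brevity; the paper does not state which DOM library Intextbooks uses, and the class names and IDs are illustrative.

from bs4 import BeautifulSoup

COARSE = '<div class="line"><span>the sample mean</span></div>'
TEI_IDS = {"the": "w-101", "sample": "w-102", "mean": "w-103"}  # hypothetical IDs

soup = BeautifulSoup(COARSE, "html.parser")
span = soup.find("span")
words = span.get_text().split()
span.clear()  # drop the undifferentiated text node
for i, word in enumerate(words):
    wrapper = soup.new_tag("span", id=TEI_IDS[word])  # word-level, identifiable
    wrapper.string = word
    span.append(wrapper)
    if i < len(words) - 1:
        span.append(" ")  # restore the separating whitespace

print(soup)  # <div class="line"><span><span id="w-101">the</span> ... </span></div>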
After the synchronization process, the final HTML for a textbook has a DOM structure that identifies words, lines, and paragraphs with the same IDs as their corresponding elements in the TEI document. The rest of this subsection explains the matching algorithm that synchronizes the two representations of a textbook.

Internal Representation. The matching takes as input internal representations of both the TEI model and the HTML file. These representations consist of lists of pages, each of which contains lines (for the HTML) or paragraphs with lines (for the TEI), which themselves contain lists of words. For the HTML lines, we also keep extra information, such as their X- and Y-positions on a page and their font size, which are necessary for the subsequent matching. For the words, we keep track of whether they have already been matched. It is important to note that the HTML text is split into the DOM's TextNodes. These may contain characters from different words (split by a space), parts of a single word, an entire word, or only whitespace. Different parts of a word in the HTML might also be at different levels, split by spans. Therefore, we keep a list of the TextNodes that together form a word in the HTML according to the TEI model.

Matching Words. After the internal representation has been built, the algorithm matches the words between the TEI and HTML representations of a textbook. The difficulty comes from the fact that the HTML produced by pdf2htmlEX does not preserve the structure of words. This is also the most important part of the process, since the matching of lines and paragraphs depends on this initial matching. Words are matched separately for each page. First, the entire text of a page is extracted separately from the TEI and HTML representations. Then, both texts are compared using Google's Diff Match Patch library (https://github.com/google/diff-match-patch). This library compares the words and determines which words in the HTML belong to which words in the TEI representation. The result is a list containing the differences and matches between the texts of both pages. Finally, for each matched TEI word, the HTML is updated to wrap the matching TextNodes in a DOM element with the ID corresponding to the matched word.

There are some special cases when matching words. The first one occurs when there is no one-to-one relation between the words from the TEI and HTML representations. When a word in the TEI model corresponds to multiple elements in the HTML, we can match the words by aggregating the corresponding elements in the HTML and giving them the same ID. The opposite case is more complex: when an individual element in the HTML corresponds to multiple words in the TEI, the TextNodes of the HTML element have to be split based on the words in the TEI. Then, the TextNodes that contain the characters of each TEI word are wrapped together with the ID of the respective word. Another case occurs when the pdf2htmlEX library inserts special characters that differ from the characters in the TEI representation. For example, sometimes the HTML contains orthographic ligatures rather than plain Latin characters. The TEI representation contains no ligatures; therefore, in a preprocessing step, we replace such special Unicode characters with their correct counterparts. The replacement list is stored in a separate file; we extend it as we observe new cases.
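The following minimal sketch illustrates the page-level comparison and the ligature preprocessing, assuming the Python port of diff-match-patch (pip install diff-match-patch); the paper does not specify which port Intextbooks uses, and the replacement table shown is a tiny illustrative excerpt, not the real file.

from diff_match_patch import diff_match_patch

LIGATURES = {"\ufb00": "ff", "\ufb01": "fi", "\ufb02": "fl"}  # excerpt only

def normalize(text):
    """Preprocessing: replace ligatures inserted by pdf2htmlEX."""
    for lig, plain in LIGATURES.items():
        text = text.replace(lig, plain)
    return text

tei_text = "the first floor"                      # page text from the TEI model
html_text = normalize("the \ufb01rst \ufb02oor")  # page text from the HTML

dmp = diff_match_patch()
diffs = dmp.diff_main(tei_text, html_text)  # list of (op, text) tuples
dmp.diff_cleanupSemantic(diffs)
print(diffs)  # [(0, 'the first floor')] -> the whole page text matches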
Matching Lines and Paragraphs. After the words have been matched, the algorithm matches the lines and paragraphs. This process is quite simple given the already matched words. For each line in the TEI representation, the algorithm finds the corresponding words of that line in the HTML, gets their parent element, and wraps it with the right ID. To further improve accuracy, a cross-check is carried out after the line matching: if an HTML line did not get a match, but the previous and next lines did and they share the same ID, the algorithm wraps the current line with that ID as well (see the sketch below). A similar process is used to match paragraphs, but in this case the algorithm keeps track of the matched lines in the HTML to detect and wrap the elements that represent paragraphs.
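The following minimal sketch illustrates the cross-check; the data layout (a list of per-line dicts holding an optional TEI line ID) is hypothetical.

def cross_check(html_lines):
    """Fill gaps: an unmatched HTML line flanked by two lines that share
    a TEI line ID most likely belongs to that same TEI line."""
    for i in range(1, len(html_lines) - 1):
        prev_id = html_lines[i - 1].get("tei_id")
        next_id = html_lines[i + 1].get("tei_id")
        if html_lines[i].get("tei_id") is None and prev_id and prev_id == next_id:
            html_lines[i]["tei_id"] = prev_id
    return html_lines

lines = [{"tei_id": "l-7"}, {}, {"tei_id": "l-7"}]
print(cross_check(lines))  # the middle line inherits 'l-7'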
3.3 Web interface

The web interface of the Intextbooks system will allow students to engage and interact with the textbooks in several ways. Currently, we are working on the design and development of this and the other components of the system, such as the student model. Figure 2 presents the working prototype of the web interface.

Fig. 2. Intextbooks web interface prototype

The web interface is divided into three parts. The main one is shown in the middle; this is where the textbook is displayed. There are smaller panels on either side of the main part, which provide additional tools for the user to interact with the textbook. Two buttons at the top of the screen can be used to navigate to the main menu and the user settings.

The web interface provides a TOC to the reader (panel A in Figure 2). The TOC contains a reference to each (sub)section. The user can click on one of the entries, and the web interface will show the corresponding content of that section in the middle panel. Furthermore, when a teacher sets a path through a specific textbook, or an adapted path is generated for a user, this will be reflected in the TOC: the TOC entries that are not included in the path will be crossed out and shown in a lighter color. Lastly, the TOC displays annotations in the form of checkmarks and a progress bar to provide navigational cues to the user.

The other feature in the left part of the web interface is the search tool (panel B). When the user searches for a word, the web interface will suggest possible matches, prioritizing important terms from the textbook model. The number of matches is also included in the suggestions. After a search, the web interface will show snippets of text where the keyword occurs, so the user can make a more precise decision based on these snippets. When the user clicks on a snippet, the web interface will browse to the corresponding item in the textbook and highlight the searched term.

An important aspect of creating an intelligent textbook is supporting interaction with its content. The web interface provides such interaction by allowing the user to click on certain words that are highlighted in the text. When the user left-clicks on a highlighted word, additional information is shown in a panel on the right side of the web interface. This additional information can contain definitions, links, or references to related chapters; it also comes from the semantic model of the textbook. Panel D in Figure 2 shows the additional information for the term "histogram".

Other features are grouped under the same panel using tabs (panel E in Figure 2). When the user clicks on a tab, the web interface shows the corresponding tool. The different tools can be enabled/disabled using the settings button. For the moment, we define four tools, but more can be added as necessary. The first tool is used to interact with other students or teachers: questions can be asked here and answered by other students or teachers. The second tool is used to highlight text in the textbook; different colors can be used to categorize the highlighted text. The third tool is used to create bookmarks, which improve navigation among different pages. Lastly, the user can create notes in a simple text editor, similar to the text editors that most operating systems provide.

The most important component of a textbook to interact with is its actual content. The textbook is displayed in the middle part of the interface, along with extra options at the bottom (panel C). The user can zoom in and out, skip to the next chapter, download the content as a PDF file, or bookmark the current page. The user can also right-click on the highlighted terms to display a context menu with additional actions. As our system grows, this part of the interface can provide multiple ways for the users to interact with the textbooks. For example, tables could become interactive elements where the user can change the order of the data or create aggregations. Interaction with fine-grained elements, such as words, is possible thanks to the identifiable DOM structure created in each HTML representation. It is also worth mentioning that the web interface is being developed for both desktop and smartphone displays.

Monitoring engine
The web interface communicates directly with the monitoring engine to log every action of a student. All the logs are stored in an activity log repository.

Adaptation engine and student model
These components are planned to be added in the future, as the Intextbooks system matures and starts utilizing the domain knowledge extracted from the textbooks and the activity of the students.

4 Validation

We have conducted a validation experiment to test the accuracy of the matching algorithm, which is used to create the fine-grained identifiable DOM structure in the HTML textbooks. This DOM structure, coupled with the extracted TEI model, is required to offer fine-grained, flexible, semantically-enriched interactions with textbooks.

4.1 Procedure

We have used a test set of 70 university-level textbooks from several domains: statistics, computer science, web programming, literature, and history. All textbooks are written in English. To estimate the accuracy of the matching algorithm, we used the percentage of words matched between the TEI and HTML representations of each book as our evaluation metric. This metric is a good indication of the accuracy of the algorithm, since word matching is the most challenging part, and the line and paragraph matching depend on the number of matched words.

We have run the validation using three variations of the matching algorithm. Currently, the matching of words is not 100% correct, because sometimes the order of words in the generated HTML differs from the order in the extracted TEI model, and because subscripts and superscripts in the HTML are not always placed in the correct position. Therefore, we have implemented a version of the matching algorithm that uses a threshold to merge superscripts and subscripts with either the previous or the next line, to try to increase matching accuracy. This variant of the algorithm also sorts the words in the HTML based on their Y-position, so that they are read in the same order as in the TEI representation. For the validation, we have tested the accuracy of the original matching algorithm (no threshold), a variant with a fixed threshold, and a variant with a dynamic threshold based on the most frequent distance between two lines on a page.
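The following minimal sketch illustrates the dynamic-threshold idea: the regular line spacing of a page is estimated as the most frequent distance between consecutive line Y-positions, and a line sitting much closer than that to its predecessor is treated as a super-/subscript candidate to merge. The data layout and the 0.5 factor are hypothetical.

from collections import Counter

def dynamic_threshold(y_positions):
    """Most frequent gap between consecutive line Y-positions on a page."""
    gaps = [round(b - a) for a, b in zip(y_positions, y_positions[1:])]
    return Counter(gaps).most_common(1)[0][0]

def script_candidates(y_positions, threshold):
    """Indices of lines to merge with the previous line."""
    return [i for i in range(1, len(y_positions))
            if y_positions[i] - y_positions[i - 1] < 0.5 * threshold]

ys = [100, 112, 124, 127, 139]     # the line at y=127 is likely a subscript
t = dynamic_threshold(ys)          # regular spacing: 12
print(script_candidates(ys, t))    # [3]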
Table 1 shows the results of the validation using the test set and the three algorithms.

                     No Threshold   Fixed Threshold   Dynamic Threshold
Mean                     87.16          88.76              87.09
Median                   90.53          91.15              88.63
Standard Deviation       12.45          12.22              13.15

Table 1. Validation of the matching algorithms (percentage of matched words).

4.2 Analysis

The results show that, on average, at least 87% of all the words are matched by all three methods. The obtained values indicate that the matching algorithm requires some adjustments to match the remainder of the words.

Textbooks that mostly consist of text (without other elements such as formulae and tables) get a near-100% matching rate. However, the obtained values are mostly determined by more complex textbooks. Such textbooks get a higher mismatch rate because of the figures, tables, graphs, and other elements that are represented as text in the TEI model but converted to images by the pdf2htmlEX library. As a result, these elements reduce the matching rate. For example, one of the textbooks [10] has a mismatch rate of about 15% due to such discrepancies. Subscripts and superscripts in the text also considerably reduce matching rates. Using only the distance between the previous and next lines does not provide a reliable indicator of whether an element is a superscript or a subscript. A possible solution is to look at the words that are formed by attaching the subscript/superscript to the previous and to the next line, and to check which of these two candidate words occurs in the TEI text for that page (see the sketch below). If the threshold algorithm is further extended and improved so that the HTML and TEI representations are read in the same order, we should be able to get an accuracy approaching the 100% mark.
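The following minimal sketch illustrates this proposed disambiguation step; the function and variable names are hypothetical.

def attach_script(fragment, prev_word, next_word, tei_page_words):
    """Attach a super-/subscript fragment to the previous or the next line,
    keeping the attachment that yields a word present in the TEI page."""
    candidates = {
        "prev": prev_word + fragment,   # e.g., "x" + "2" -> "x2"
        "next": fragment + next_word,   # e.g., "2" + "x" -> "2x"
    }
    for side, word in candidates.items():
        if word in tei_page_words:
            return side, word
    return None, fragment  # no evidence either way

print(attach_script("2", "x", "dx", {"x2", "dx"}))  # ('prev', 'x2')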
5 Discussion and future work

We have presented the current state of Intextbooks, a system capable of transforming PDF textbooks into interactive and intelligent educational resources. Currently, we can extract high-quality semantic models of the textbooks and create HTML representations of the same textbooks that are connected to their semantic models through fine-grained DOM structures. We can match around 88% of all the words in the textbooks to individual elements in the HTML resources and are working to further improve the matching algorithm. We have designed a web interface that allows users to interact with the HTML textbooks in multiple ways, and we plan to extend it with adaptive functionality.

There might be several possible concerns, both practical and pragmatic, about the scale and applicability of this kind of system. One question that needs to be addressed is the availability of textbooks and the related copyright issues. In our experience, university libraries can supply enough PDF-based textbooks on a variety of subjects. From the point of view of copyright protection, if a system provides enhanced access to these books, but only to the students of the university holding the necessary subscriptions, then publishers do not have a reason to object. In the worst-case scenario, many good-quality textbooks are freely available online nowadays in open repositories such as OpenStax (https://openstax.org), Connexions (https://cnx.org/), the Open Textbook Library (https://open.umn.edu/opentextbooks), OER Commons (https://www.oercommons.org/), etc.

We plan to further extend and improve the components of the system. In particular, it is important to better define the semantics of the knowledge that is extracted from the textbooks. Using the index terms from the textbooks to precisely identify the sub-domains that are covered in each (sub)section will allow us to recommend content to students better. Additionally, we need to test how the automated textbook modeling technology will cope with different textbook formatting in less formal domains (e.g., sociology) or domains with conflicting viewpoints (e.g., history).

For the system to be as complete as we want, we need to add the missing components that infer the current state of students' knowledge and provide meaningful adaptation. Another direction for future work is to research how to produce fully interactive HTML elements from the static tables, charts, and formulae in the textbooks. Finally, it is important to evaluate the system's effectiveness in a user study with real students from a target group. Both the user experience and the possible effect on learning need to be investigated.

References

1. Alpizar-Chacon, I., Sosnovsky, S.: Expanding the web of knowledge: one textbook at a time. In: Proceedings of the 30th ACM Conference on Hypertext and Social Media. ACM, New York, NY, USA (2019)
2. Alpizar-Chacon, I., Sosnovsky, S.: Interlingua: Linking textbooks across different languages. In: Proceedings of the First Workshop on Intelligent Textbooks, vol. 2384, pp. 104–117. CEUR-WS (2019)
3. Alpizar-Chacon, I., Sosnovsky, S.: Order out of chaos: Construction of knowledge models from PDF textbooks. In: Proceedings of the 20th ACM Symposium on Document Engineering (accepted). DocEng 2020, ACM, New York, NY, USA (2020)
4. Brusilovsky, P.: Adaptive navigation support. In: The Adaptive Web, pp. 263–290. Springer (2007)
5. Brusilovsky, P., Eklund, J.: A study of user model based link annotation in educational hypermedia. Journal of Universal Computer Science 4(4), 429–448 (1998)
6. Chao, H., Fan, J.: Layout and content extraction for PDF documents. In: Document Analysis Systems VI, vol. 3163, pp. 213–224 (2004)
7. Chaudhri, V.K., Cheng, B., Overholtzer, A., Roschelle, J., Spaulding, A., Clark, P., Greaves, M., Gunning, D.: Inquire Biology: A textbook that answers questions. AI Magazine 34(3), 55–72 (2013)
8. Chiu, T.K.F.: Introducing electronic textbooks as daily-use technology in schools: A top-down adoption process. British Journal of Educational Technology 48(2), 524–537 (2017). https://doi.org/10.1111/bjet.12432
9. De Bra, P.M.: Teaching through adaptive hypertext on the WWW. International Journal of Educational Telecommunications 3(2), 163–179 (1997)
10. Dekking, F.M., Kraaikamp, C., Lopuhaä, H.P., Meester, L.E.: A Modern Introduction to Probability and Statistics: Understanding Why and How. Springer (2005)
11. DeNoyelles, A., Raible, J.: Exploring the use of e-textbooks in higher education: A multiyear study. EDUCAUSE Review (2017). https://er.educause.edu/articles/2017/10/exploring-the-use-of-e-textbooks-in-higher-education-a-multiyear-study
12. Di Vesta, F.J., Gray, G.S.: Listening and note taking. Journal of Educational Psychology 63(1), 8 (1972)
13. Dichev, C., Dicheva, D.: View-based semantic search and browsing. In: 2006 IEEE/WIC/ACM International Conference on Web Intelligence (WI'06), pp. 919–925. IEEE (2006)
14. Ericson, B.: An analysis of interactive feature use in two ebooks. pp. 1–14 (2019)
15. Fisher, J.L., Harris, M.B.: Effect of note taking and review on recall. Journal of Educational Psychology 65(3), 321 (1973)
16. Gall, M.D.: The use of questions in teaching. Review of Educational Research 40(5), 707–721 (1970)
17. Glushko, R.J.: The discipline of organizing. Bulletin of the American Society for Information Science and Technology 40(1), 21–27 (2013)
18. Kavcic, A.: Fuzzy user modeling for adaptation in educational hypermedia. IEEE Transactions on Systems, Man, and Cybernetics, Part C (Applications and Reviews) 34(4), 439–449 (2004)
19. Kissinger, J.: The social & mobile learning experiences of students using mobile e-books. Online Learning 17(1) (2013). https://doi.org/10.24059/olj.v17i1.303
20. Marinai, S., Marino, E., Soda, G.: Conversion of PDF books in ePub format. In: 2011 International Conference on Document Analysis and Recognition, pp. 478–482. IEEE (2011)
21. Melis, E., Goguadze, G., Homik, M., Libbrecht, P., Ullrich, C., Winterstein, S.: Semantic-aware components and services of ActiveMath. British Journal of Educational Technology 37(3), 405–423 (2006)
22. Rahman, F., Alam, H.: Conversion of PDF documents into HTML: a case study of document image analysis. In: The Thirty-Seventh Asilomar Conference on Signals, Systems & Computers, 2003, vol. 1, pp. 87–91. IEEE (2003)
23. Ritter, S., et al.: What's a textbook? Envisioning the 21st century K-12 text. pp. 87–94 (2019)
24. Sosnovsky, S., et al.: Math-Bridge: Closing gaps in European remedial mathematics with technology-enhanced learning. In: Mit Werkzeugen Mathematik und Stochastik lernen – Using Tools for Learning Mathematics and Statistics, pp. 437–451. Springer (2014)
25. Ullrich, C., Melis, E.: Pedagogically founded courseware generation based on HTN planning. Expert Systems with Applications 36(5), 9319–9332 (2009)