Signature-based manual dating vs. neural network automation⋆

Tea Tvalavadze1,†, Ia Ghadua1,†, Giorgi Kalandadze2,† and Maksim Iavich3,*,†

1 Giorgi Leonidze State Museum of Georgian Literature, 36 Petre Kavtaradze str., Tbilisi, 0186, Georgia
2 Association for Textual and Editorial Studies and Digital Humanities, 17 Sakhalkho str., Tbilisi, 0113, Georgia
3 Caucasus University, School of Technology, 1 Paata Saakadze str., Tbilisi, 0102, Georgia


                                             Abstract
                                             Dating manuscripts is a multifaceted task that necessitates integrating various analytical methods to
                                             establish historical context and authenticate documents. This paper compares two methods for dating
                                             manuscripts of Galaktion Tabidze, a notable Georgian poet. We utilized both a manual signature analysis
                                             and an automated Convolutional Neural Network (CNN) approach to date undated manuscripts from
                                             Tabidze’s archive. The manual signature method relied on analyzing specific graphematic features associated
                                             with different periods of Tabidze’s work. This approach provided clear and consistent dating results for
                                             manuscripts. The CNN method, on the other hand, used probabilistic estimates to suggest dates. While the
                                             CNN method generally supported the manual findings, it also introduced some uncertainties. For instance,
                                             the CNN method suggested certain dates that did not align with the manual analysis, such as late 20th-century
                                             dates for manuscripts that the manual method dated to earlier periods. The comparison highlighted that the
                                             manual signature method offered more reliable and precise dating, especially for earlier works. The CNN
                                             method, while valuable, introduced variability and indicated areas where the model’s accuracy could be
                                             improved. This study demonstrates that while both methods have their strengths, the manual approach
                                             provides a more consistent basis for dating manuscripts, whereas the CNN method serves as a
                                             complementary tool with potential for further refinement.

                                             Keywords
                                             manuscript dating, neural networks, manual dating, automatic dating


1. Introduction

Dating manuscripts is a complex and multifaceted task that requires careful analysis and the integration of various methods. This process is essential for understanding a manuscript’s historical context and verifying its authenticity. Several approaches are used. Examining physical characteristics such as paper, ink, and binding materials can provide clues about the manuscript’s age. Specific types of paper and ink, along with features like watermarks and script styles, can often be linked to particular periods or regions [1, 2]. Paleography, or the study of ancient handwriting styles, is another critical method. By analyzing the script, scholars can identify changes in handwriting over time, which helps in establishing the manuscript’s timeframe. Historical references within the manuscript, such as mentions of events, figures, or other works, can offer dating clues. If the manuscript refers to specific historical events or individuals with known dates, this information can help narrow down its creation period. Textual analysis, which involves comparing the manuscript’s content with other known works, also plays a role [3–6]. Quotations or influences from texts with established dates can assist in dating the manuscript. For manuscripts on organic materials like paper or parchment, carbon dating can estimate the age of the material. Although this provides a date range, it may not pinpoint the exact date of the manuscript’s creation. The manuscript’s provenance, or its history of ownership, can offer additional dating information. Inscriptions, ownership marks, or historical records related to previous owners can provide valuable clues [7, 8].

In the initial phase of the graphematic analysis of the manuscripts of Galaktion Tabidze, the preeminent Georgian poet of the 20th century (1891–1959), the research team aimed to date the manuscripts preserved in his archive. The team selected 2–3 dated manuscripts from each year between 1905 and 1959. They deconstructed the scanned images, compiled databases of graphemes and graphemic pairs, and identified the most informative element types for dating purposes, subsequently coding these elements [9, 10]. The database of undated manuscripts was then processed using the same principles,


CSDP-2024: Cyber Security and Data Protection, June 30, 2024, Lviv, Ukraine
∗ Corresponding author.
† These authors contributed equally.
Email: teatvalavadze@gmail.com (T. Tvalavadze); ghaduaia@gmail.com (I. Ghadua); gkalanda@hotmail.com (G. Kalandadze); miavich@cu.edu.ge (M. Iavich)
ORCID: 0000-0003-3742-6825 (T. Tvalavadze); 0000-0002-7434-2519 (I. Ghadua); 0009-0002-3269-0133 (G. Kalandadze); 0000-0002-3109-7971 (M. Iavich)
© 2024 Copyright for this paper by its authors. Use permitted under Creative Commons License Attribution 4.0 International (CC BY 4.0).
CEUR Workshop Proceedings (ceur-ws.org), ISSN 1613-0073
with an attempt to date them based on specific graphemic features identified over the years. The dating of the test manuscripts revealed that the predominant presence of the 622 graphemic types across all periods of Tabidze’s work hindered precise dating, thus impacting the overall results [11].

Given that a single comparative analysis of all the features of all graphemes did not yield significant results, we decided to refine our approach by focusing on specific types of elements that were either consistently used or distinctly absent during particular periods. To achieve this, we conducted a detailed examination of each of the 622 graphematic types identified in our codebook. Our findings indicated that most types appeared in manuscripts from specific years, but not in adjacent years, and then reappeared after a gap of one or two years. This pattern likely resulted from random variation rather than date-informative differences between manuscripts from these years. Nonetheless, the extended periods of use or non-use of specific types revealed through this study could provide a robust basis for more accurate dating. In this paper we present two methods of dating the manuscripts: the first is manual and uses the author’s signatures, and the second uses neural networks.

2. Dating the manuscripts using signature methods

As previously mentioned, the majority of the graphemic types we identified were present in manuscripts from at least one or two years across each decade of the poet’s activity: the 1910s, 1920s, 1930s, 1940s, and 1950s. Consequently, relying solely on these bases made it difficult to accurately date the texts. However, we encountered several exceptions. For instance, the analysis of dated manuscripts revealed that the element type >გ<2/2] was used by the poet exclusively from April 1908 to December 1910. This discovery allowed us to date dozens of undated manuscripts that utilized this specific type.

During the compilation of the codebook, we noted the presence of various graphemic types in the manuscripts of each year but did not record the percentage relationship of each type with other types of the same element. Consequently, negative statistics (identifying which types were absent in specific years) proved more fruitful than positive statistics in our research. In other words, knowing which types were not found in certain years provided more valuable insights. Given that the earliest extant manuscript of Galaktion dates from 1905 and the latest from 1959, element types absent from 1905–1915 suggest these elements began to be used in 1916, allowing us to date manuscripts containing these types to periods post-1915. Similarly, element types absent between 1949 and 1959 but present in earlier periods indicate that the poet had ceased using these types by 1949, enabling us to date texts with these types to before 1949. Naturally, a conclusive determination cannot be based on the presence or absence of a single grapheme. Therefore, we also conducted studies on other graphemes to confirm their compatibility with the estimated periods we identified.

As previously mentioned, extended time intervals indicating the use or non-use of specific types of elements cannot independently solve the problem of dating a text. However, they can contribute significantly to achieving accurate results within a comprehensive research framework. For instance, in cases where multiple potential dates have been identified through other methods, these intervals can help us choose the most likely date.

For example, consider one poem by Galaktion Tabidze, which he published in 1933 under the title “The First of May” with the inscription “Poem delivered at an illegal evening in 1908, on the first of May”. The date 1908 is written not only on the publication but also on the autographs. Galaktion regularly published his poems and, unless some external circumstances prevented it, nothing remained unpublished. Given the social democrats’ rise to power in Georgia in 1918 and the subsequent annexation by Soviet Russia in 1921, it is clear that there would have been no obstacles to publishing this poem from 1918 onwards. Consequently, there is suspicion that the poet may have attributed a false date to the poem to construct an image of his “revolutionary past”. Considering the circumstances of writers under Stalin’s totalitarian regime, this is not surprising. While no opposition has arisen since the collapse of the Soviet Union to the notion that Galaktion might have falsified the date, definitive proof confirming the falsification of this poem’s date by the author has not yet emerged.

Five of the six autographs of the poem are dated: two indicate 1905, two indicate 1908, and one, at the end of the poem, is dated April 26, 1933. Such variability in the dating by the author naturally strengthened the suspicion of mystification, and graphematic research provided an opportunity to substantiate this assumption. Since three autographs of the poem (MGL N4763, N5381, N5552) are corrected so heavily that they reflect the process of creating the poem rather than merely “copying” it, it was evident that determining the creation time of these manuscripts through graphemic analysis would aid in pinpointing the poem’s actual date of composition. In this instance, our task was relatively straightforward: we needed to select one of the three possible dates inscribed on the manuscripts. Given that the handwriting from 1905–1908 shares all the characteristics required for our common research parameters, our choice was effectively between these dates and 1933. Here, there were numerous distinguishing features to consider.

Until 1911, no type of the double-arched “დ” (>დ<3/) is found in Galaktion’s manuscripts. The types of the double-arched “რ” (>რ<4/) and the upper additional line of “ლ” (>ლ<6/) do not appear until 1909, the type of grapheme “შ” (>შ<2/7)) not before 1910, the type of “წ” (>წ<1/9]) not until 1912, the type of “პ” (>პ<3/4]) not until 1915, etc. Therefore, the presence of all these elements in the manuscripts of the poem “The First of May”, and in large numbers, is a factual confirmation that they were created in 1933, and not in 1905 or 1908. See Fig. 1.

Figure 1: Elements set

The primary basis for dating manuscripts through graphematic research lies in the systematic variation of outlines over the years. People, especially writers, often
modify the outlines of individual graphemes and their tied pairs. However, as our experience indicates, they tend to focus more extensively on their facsimiles, which are directly linked to their individuality. Consequently, at the next stage of our graphematic research, we decided to observe and analyze the facsimiles of Galaktion Tabidze.

From one of the poet’s recollections, we learn about his keen observation of his singing teacher’s signing process at the Kutaisi theological school: “Sharabi-dze: for this ‘dze’ he would draw a fast first line, then he would turn it with a second line, then he would add a third line. These were musical score lines, and on these lines, he would draw the clef so quickly and beautifully that I was amazed...” It appears that he admired the teacher’s signature style so much that he developed a similar signature himself. Initially, he depicted the violin clef horizontally in the lower part of the signature (D-273) and later began to shape the initials of his name and surname into a vertical violin clef. He greatly appreciated it when the initials of the name and surname, or ideally their syllables, were repeated. See Fig. 2.

Figure 2: Signature basic

Several pages are filled with the Russian signatures of Ieronim Yasinsky, in which he first altered the initial of the name to a letter similar to the initial of the surname, then changed the surname ending to a feminine one, included another syllable containing the initial in the surname itself, and invented a name that would begin with the same syllable as the surname. Observing these practices aided us in analyzing the variations he introduced into his facsimile. See Fig. 3.

Figure 3: Russian signature

Galaktion Tabidze was highly sensitive to the issue of authorship, experiencing great distress even when others appropriated individual rhymes rather than entire poems. Consequently, it became his custom to sign each poem at the end, even when he wrote dozens of poems in a single notebook. This practice provided us with a wealth of material for graphematic research. We began by collecting all facsimiles from the poet’s extensive archival units, sorting the facsimiles of dated manuscripts by year, and creating a separate database for undated ones. Our research commenced with the study of dated facsimiles, identifying the constituent elements of each facsimile and categorizing the types within these elements. In this analysis, the types of outlines that characterized the author’s facsimiles during specific periods were particularly valuable for dating. Conversely, those that appeared almost every year or only once or twice were deemed ineffective for this purpose.

Galaktion began publishing poems at the age of 17, in 1908, and his exceptional creative potential became immediately apparent. As he was an aesthete by nature, it mattered greatly to him what kind of facsimile would appear under the autographs of his poems. Therefore, he invested substantial effort into perfecting it. His notebooks reveal numerous facsimiles written consecutively and reflect the meticulous process of working on and refining his facsimile. In one case, his signature took the shape of a ship and in another, it had the contours of an ornament. The graphemes were alternately enlarged, elongated, or angular, and sometimes the initials of his name and surname were either combined or intricately inserted one into the other. See Fig. 4.

Figure 4: Signature with initials

The signatures preserved in the poet’s archive exhibit various compositional forms: the full first and last name, the first name abbreviated (“Gal.”) with the full last name, the first name initial with the last name abbreviated in the middle (“G. T-dze”), first and last name initials (“G. T.”), the first name only (“Galaktion”), the abbreviated first name only (“G.”), and the first name initial with the final letters of the last name unidentifiable due to the rapid writing style. Among these, the last type, the typical automated facsimile, demonstrates the most variations over the years. This type aims to indicate authorship rather than to perfectly represent all graphemes.

Galaktion appears to have been particularly fond of incorporating symmetrical elements into his facsimiles. As previously discussed, he created a symmetrical signature for the surname Yasinsky. In his own facsimiles, starting in 1907, he began adding bold horizontal lines to the initials of his first and last names to emphasize symmetry. During 1908–1909, he sometimes combined these two horizontal lines into one. See Fig. 5.

Figure 5: Signature with symmetry

By 1910, he started to write in bold the upper and lower parts of the vertical “curly” element appended to the right side of the facsimile, so that they were symmetric to the horizontal lines added to the letter “g” or to both “g” and “t”. Such facsimiles can be found up to and including 1915. See Fig. 6.


Figure 6: Signature with bold characters
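
The period-specific signature features discussed in this section behave like a small lookup table: each feature restricts a manuscript to the years in which the poet used it. The Python sketch below is only an illustration of that reasoning, not the procedure the authors implemented. The feature names are hypothetical labels, the year ranges follow the four intervals this section reports (1907–1910, 1910–1915, from 1916, from 1917), and the open-ended intervals are capped at 1959, the year of the latest extant manuscript.

```python
from typing import Iterable, Optional, Tuple

# Earliest and latest extant manuscripts, as stated in the text.
EARLIEST_YEAR, LATEST_YEAR = 1905, 1959

# Hypothetical feature labels mapped to (first_year, last_year) of use.
SIGNATURE_PERIODS = {
    "horizontal_lines_on_initials": (1907, 1910),
    "darkened_vertical_curl": (1910, 1915),
    "framing_arcs": (1916, LATEST_YEAR),
    "full_name_symmetry": (1917, LATEST_YEAR),
}


def date_interval(features: Iterable[str]) -> Optional[Tuple[int, int]]:
    """Intersect the usage periods of all observed signature features.

    Returns (lower, upper) in years, or None when the observed features
    contradict each other and the manuscript needs closer manual study.
    """
    lower, upper = EARLIEST_YEAR, LATEST_YEAR
    for feature in features:
        first, last = SIGNATURE_PERIODS[feature]
        lower, upper = max(lower, first), min(upper, last)
    return (lower, upper) if lower <= upper else None


print(date_interval(["darkened_vertical_curl"]))              # (1910, 1915)
print(date_interval(["framing_arcs", "full_name_symmetry"]))  # (1917, 1959)
```

Returning an interval rather than a single year reflects the fact that, for some manuscripts, only the lower limit of the period can be determined.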

Galaktion’s desire to incorporate symmetry into his facsimiles persisted even beyond 1915. However, starting in 1916, he abandoned the vertical “curl” and instead introduced symmetry by accentuating the individual arc forms of the graphemes in the facsimile. For instance, the arc opening to the right of the “t” and the arc opening to the left of the “e” framed the facsimile. When the facsimile featured the initial of his name, this mirror symmetry was created by the arc of the “g” rather than the “t”. See Fig. 7.

Figure 7: Signature with accentuating characters

From 1917 onwards, the poet began writing his full name, “Galaktion”, and emphasized symmetry by darkening the additional line of the “l”, the horizontal line of the “o”, and sometimes the upper part of the “n”. See Fig. 8.

Figure 8: Signature with full name.

Accordingly, the comparative study of the chronological database of Galaktion’s facsimiles revealed four distinct time intervals characterized by specific stylistic elements. From 1907 to 1910, he utilized the symmetry of the letters “g” and “t” by adding horizontal lines to them. Between 1910 and 1915, he enhanced this symmetry by darkening the top, bottom, or both top and bottom lines of a vertical “curl”. Starting in 1916, he emphasized the left part of his first name initial and the outlines of “t” and “e” in his last name. From 1917, he introduced symmetry through the upper part of the italicized “g”, the additional line of the “l”, the horizontal line of the angular “o”, and the upper arc of the “n”.

Facsimiles showing the first feature, identified through the study of the dated text facsimiles, were not found in the undated database. However, based on the second feature, we dated a number of the archival units to the years 1910–1915, specifically MGL 45, 416, 650, 651, 1330, 1319–1327, 1361, and 2176. The third feature allowed us to date archival units to the period after 1916, including MGL 471–19, 507–2, 527–6, 603–12, 618–1, 637–8, 655–2, 1378, and 24551–242. The fourth feature indicated a period after 1917 for the following archival units: MGL 420–8, 488–1, 496–3, 638–10, 1378, 1678, 1718, and 3819. As a result, we were able to date 22 verses, 1 poem, 1 play, 14 diaries, and 2 personal letters belonging to Galaktion Tabidze. Although in some cases we could only determine the lower limit of the time interval, this is still significant. Without at least an approximate date, it would be impossible to include these works in the bio-bibliography and place them accurately within the author’s biographical context.

3. Neural networks for manuscript dating

Neural networks have emerged as powerful tools in the analysis of historical manuscripts, offering innovative methods for extracting and interpreting various attributes of these documents. By leveraging advanced machine learning techniques, researchers can gain new insights into the content, structure, and material characteristics of manuscripts. The process begins with the collection of high-resolution images of manuscript pages, capturing detailed information about the text, handwriting, paper quality, and ink composition. Annotated datasets, comprising manuscripts with known attributes, serve as a foundation for training neural networks. These datasets provide the necessary ground truth for the network to learn from and refine its analysis capabilities [12–15].

Neural networks, particularly Convolutional Neural Networks (CNNs), excel in feature extraction from image data. For text analysis, Optical Character Recognition (OCR) is employed to convert handwritten text into machine-readable formats, enabling the network to identify and analyze text patterns and stylistic elements. In addition to text, neural networks can examine handwriting styles, detecting variations and trends that may indicate authorship or stylistic changes over time [16–19]. They also assess the physical attributes of manuscripts, such as paper texture, ink fading, and unique markings, which provide insights into the materials and methods used during the manuscript’s production. Training the neural network involves using these annotated datasets to help the model recognize and learn patterns and features associated with different manuscript attributes. Advanced models, including Recurrent Neural Networks (RNNs) or Transformers, can be used for sequential text analysis, while transfer learning allows for the use of pre-trained models to enhance accuracy.

Once trained, the neural network can be applied to new manuscripts to perform detailed analyses. It can identify patterns, detect subtle variations, and extract meaningful features from both the text and physical attributes of the manuscript. This capability extends to recognizing specific types of paper or ink, assessing text style, and detecting unique annotations or markings. Validation of the neural


network’s predictions involves cross-referencing with known samples and expert evaluations to ensure accuracy and reliability. Integrating the outputs of neural networks with expert knowledge is crucial, as it provides a comprehensive understanding of the findings and their historical context.

The advantages of using neural networks in manuscript analysis include their ability to provide objective, data-driven insights and efficiently process large volumes of data. They are capable of identifying intricate patterns and features that might elude traditional methods and can be scaled to analyze manuscripts from diverse historical periods and regions. However, there are challenges to consider. The quality of the training data is critical, and obtaining high-quality, annotated datasets can be demanding. Manuscripts often exhibit complex and overlapping features, which require advanced neural network architectures to model effectively. Additionally, neural network findings should be validated with expert input to ensure that interpretations align with historical knowledge. Looking forward, integrating neural networks with other analytical techniques, such as chemical analysis or historical records, can further enhance manuscript analysis. Expanding datasets to cover a broader range of historical contexts will improve model generalization, and ongoing advancements in neural network technology promise to refine and enhance analytical capabilities.

Neural networks represent a significant advancement in manuscript analysis, offering new methodologies for understanding and authenticating historical documents.

robustness and generalization, as it exposes the model to diverse perspectives and orientations of manuscript images.

The CNN model architecture is defined sequentially using Keras, starting with a `Rescaling` layer to normalize pixel values between 0 and 1. Normalization ensures consistency in input data, facilitating efficient model convergence during training. The subsequent `Sequential` container encapsulates layers responsible for feature extraction and classification.

The model incorporates three convolutional layers (`Conv2D`), each followed by a `MaxPooling2D` layer. Convolutional layers apply a set of filters to extract hierarchical features from input images, while max-pooling layers downsample feature maps, reducing computational complexity and focusing on prominent features. These operations enable the model to learn spatial hierarchies and abstract representations inherent in manuscript images.

Batch normalization layers (`BatchNormalization`) are interspersed between convolutional and activation layers, stabilizing training by normalizing activations and accelerating convergence. This technique enhances model training efficiency and robustness to variations in input data.

Following convolutional operations, feature maps are flattened (`Flatten`), converting multi-dimensional tensors into one-dimensional vectors suitable for dense layers. Two fully connected (`Dense`) layers with ReLU activation functions facilitate nonlinear mapping and feature aggregation. The first dense layer employs L2 regularization (`regularizers.l2`) to mitigate overfitting, penalizing large weights and promoting model generalization.
Their application enables a more detailed examination of                     The output layer consists of `num_classes` units
text and physical attributes, contributing to a deeper                  corresponding to the number of historical periods in the
comprehension of manuscript origins and characteristics.                dataset. Utilizing softmax activation, the output layer
The next section describes the methodology of using neural              computes probabilities for each period, facilitating multi-
networks in manuscript dating.                                          class classification by assigning manuscripts to their most
                                                                        likely historical categories based on learned features.
4. The automatic method using                                                To optimize model parameters, the model is compiled
                                                                        with the Adam optimizer, known for its efficiency in
    neural networks                                                     stochastic optimization tasks. Sparse categorical cross-
The manuscript dataset resides in the good drive folder                 entropy serves as the loss function, appropriate for multi-
`/content/drive/MyDrive/photos`,           structured      into         class classification where each manuscript is assigned a
subdirectories corresponding to different historical periods.           single historical period label. Training commences using the
This organizational scheme allows TensorFlow’s                          `fit` method, iterating over a specified number of epochs (30
`ImageFolder` utility to efficiently load and categorize images         epochs in this case) to adjust model weights based on
based on their respective periods. By leveraging directory              training data (`train_ds`). Validation against separate
names as class labels, the dataset loading process is                   validation data (`val_ds`) assesses model performance on
streamlined, facilitating subsequent preprocessing steps.               unseen examples, preventing overfitting and validating its
    Upon loading, the dataset is split into training and                ability to generalize to new manuscripts. During training,
validation sets using `image_dataset_from_directory`. This              metrics such as accuracy and loss are monitored and
function partitions the dataset based on a specified                    visualized using matplotlib. Plots of training/validation
validation split (in this case, 5%), ensuring that a small              accuracy and loss across epochs provide insights into model
portion of data is reserved for model validation. Parameters            convergence and performance trends, aiding in the
such as image size and batch size are configured to                     assessment of model efficacy and identification of potential
standardize input dimensions (`img_height` and                          improvements.
`img_width` set to 180 pixels each) and optimize memory                      Upon completing training, the trained model is saved
usage during training.                                                  using `model.save`, storing the model’s architecture,
    Data augmentation is implemented using TensorFlow’s                 weights, and optimizer state on disk. This step ensures that
`RandomFlip` method, which introduces variations in                     the trained model can be reused and deployed for inference
training images by randomly flipping them horizontally and              on new manuscript images without the need for retraining.
vertically. This technique is crucial for enhancing model                    In addition to model saving, a zip archive containing
                                                                        model files (`mymodel.zip`) is created, enhancing


                                                                  127
portability and facilitating distribution for collaborative research or deployment in digital archives.
    For inference on new, undated manuscripts, the saved model is loaded using
`tf.keras.models.load_model`. Manuscript images from a designated directory (`drive/MyDrive/toCheck`)
are loaded and preprocessed using TensorFlow’s image processing utilities. Each image undergoes resizing
(`target_size = (180, 180)`) and conversion into a numerical format suitable for input to the model.
    Inference is conducted using `model.predict`, generating a prediction score for each historical
period (convertible to probabilities with softmax). The top three predicted periods for each manuscript
are stored in a dictionary (`resultDict`), facilitating further analysis and validation by historians
and researchers. This approach enables automated dating and categorization of undated manuscripts based
on visual content, leveraging machine learning to support historical research and analysis.

    The pseudocode of the CNN:

    import os
    import numpy as np
    import tensorflow as tf
    import tensorflow_datasets as tfds

    # Step 1: Data Handling and Preprocessing

    # Define data directory
    data_dir = '/content/drive/MyDrive/photos'

    # Load dataset using ImageFolder
    # (tfds expects data_dir/<split>/<label>/<image>; the result is not used
    # below, since the Keras loader reads the period folders directly)
    builder = tfds.ImageFolder(data_dir)
    dataset = builder.as_dataset(shuffle_files=True)

    # Split dataset into training and validation sets
    train_ds = tf.keras.utils.image_dataset_from_directory(
        data_dir,
        validation_split=0.05,
        subset="training",
        seed=123,
        image_size=(180, 180),
        batch_size=32,
    )

    val_ds = tf.keras.utils.image_dataset_from_directory(
        data_dir,
        validation_split=0.05,
        subset="validation",
        seed=123,
        image_size=(180, 180),
        batch_size=32,
    )

    # Number of historical periods, taken from the subdirectory names
    # (must be read before the datasets are cached and prefetched)
    num_classes = len(train_ds.class_names)

    # Apply data augmentation
    data_augmentation = tf.keras.Sequential([
        tf.keras.layers.RandomFlip("horizontal_and_vertical"),
    ])

    # Normalize pixel values: done once by the Rescaling layer inside the
    # model, so the datasets are not rescaled a second time here

    # Cache and prefetch datasets for performance
    AUTOTUNE = tf.data.AUTOTUNE
    train_ds = train_ds.cache().prefetch(buffer_size=AUTOTUNE)
    val_ds = val_ds.cache().prefetch(buffer_size=AUTOTUNE)

    # Step 2: Model Architecture

    # Define CNN model architecture
    model = tf.keras.Sequential([
        tf.keras.layers.Rescaling(1./255),
        data_augmentation,
        tf.keras.layers.Conv2D(32, (3, 3), activation='relu'),
        tf.keras.layers.MaxPooling2D((2, 2)),
        tf.keras.layers.Conv2D(64, (3, 3), activation='relu'),
        tf.keras.layers.MaxPooling2D((2, 2)),
        tf.keras.layers.Conv2D(128, (3, 3), activation='relu'),
        tf.keras.layers.MaxPooling2D((2, 2)),
        tf.keras.layers.Flatten(),
        tf.keras.layers.Dense(128, activation='relu',
                              kernel_regularizer=tf.keras.regularizers.l2(0.001)),
        tf.keras.layers.Dense(num_classes),  # one logit per historical period
    ])

    # Compile the model
    model.compile(
        optimizer='adam',
        loss=tf.keras.losses.SparseCategoricalCrossentropy(from_logits=True),
        metrics=['accuracy'],
    )

    # Step 3: Model Training and Evaluation

    # Train the model
    history = model.fit(
        train_ds,
        validation_data=val_ds,
        epochs=30,
    )

    # Step 4: Model Saving and Deployment

    # Save the model
    model.save('path/to/save/model')

    # Step 5: Inference on New Data

    # Load the saved model
    model = tf.keras.models.load_model('path/to/saved/model')

    # Utility function to get top three predictions
    def get_top_three_predictions(predictions):
        # Class labels are the years 1907-1958 without 1916, 1917, 1918 and 1920
        class_labels = [year for year in range(1907, 1959)
                        if year not in (1916, 1917, 1918, 1920)]

        # Indices of the three highest scores, best first
        # (argsort avoids repeated indices when two scores are equal)
        top_three_indices = np.argsort(predictions[0])[::-1][:3]

        # Map indices to class labels
        return [class_labels[idx] for idx in top_three_indices]

    # Inference on new data
    resultDict = {}
    for file in os.listdir('drive/MyDrive/toCheck'):
        test_image = tf.keras.utils.load_img(
            os.path.join('drive/MyDrive/toCheck', file), target_size=(180, 180))
        test_image = tf.keras.utils.img_to_array(test_image)
        test_image = np.expand_dims(test_image, axis=0)  # add batch dimension

        # Predictions
        predictions = model.predict(test_image)

        # Store results
        resultDict[file] = get_top_three_predictions(predictions)

5. Experiments
We analyzed the set of undated manuscripts using both methods described in the paper. The manual
signature method gave the following results:

Table 1
Signature method
               Period            Title
               1910-1915         ა-45
               1910-1915         ა-416
               1910-1915         დ-650
               1910-1915         დ-651-11
               1910-1915         ა-1330
               1910-1915         ა-1319
               1910-1915         ა-1920
               1910-1915         ა-1321
               1910-1915         ა-1322
               1910-1915         ა-1323
               1910-1915         ა-1324
               1910-1915         ა-1325
               1910-1915         ა-1326
               1910-1915         ა-1327
               1910-1915         ა-1361
               1910-1915         ა-2176
               1910-1915         ა-2176. 2
               1916-             დ-471-19
               1916-             დ-507-2
               1916-             დ-527-6
               1916-             დ-603-12
               1916-             დ-618-1
               1916-             დ-637-8
               1916-             დ-655-2
               1916-             ა-1378-2
               1916-             ხ-24551-242
               1917-             დ-420
               1917-             დ-488
               1917-             დ-496
               1917-             დ-638
               1917-             დ-1678
               1917-             დ-1718

The automated CNN method gave the following results; the Period column lists the three most probable
answers for each manuscript:

Table 2
Automated CNN method
               Period                Title
               [1912, 1914, 1956]    ა-45
               [1910, 1926, 1930]    ა-416
               [1950, 1911, 1934]    დ-650
               [1910, 1911, 1912]    დ-651-11
               [1911, 1910, 1958]    ა-1330
               [1910, 1926, 1930]    ა-1319
               [1912, 1913, 1911]    ა-1920
               [1911, 1926, 1950]    ა-1321
               [1940, 1912, 1950]    ა-1322
               [1955, 1957, 1940]    ა-1323
               [1910, 1928, 1949]    ა-1324
               [1912, 1925, 1910]    ა-1325
               [1915, 1927, 1938]    ა-1326
               [1910, 1926, 1908]    ა-1327
               [1910, 1926, 1930]    ა-1361
               [1910, 1926, 1930]    ა-2176
               [1915, 1943, 1912]    ა-2176. 2
               [1925, 1926, 1922]    დ-471-19
               [1958, 1950, 1914]    დ-507-2
               [1925, 1926, 1926]    დ-527-6
               [1908, 19406, 1950]   დ-603-12
               [1925, 1926, 1908]    დ-618-1
               [1922, 1925, 1950]    დ-637-8
               [1949, 1926, 1925]    დ-655-2
               [1950, 1955, 1910]    ა-1378-2
               [1940, 1955, 1908]    ხ-24551-242
               [1922, 1936, 1910]    დ-420
               [1940, 1941, 1955]    დ-488
               [1913, 1950, 1958]    დ-496
               [1925, 1921, 1956]    დ-638
               [1930, 1936, 1910]    დ-1678
               [1908, 1909, 1910]    დ-1718
               [1958, 1957, 1950]    3819
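The agreement between Table 1 and Table 2 can also be checked mechanically. The following is a minimal sketch, not part of the original pipeline: the helper names `parse_period`, `in_period`, and `agreement` are hypothetical, and an open-ended period such as `1916-` is assumed to mean "1916 or later". The five sample rows are copied from Tables 1 and 2; the remaining rows could be added in the same form.

```python
def parse_period(period):
    """Parse '1910-1915' or an open-ended '1916-' into (start, end); end may be None."""
    start, _, end = period.replace(" ", "").partition("-")
    return int(start), (int(end) if end else None)


def in_period(year, period):
    """True if the year falls inside the manually assigned period."""
    start, end = parse_period(period)
    return year >= start and (end is None or year <= end)


def agreement(period, top_three):
    """Classify how well the CNN's top-three years match the manual period."""
    if in_period(top_three[0], period):
        return "top-1"   # the most probable year lies in the manual range
    if any(in_period(year, period) for year in top_three):
        return "top-3"   # only a lower-ranked year lies in the manual range
    return "none"        # no predicted year lies in the manual range


# Sample rows taken from Table 1 (manual period) and Table 2 (CNN top three)
sample = {
    "ა-45": ("1910-1915", [1912, 1914, 1956]),
    "დ-650": ("1910-1915", [1950, 1911, 1934]),
    "ა-1323": ("1910-1915", [1955, 1957, 1940]),
    "დ-471-19": ("1916-", [1925, 1926, 1922]),
    "დ-1718": ("1917-", [1908, 1909, 1910]),
}

for title, (period, top_three) in sample.items():
    print(title, period, top_three, agreement(period, top_three))
```

Under these assumptions, ა-45 and დ-471-19 agree at the first rank, დ-650 agrees only at a lower rank, and ა-1323 and დ-1718 show no overlap between the two methods. Separating first-rank agreement from lower-rank agreement keeps the comparison from overstating how often the two methods coincide.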
6. Conclusions
In evaluating manuscript dating, we compared results from the manual signature method and the automated
Convolutional Neural Network (CNN) method, focusing on their alignment and discrepancies with
probabilistic estimates from the CNN method. The manual signature method offers clear dating, and the
CNN method aligns with this, showing high probabilities for certain dates while suggesting lower
probabilities for others, indicating some uncertainty in the CNN model’s accuracy. For broader ranges,
the manual method’s suggestions are somewhat supported by CNN’s predictions with high and moderate
probabilities, though some variability is highlighted. The manual method’s datings align well with CNN’s
high-probability dates, though CNN’s inclusion of low-probability dates suggests possible errors or wider
uncertainty. Significant discrepancies arise when the manual method’s datings diverge from CNN’s
predictions, indicating potential limitations or inaccuracies in the CNN model. For cases where the
manual method’s datings align closely with CNN’s high probabilities, moderate and low probabilities show
some acceptable variance but remain generally consistent. Both methods generally align for early
manuscripts, with CNN’s high-probability dates falling within the manual method’s range, but significant
differences are observed where CNN suggests dates outside the manual range, indicating potential issues
with the CNN model’s accuracy. The manual method provides consistent and reliable dating, while the CNN
method introduces variability and highlights areas for further refinement to improve accuracy.

Acknowledgments
This work was supported by Shota Rustaveli National Science Foundation of Georgia under grant
[No. FR-21-7997] Graphematic research and methodology of dating manuscripts.

References
[1]   K. Nesměrák, I. Němcová, Dating of Historical Manuscripts Using Spectrometric Methods: a
      mini-review, Analytical Letters 45(4) (2012) 330–344.
[2]   E. Omayio, S. Indu, J. Panda, Historical Manuscript Dating: Traditional and Current Trends,
      Multimedia Tools and Applications 81(22) (2022) 31573–31602.
[3]   L. MacKinney, Medical Illustrations in Medieval Manuscripts, Univ of California Press (2023).
[4]   D. Antons, et al., The Application of Text Mining Methods in Innovation Research: Current State,
      Evolution Patterns, and Development Priorities, R&D Management 50(3) (2020) 329–351.
[5]   A. Hamid, et al., Historical Manuscript Dating Using Textural Measures, International Conference
      on Frontiers of Information Technology (FIT), IEEE (2018).
[6]   V. Dearing, Manual of Textual Analysis, Univ of California Press (2023).
[7]   D. Van der Meij, Other Information on Dating and Ownership, Indonesian Manuscripts from the
      Islands of Java, Madura, Bali and Lombok, Brill (2017) 405–441.
[8]   J. Droese, J. Karolewski, Manuscript Albums and Their Cultural Contexts: Collectors, Objects, and
      Practices, De Gruyter (2024).
[9]   T. Tvalavadze, et al., Automated Dating of Galaktion Tabidze’s Handwritten Texts, Advances in
      Computer Science for Engineering and Education VI, LNDECT 181 (2023).
      doi: 10.1007/978-3-031-36118-0_23.
[10]  M. Iavich, M. Ninidze, Advancements in Dating Undated Manuscripts through Dual Methodologies,
      29th International Conference "Information Society and University Studies" – IVUS 2024 (2024).
[11]  G. Tabidze, Works in Fifteen Volumes 5 (2017) 250–251.
[12]  F. Wahlberg, T. Wilkinson, A. Brun, Historical Manuscript Production Date Estimation Using Deep
      Convolutional Neural Networks, 15th International Conference on Frontiers in Handwriting
      Recognition (ICFHR), IEEE (2016).
[13]  A. Hamid, et al., Deep Learning Based Approach for Historical Manuscript Dating, International
      Conference on Document Analysis and Recognition (ICDAR), IEEE (2019).
[14]  M. Boudraa, A. Bennour, Combination of Local Features and Deep Learning to Historical Manuscripts
      Dating, International Conference on Intelligent Systems and Pattern Recognition (2023).
[15]  V. Yugay, et al., Stylistic Classification of Cuneiform Signs Using Convolutional Neural Networks,
      IT-Information Technology 0 (2024).
[16]  A. Wang, et al., Repvit: Revisiting Mobile CNN from VIT Perspective, IEEE/CVF Conference on
      Computer Vision and Pattern Recognition (2024).
[17]  D. Bhatt, et al., CNN Variants for Computer Vision: History, Architecture, Application, Challenges
      and Future Scope, Electronics 10(20) (2021).
[18]  P. Verma, G. Foomani, Improvement in OCR Technologies in Postal Industry Using CNN-RNN
      Architecture: Literature review, Int. J. Machine Learning Comput. 12(5) (2022).
[19]  V. Kharchenko, I. Chyrka, Detection of Airplanes on the Ground Using YOLO Neural Network,
      International Conference on Mathematical Methods in Electromagnetic Theory, MMET (2018) 294–297.