Signature-based manual dating vs. neural network automation⋆

Tea Tvalavadze1,†, Ia Ghadua1,†, Giorgi Kalandadze2,† and Maksim Iavich3,*,†

1 Giorgi Leonidze State Museum of Georgian Literature, 36 Petre Kavtaradze str., Tbilisi, 0186, Georgia
2 Association for Textual and Editorial Studies and Digital Humanities, 17 Sakhalkho str., Tbilisi, 0113, Georgia
3 Caucasus University, School of Technology, 1 Paata Saakadze str., Tbilisi, 0102, Georgia

Abstract
Dating manuscripts is a multifaceted task that requires integrating various analytical methods to establish historical context and authenticate documents. This paper compares two methods for dating manuscripts of Galaktion Tabidze, a notable Georgian poet. We used both a manual signature analysis and an automated Convolutional Neural Network (CNN) approach to date undated manuscripts from Tabidze’s archive. The manual signature method relied on analyzing specific graphematic features associated with different periods of Tabidze’s work. This approach provided clear and consistent dating results. The CNN method, on the other hand, used probabilistic estimates to suggest dates. While the CNN method generally supported the manual findings, it also introduced some uncertainties. For instance, it suggested certain dates that did not align with the manual analysis, such as dates in the 1950s for manuscripts that the manual method dated to earlier periods. The comparison showed that the manual signature method offered more reliable and precise dating, especially for earlier works. The CNN method, while valuable, introduced variability and indicated areas where the model’s accuracy could be improved. This study demonstrates that while both methods have their strengths, the manual approach provides a more consistent basis for dating manuscripts, whereas the CNN method serves as a complementary tool with potential for further refinement.
Keywords
manuscript dating, neural networks, manual dating, automatic dating

CSDP-2024: Cyber Security and Data Protection, June 30, 2024, Lviv, Ukraine
∗ Corresponding author.
† These authors contributed equally.
teatvalavadze@gmail.com (T. Tvalavadze); ghaduaia@gmail.com (I. Ghadua); gkalanda@hotmail.com (G. Kalandadze); miavich@cu.edu.ge (M. Iavich)
0000-0003-3742-6825 (T. Tvalavadze); 0000-0002-7434-2519 (I. Ghadua); 0009-0002-3269-0133 (G. Kalandadze); 0000-0002-3109-7971 (M. Iavich)
© 2024 Copyright for this paper by its authors. Use permitted under Creative Commons License Attribution 4.0 International (CC BY 4.0).
CEUR Workshop Proceedings, ceur-ws.org, ISSN 1613-0073

1. Introduction

Dating manuscripts is a complex and multifaceted task that requires careful analysis and the integration of various methods. This process is essential for understanding a manuscript’s historical context and verifying its authenticity. Several approaches are used. Examining physical characteristics such as paper, ink, and binding materials can provide clues about the manuscript’s age. Specific types of paper and ink, along with features like watermarks and script styles, can often be linked to particular periods or regions [1, 2]. Paleography, the study of ancient handwriting styles, is another critical method: by analyzing the script, scholars can identify changes in handwriting over time, which helps in establishing the manuscript’s timeframe. Historical references within the manuscript, such as mentions of events, figures, or other works, can offer dating clues. If the manuscript refers to specific historical events or individuals with known dates, this information can help narrow down its creation period. Textual analysis, which involves comparing the manuscript’s content with other known works, also plays a role [3–6]. Quotations or influences from texts with established dates can assist in dating the manuscript. For manuscripts on organic materials like paper or parchment, carbon dating can estimate the age of the material. Although this provides a date range, it may not pinpoint the exact date of the manuscript’s creation. The manuscript’s provenance, or its history of ownership, can offer additional dating information. Inscriptions, ownership marks, or historical records related to previous owners can provide valuable clues [7, 8].

In the initial phase of the graphematic analysis of the manuscripts of Galaktion Tabidze (1891–1959), the preeminent Georgian poet of the 20th century, the research team aimed to date the manuscripts preserved in his archive. The team selected 2–3 dated manuscripts from each year between 1905 and 1959. They deconstructed the scanned images, compiled databases of graphemes and graphemic pairs, identified the most informative element types for dating purposes, and coded these elements [9, 10]. The database of undated manuscripts was then processed using the same principles, with an attempt to date them based on specific graphemic features identified over the years. The dating of the test manuscripts revealed that the predominant presence of the 622 graphemic types across all periods of Tabidze’s work hindered precise dating, thus impacting the overall results [11].

Given that a single comparative analysis of all the features of all graphemes did not yield significant results, we decided to refine our approach by focusing on specific types of elements that were either consistently used or distinctly absent during particular periods. To achieve this, we conducted a detailed examination of each of the 622 graphematic types identified in our codebook. Our findings indicated that most types appeared in manuscripts from specific years, but not in adjacent years, and then reappeared after a gap of one or two years. This pattern likely resulted from random variation rather than date-informative differences between manuscripts from these years. Nonetheless, the extended periods of use or non-use of specific types revealed through this study could provide a robust basis for more accurate dating. In this paper we present two methods of dating the manuscripts: the first is manual and uses the author’s signatures, and the second uses neural networks.

2. Dating the manuscripts using signature methods

As previously mentioned, the majority of the graphemic types we identified were present in manuscripts from at least one or two years across each decade of the poet’s activity: the 1910s, 1920s, 1930s, 1940s, and 1950s. Consequently, relying solely on these bases made it difficult to accurately date the texts. However, we encountered several exceptions. For instance, the analysis of dated manuscripts revealed that the element type >გ<2/2] was used by the poet exclusively from April 1908 to December 1910. This discovery allowed us to date dozens of undated manuscripts that used this specific type.

During the compilation of the codebook, we noted the presence of various graphemic types in the manuscripts of each year but did not record the percentage relationship of each type with other types of the same element. Consequently, negative statistics, that is, identifying which types were absent in specific years, proved more fruitful than positive statistics in our research. Given that the earliest extant manuscript of Galaktion dates from 1905 and the latest from 1959, element types absent from 1905–1915 suggest that these elements began to be used in 1916, allowing us to date manuscripts containing these types to periods post-1915. Similarly, element types present in earlier periods but absent between 1949 and 1959 indicate that the poet ceased using these types after 1949, enabling us to date texts with these types to before 1949. Naturally, a conclusive determination cannot be based on the presence or absence of a single grapheme. Therefore, we also conducted studies on other graphemes to confirm their compatibility with the estimated periods we identified.

Extended periods of use or non-use can nevertheless contribute significantly to achieving accurate results within a comprehensive research framework. For instance, in cases where multiple potential dates have been identified through other methods, these intervals can help us choose the most likely date.

Consider, for example, a poem by Galaktion Tabidze that he published in 1933 under the title “The First of May” with the inscription “Poem delivered at an illegal evening in 1908, on the first of May”. The date 1908 is written not only on the publication but also on the autographs. Galaktion regularly published his poems, and, unless some external circumstances prevented it, nothing remained unpublished. Given the social democrats’ rise to power in Georgia in 1918 and the subsequent annexation by Soviet Russia in 1921, it is clear that there would have been no obstacles to publishing this poem from 1918 onwards. Consequently, there is suspicion that the poet may have attributed a false date to the poem to construct an image of his “revolutionary past”. Considering the circumstances of writers under Stalin’s totalitarian regime, this is not surprising. While no opposition has arisen since the collapse of the Soviet Union to the notion that Galaktion might have falsified the date, definitive proof confirming the falsification of this poem’s date by the author has not yet emerged.

Five of the six autographs of the poem are dated: two indicate 1905, two indicate 1908, and one, at the end of the poem, is dated April 26, 1933. Such variability in the dating by the author naturally strengthened the suspicion of its mystification. Graphematic research provided an opportunity to substantiate this assumption. Since three autographs of the poem (MGL, N4763, N5381, N5552) are corrected so much that they reflect the process of creating the poem rather than merely “copying” it, it was evident that determining the creation time of these manuscripts through graphemic analysis would aid in pinpointing the poem’s actual date of composition. In this instance, our task was relatively straightforward: we needed to select one of the three possible dates inscribed on the manuscripts. Given that the handwriting from 1905–1908 shares all the characteristics required for our common research parameters, our choice was effectively between that period and 1933. Here, there were numerous distinguishing features to consider.

Until 1911, no type of the double-arched “დ” (>დ<3/) is found in Galaktion’s manuscripts. The types of the double-arched “რ” (>რ<4/) and the upper additional line of “ლ” (>ლ<6/) do not appear until 1909, the type of the grapheme “შ” (>შ<2/7)) does not appear before 1910, the type of “წ” (>წ<1/9]) until 1912, the type of “პ” (>პ<3/4]) until 1915, and so on. Therefore, the presence of all these elements in the manuscripts of the poem “The First of May”, and in large numbers, is a factual confirmation that they were created in 1933, and not in 1905 or 1908. See Fig. 1.
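The interval reasoning used here (each element type narrows the range of years in which a manuscript could have been written) can be sketched in a few lines of Python. This is only a minimal illustration, not the project's codebook tooling: the names `ATTESTED`, `date_interval` and `compatible_dates` are hypothetical, the first-attestation years are taken from the “The First of May” example under the assumption that “not found until Y” means “first attested in Y”, and the last-attestation years default to the end of the dated corpus.

```python
from typing import Dict, Iterable, List, Optional, Tuple

# Years covered by the dated manuscripts, as stated in the text.
CORPUS_START, CORPUS_END = 1905, 1959

# Hypothetical lookup: element type -> (first year attested, last year attested).
# Assumption: "not found until Y" is read as "first attested in Y".
ATTESTED: Dict[str, Tuple[int, int]] = {
    ">დ<3/": (1911, CORPUS_END),
    ">რ<4/": (1909, CORPUS_END),
    ">ლ<6/": (1909, CORPUS_END),
    ">შ<2/7)": (1910, CORPUS_END),
    ">წ<1/9]": (1912, CORPUS_END),
    ">პ<3/4]": (1915, CORPUS_END),
    ">გ<2/2]": (1908, 1910),  # used only from April 1908 to December 1910
}


def date_interval(element_types: Iterable[str]) -> Optional[Tuple[int, int]]:
    """Intersect the attested spans of all element types found in a manuscript."""
    lo, hi = CORPUS_START, CORPUS_END
    for element in element_types:
        first, last = ATTESTED[element]
        lo, hi = max(lo, first), min(hi, last)
    return (lo, hi) if lo <= hi else None  # None: the types never co-occur


def compatible_dates(candidates: Iterable[int], element_types: Iterable[str]) -> List[int]:
    """Keep only the candidate years that fall inside the admissible interval."""
    interval = date_interval(element_types)
    if interval is None:
        return []
    lo, hi = interval
    return [year for year in candidates if lo <= year <= hi]


poem_elements = [">დ<3/", ">რ<4/", ">ლ<6/", ">შ<2/7)", ">წ<1/9]", ">პ<3/4]"]
print(date_interval(poem_elements))                         # (1915, 1959)
print(compatible_dates([1905, 1908, 1933], poem_elements))  # [1933]
```

Under these assumptions only 1933 survives among the three dates written on the autographs. As the text stresses, such an interval is a constraint to be combined with other evidence rather than a dating on its own.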
Figure 1: Elements set

As previously mentioned, extended time intervals indicating the use or non-use of specific types of elements cannot independently solve the problem of dating a text.

The primary basis for dating manuscripts through graphematic research lies in the systematic variation of outlines over the years. People, especially writers, often modify the outlines of individual graphemes and their tied pairs. However, as our experience indicates, they tend to focus more extensively on their facsimiles (signatures), which are directly linked to their individuality. Consequently, at the next stage of our graphematic research, we decided to observe and analyze the facsimiles of Galaktion Tabidze.

From one of the poet’s recollections, we learn about his keen observation of his singing teacher’s signing process at the Kutaisi theological school: “Sharabi-dze: for this ‘dze’ he would draw a fast first line, then he would turn it with a second line, then he would add a third line. These were musical score lines, and on these lines, he would draw the clef so quickly and beautifully that I was amazed...” It appears that he admired the teacher’s signature style so much that he developed a similar signature himself. Initially, he depicted the violin key (treble clef) horizontally in the lower part of the signature (D-273) and later began to shape the initials of his name and surname into a vertical violin key. He greatly appreciated when the initials of the name and surname, or ideally, their syllables, were repeated. See Fig. 2.

Figure 2: Signature basic

Several pages are filled with the Russian signatures of Ieronim Yasinsky, in which he first altered the initial of the name to a letter similar to the initial of the surname, then changed the surname ending to a feminine one, included another syllable containing the initial in the surname itself, and invented a name that would begin with the same syllable as the surname. Observing these practices aided us in analyzing the variations he introduced into his facsimile. See Fig. 3.

Figure 3: Russian signature

Galaktion Tabidze was highly sensitive to the issue of authorship, experiencing great distress even when others appropriated individual rhymes rather than entire poems. Consequently, it became his custom to sign each poem at the end, even when he wrote dozens of poems in a single notebook. This practice provided us with a wealth of material for graphematic research. We began by collecting all facsimiles from the poet’s extensive archival units, sorting the facsimiles of dated manuscripts by year, and creating a separate database for undated ones. Our research commenced with the study of dated facsimiles, identifying the constituent elements of each facsimile and categorizing the types within these elements. In this analysis, the types of outlines that characterized the author’s facsimiles during specific periods were particularly valuable for dating. Conversely, those that appeared almost every year or only once or twice were deemed ineffective for this purpose.

Galaktion began publishing poems at the age of 17, in 1908, and his exceptional creative potential became immediately apparent. As he was an aesthete by nature, it was very important for him what kind of facsimile would appear under the autographs of his poems. Therefore, he invested substantial effort into perfecting it. His notebooks reveal numerous facsimiles written consecutively and reflect the meticulous process of working on and refining his facsimile. In one case, his signature took the shape of a ship and in another, it had the contours of an ornament. The graphemes were alternately enlarged, elongated, or angular, and sometimes the initials of his name and surname were either combined or intricately inserted one into the other. See Fig. 4.

Figure 4: Signature with initials

The signatures preserved in the poet’s archive exhibit various compositional forms: the full first and last name, the first name abbreviated (“Gal.”) with the full last name, the first name initial with the last name abbreviated in the middle (“G. T-dze”), first and last name initials (“G. T.”), the first name only (“Galaktion”), the abbreviated first name only (“G.”), and the first name initial followed by a last name whose final letters are unidentifiable due to the rapid writing style. Among these, the last type, the typical automatic facsimile, shows the most variation over the years. This type aims to indicate authorship rather than to perfectly represent all graphemes.

Galaktion appears to have been particularly fond of incorporating symmetrical elements into his facsimiles. As previously discussed, he created a symmetrical signature for the surname Yasinsky. Regarding his own facsimiles, starting in 1907, he began adding bold horizontal lines to the initials of his first and last names to emphasize symmetry. During 1908–1909, he sometimes combined these two horizontal lines into one. See Fig. 5.

Figure 5: Signature with symmetry

By 1910, he started to write in bold the upper and lower parts of the vertical “curly” element appended to the right side of the facsimile, so that they were symmetric to the horizontal lines added to the letter “g” or to both “g” and “t”. Such facsimiles can be found up to and including 1915. See Fig. 6.

Figure 6: Signature with bold characters

Galaktion’s desire to incorporate symmetry into his facsimiles persisted beyond 1915. However, starting in 1916, he abandoned the vertical “curl” and instead introduced symmetry by accentuating the individual arc forms of the graphemes in the facsimile. For instance, the arc opening to the right of the “t” and the arc opening to the left of the “e” framed the facsimile. When the facsimile featured the initial of his name, this mirror symmetry was created by the arc of the “g” rather than the “t”. See Fig. 7.

Figure 7: Signature with accentuating characters

From 1917 onwards, the poet began writing his full name, “Galaktion”, and emphasized symmetry by darkening the additional line of the “l”, the horizontal line of the “o”, and sometimes the upper part of the “n”. See Fig. 8.

Figure 8: Signature with full name

Accordingly, the comparative study of the chronological database of Galaktion’s facsimiles revealed four distinct time intervals characterized by specific stylistic elements. From 1907 to 1910, he used the symmetry of the letters “g” and “t” by adding horizontal lines to them. Between 1910 and 1915, he enhanced this symmetry by darkening the top, bottom, or both top and bottom lines of a vertical “curl”. Starting in 1916, he emphasized the left part of his first name initial and the outlines of “t” and “e” in his last name. From 1917, he introduced symmetry through the upper part of the italicized “g”, the additional line of the “l”, the horizontal line of the angular “o”, and the upper arc of the “n”.

Facsimiles showing the first feature, identified through the study of the dated facsimiles, were not found in the undated database. However, based on the second feature, we dated a number of the archival units to the years 1910–1915, specifically: MGL: 45, 416, 650, 651, 1330, 1319–1327, 1361, and 2176. The third feature allowed us to date archival units to the period from 1916 onward, including MGL 471–19, 507–2, 527–6, 603–12, 618–1, 637–8, 655–2, 1378, and 24551–242. The fourth feature indicated a period from 1917 onward for the following archival units: MGL 420–8, 488–1, 496–3, 638–10, 1378, 1678, 1718, and 3819. As a result, we were able to date 22 verses, 1 poem, 1 play, 14 diaries, and 2 personal letters belonging to Galaktion Tabidze. Although in some cases we could only determine the lower limit of the time interval, this is still significant. Without at least an approximate date, it would be impossible to include these works in the bio-bibliography and place them accurately within the author’s biographical context.

3. Neural networks for manuscript dating

Neural networks have emerged as powerful tools in the analysis of historical manuscripts, offering innovative methods for extracting and interpreting various attributes of these documents. By leveraging advanced machine learning techniques, researchers can gain new insights into the content, structure, and material characteristics of manuscripts. The process begins with the collection of high-resolution images of manuscript pages, capturing detailed information about the text, handwriting, paper quality, and ink composition. Annotated datasets, comprising manuscripts with known attributes, serve as a foundation for training neural networks. These datasets provide the necessary ground truth for the network to learn from and refine its analysis capabilities [12–15].

Neural networks, particularly Convolutional Neural Networks (CNNs), excel in feature extraction from image data. For text analysis, Optical Character Recognition (OCR) is employed to convert handwritten text into machine-readable formats, enabling the network to identify and analyze text patterns and stylistic elements. In addition to text, neural networks can examine handwriting styles, detecting variations and trends that may indicate authorship or stylistic changes over time [16–19]. They also assess the physical attributes of manuscripts, such as paper texture, ink fading, and unique markings, which provide insights into the materials and methods used during the manuscript’s production. Training the neural network involves using these annotated datasets to help the model recognize and learn patterns and features associated with different manuscript attributes. Advanced models, including Recurrent Neural Networks (RNNs) or Transformers, can be used for sequential text analysis, while transfer learning allows for the use of pre-trained models to enhance accuracy.

Once trained, the neural network can be applied to new manuscripts to perform detailed analyses. It can identify patterns, detect subtle variations, and extract meaningful features from both the text and physical attributes of the manuscript. This capability extends to recognizing specific types of paper or ink, assessing text style, and detecting unique annotations or markings. Validation of the neural network’s predictions involves cross-referencing with known samples and expert evaluations to ensure accuracy and reliability. Integrating the outputs of neural networks with expert knowledge is crucial, as it provides a comprehensive understanding of the findings and their historical context.

The advantages of using neural networks in manuscript analysis include their ability to provide objective, data-driven insights and efficiently process large volumes of data. They are capable of identifying intricate patterns and features that might elude traditional methods and can be scaled to analyze manuscripts from diverse historical periods and regions. However, there are challenges to consider. The quality of the training data is critical, and obtaining high-quality, annotated datasets can be demanding. Manuscripts often exhibit complex and overlapping features, which require advanced neural network architectures to model effectively. Additionally, neural network findings should be validated with expert input to ensure that interpretations align with historical knowledge. Looking forward, integrating neural networks with other analytical techniques, such as chemical analysis or historical records, can further enhance manuscript analysis. Expanding datasets to cover a broader range of historical contexts will improve model generalization, and ongoing advancements in neural network technology promise to refine and enhance analytical capabilities.

Neural networks represent a significant advancement in manuscript analysis, offering new methodologies for understanding and authenticating historical documents. Their application enables a more detailed examination of text and physical attributes, contributing to a deeper comprehension of manuscript origins and characteristics. The next section describes the methodology of using neural networks in manuscript dating.

4. The automatic method using neural networks

The manuscript dataset resides in the Google Drive folder `/content/drive/MyDrive/photos`, structured into subdirectories corresponding to different historical periods. This organizational scheme allows the `ImageFolder` utility to load and categorize images based on their respective periods. By using directory names as class labels, the dataset loading process is streamlined, facilitating subsequent preprocessing steps. Upon loading, the dataset is split into training and validation sets using `image_dataset_from_directory`.

Because augmentation exposes the model to diverse perspectives and orientations of manuscript images, it improves robustness and generalization.

The CNN model architecture is defined sequentially using Keras, starting with a `Rescaling` layer to normalize pixel values between 0 and 1. Normalization ensures consistency in input data, facilitating efficient model convergence during training. The subsequent `Sequential` container encapsulates layers responsible for feature extraction and classification.

The model incorporates three convolutional layers (`Conv2D`), each followed by a `MaxPooling2D` layer. Convolutional layers apply a set of filters to extract hierarchical features from input images, while max-pooling layers downsample feature maps, reducing computational complexity and focusing on prominent features. These operations enable the model to learn spatial hierarchies and abstract representations inherent in manuscript images.

Batch normalization layers (`BatchNormalization`) are interspersed between convolutional and activation layers, stabilizing training by normalizing activations and accelerating convergence. This technique enhances model training efficiency and robustness to variations in input data.

Following convolutional operations, feature maps are flattened (`Flatten`), converting multi-dimensional tensors into one-dimensional vectors suitable for dense layers. Two fully connected (`Dense`) layers with ReLU activation functions facilitate nonlinear mapping and feature aggregation. The first dense layer employs L2 regularization (`regularizers.l2`) to mitigate overfitting by penalizing large weights and promoting model generalization.

The output layer consists of `num_classes` units corresponding to the number of historical periods in the dataset. With softmax applied to its outputs, the layer yields a probability for each period, supporting multi-class classification by assigning manuscripts to their most likely historical categories based on learned features.

To optimize model parameters, the model is compiled with the Adam optimizer, known for its efficiency in stochastic optimization tasks. Sparse categorical cross-entropy serves as the loss function, appropriate for multi-class classification where each manuscript is assigned a single historical period label. Training uses the `fit` method, iterating over a specified number of epochs (30 in this case) to adjust model weights based on training data (`train_ds`). Validation against separate validation data (`val_ds`) assesses model performance on unseen examples, helping to detect overfitting and to check the model’s ability to generalize to new manuscripts.
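A small numeric illustration may help connect the output layer, the loss, and the three candidate years reported per manuscript. The sketch below is plain Python rather than the TensorFlow pipeline itself, and the labels and scores are invented for illustration. It also reflects one detail of the listing in this paper: the final `Dense(num_classes)` layer has no activation and the loss is created with `from_logits=True`, so the network emits raw scores (logits), and softmax has to be applied to read them as probabilities. Because softmax preserves order, the top three periods are the same either way.

```python
import math


def softmax(logits):
    """Turn raw scores into probabilities that sum to 1."""
    m = max(logits)  # subtract the maximum for numerical stability
    exps = [math.exp(z - m) for z in logits]
    total = sum(exps)
    return [e / total for e in exps]


def sparse_categorical_crossentropy(logits, true_index):
    """Loss for one sample: negative log-probability of the true class index."""
    return -math.log(softmax(logits)[true_index])


def top_k_labels(logits, labels, k=3):
    """Labels of the k highest scores, best first.

    Sorting indices (instead of looking values up with list.index) guarantees
    that no label is returned twice when two scores are equal.
    """
    order = sorted(range(len(logits)), key=lambda i: logits[i], reverse=True)
    return [labels[i] for i in order[:k]]


# Hypothetical example: four period labels and the raw scores for one image.
labels = [1910, 1911, 1912, 1913]
logits = [2.0, 1.0, 0.1, 1.0]

probs = softmax(logits)
loss = sparse_categorical_crossentropy(logits, true_index=0)
top3 = top_k_labels(logits, labels)

print(top3)  # [1910, 1911, 1913]
```

In this toy case two scores tie; ranking by index still returns three distinct years.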
The `image_dataset_from_directory` function partitions the dataset based on a specified validation split (in this case, 5%), ensuring that a small portion of data is reserved for model validation. Parameters such as image size and batch size are configured to standardize input dimensions (`img_height` and `img_width` set to 180 pixels each) and optimize memory usage during training.

Data augmentation is implemented using TensorFlow’s `RandomFlip` layer, which introduces variations in training images by randomly flipping them horizontally and vertically. This technique is crucial for enhancing the model’s robustness.

During training, metrics such as accuracy and loss are monitored and visualized using matplotlib. Plots of training/validation accuracy and loss across epochs provide insights into model convergence and performance trends, aiding in the assessment of model efficacy and identification of potential improvements.

Upon completing training, the trained model is saved using `model.save`, storing the model’s architecture, weights, and optimizer state on disk. This step ensures that the trained model can be reused and deployed for inference on new manuscript images without the need for retraining. In addition to model saving, a zip archive containing model files (`mymodel.zip`) is created, enhancing portability and facilitating distribution for collaborative research or deployment in digital archives.

For inference on new, undated manuscripts, the saved model is loaded using `tf.keras.models.load_model`. Manuscript images from a designated directory (`drive/MyDrive/toCheck`) are loaded and preprocessed using TensorFlow’s image processing utilities. Each image undergoes resizing (`target_size = (180, 180)`) and conversion into a numerical format suitable for input to the model.

Inference is conducted using `model.predict`, generating a prediction score for each historical period (a probability once softmax is applied). The top three predicted periods for each manuscript are stored in a dictionary (`resultDict`), facilitating further analysis and validation by historians and researchers. This approach enables automated dating and categorization of undated manuscripts based on visual content, leveraging machine learning to support historical research and analysis.

The pseudocode of the CNN pipeline:

    import os

    import numpy as np
    import tensorflow as tf
    import tensorflow_datasets as tfds
    from tensorflow.keras import regularizers

    # Step 1: Data Handling and Preprocessing

    # Define data directory (one sub-directory per historical period)
    data_dir = '/content/drive/MyDrive/photos'

    # Load dataset using ImageFolder
    # (kept for reference; training below uses image_dataset_from_directory)
    builder = tfds.ImageFolder(data_dir)
    dataset = builder.as_dataset(shuffle_files=True)

    # Split dataset into training and validation sets
    train_ds = tf.keras.utils.image_dataset_from_directory(
        data_dir,
        validation_split=0.05,
        subset="training",
        seed=123,
        image_size=(180, 180),
        batch_size=32
    )

    val_ds = tf.keras.utils.image_dataset_from_directory(
        data_dir,
        validation_split=0.05,
        subset="validation",
        seed=123,
        image_size=(180, 180),
        batch_size=32
    )

    # Number of historical periods; read before the datasets are transformed
    num_classes = len(train_ds.class_names)

    # Apply data augmentation
    data_augmentation = tf.keras.Sequential([
        tf.keras.layers.RandomFlip("horizontal_and_vertical"),
    ])

    # Pixel values are normalized by the Rescaling layer inside the model;
    # rescaling the datasets as well would divide by 255 twice.

    # Cache and prefetch datasets for performance
    AUTOTUNE = tf.data.AUTOTUNE
    train_ds = train_ds.cache().prefetch(buffer_size=AUTOTUNE)
    val_ds = val_ds.cache().prefetch(buffer_size=AUTOTUNE)

    # Step 2: Model Architecture

    # Define CNN model architecture
    model = tf.keras.Sequential([
        tf.keras.layers.Rescaling(1./255),
        data_augmentation,
        tf.keras.layers.Conv2D(32, (3, 3), activation='relu'),
        tf.keras.layers.MaxPooling2D((2, 2)),
        tf.keras.layers.Conv2D(64, (3, 3), activation='relu'),
        tf.keras.layers.MaxPooling2D((2, 2)),
        tf.keras.layers.Conv2D(128, (3, 3), activation='relu'),
        tf.keras.layers.MaxPooling2D((2, 2)),
        tf.keras.layers.Flatten(),
        tf.keras.layers.Dense(128, activation='relu',
                              kernel_regularizer=regularizers.l2(0.001)),
        tf.keras.layers.Dense(num_classes)  # one raw score (logit) per historical period
    ])

    # Compile the model
    model.compile(
        optimizer='adam',
        loss=tf.keras.losses.SparseCategoricalCrossentropy(from_logits=True),
        metrics=['accuracy']
    )

    # Step 3: Model Training and Evaluation

    # Train the model
    history = model.fit(
        train_ds,
        validation_data=val_ds,
        epochs=30
    )

    # Step 4: Model Saving and Deployment

    # Save the model
    model.save('path/to/save/model')

    # Step 5: Inference on New Data

    # Utility function to get top three predictions
    def get_top_three_predictions(predictions):
        # Class labels in the same order as the period sub-directories
        class_labels = [i for i in range(1907, 1959)]
        for year in (1916, 1917, 1918, 1920):
            class_labels.remove(year)

        # Indices of the three highest scores, best first; argsort never
        # returns the same index twice, even when two scores are equal
        top_three_indices = np.argsort(predictions[0])[::-1][:3]

        # Map indices to class labels
        return [class_labels[idx] for idx in top_three_indices]

    # Load the saved model
    model = tf.keras.models.load_model('path/to/saved/model')

    # Inference on new data
    resultDict = {}
    for file in os.listdir('drive/MyDrive/toCheck'):
        test_image = tf.keras.utils.load_img(
            os.path.join('drive/MyDrive/toCheck', file), target_size=(180, 180))
        test_image = tf.keras.utils.img_to_array(test_image)
        test_image = np.expand_dims(test_image, axis=0)

        # Predictions
        predictions = model.predict(test_image)

        # Store results
        resultDict[file] = get_top_three_predictions(predictions)

5. Experiments

We analyzed the set of undated manuscripts using both methods described in the paper. The manual signature method gave us the following results:

Table 1
Signature method

    Period       Title
    1910–1915    ა-45
    1910–1915    ა-416
    1910–1915    დ-650
    1910–1915    დ-651-11
    1910–1915    ა-1330
    1910–1915    ა-1319
    1910–1915    ა-1920
    1910–1915    ა-1321
    1910–1915    ა-1322
    1910–1915    ა-1323
    1910–1915    ა-1324
    1910–1915    ა-1325
    1910–1915    ა-1326
    1910–1915    ა-1327
    1910–1915    ა-1361
    1910–1915    ა-2176
    1910–1915    ა-2176. 2
    1916–        დ-471-19
    1916–        დ-507-2
    1916–        დ-527-6
    1916–        დ-603-12
    1916–        დ-618-1
    1916–        დ-637-8
    1916–        დ-655-2
    1916–        ა-1378-2
    1916–        ხ-24551-242
    1917–        დ-420
    1917–        დ-488
    1917–        დ-496
    1917–        დ-638
    1917–        დ-1678
    1917–        დ-1718

The automated CNN method gave us the following results; the Period column lists the three most probable answers:

Table 2
Automated CNN method

    Period                 Title
    [1912, 1914, 1956]     ა-45
    [1910, 1926, 1930]     ა-416
    [1950, 1911, 1934]     დ-650
    [1910, 1911, 1912]     დ-651-11
    [1911, 1910, 1958]     ა-1330
    [1910, 1926, 1930]     ა-1319
    [1912, 1913, 1911]     ა-1920
    [1911, 1926, 1950]     ა-1321
    [1940, 1912, 1950]     ა-1322
    [1955, 1957, 1940]     ა-1323
    [1910, 1928, 1949]     ა-1324
    [1912, 1925, 1910]     ა-1325
    [1915, 1927, 1938]     ა-1326
    [1910, 1926, 1908]     ა-1327
    [1910, 1926, 1930]     ა-1361
    [1910, 1926, 1930]     ა-2176
    [1915, 1943, 1912]     ა-2176. 2
    [1925, 1926, 1922]     დ-471-19
    [1958, 1950, 1914]     დ-507-2
    [1925, 1926, 1926]     დ-527-6
    [1908, 19406, 1950]    დ-603-12
    [1925, 1926, 1908]     დ-618-1
    [1922, 1925, 1950]     დ-637-8
    [1949, 1926, 1925]     დ-655-2
    [1950, 1955, 1910]     ა-1378-2
    [1940, 1955, 1908]     ხ-24551-242
    [1922, 1936, 1910]     დ-420
    [1940, 1941, 1955]     დ-488
    [1913, 1950, 1958]     დ-496
    [1925, 1921, 1956]     დ-638
    [1930, 1936, 1910]     დ-1678
    [1908, 1909, 1910]     დ-1718
    [1958, 1957, 1950]     3819

6. Conclusions

In evaluating manuscript dating, we compared results from the manual signature method and the automated Convolutional Neural Network (CNN) method, focusing on how the CNN’s probabilistic estimates align with or diverge from the manual datings. The manual signature method offers clear dating, and the CNN method aligns with this, showing high probabilities for certain dates while suggesting lower probabilities for others, indicating some uncertainty in the CNN model’s accuracy. For broader ranges, the manual method’s suggestions are somewhat supported by CNN’s predictions with high and moderate probabilities, though some variability is highlighted. The manual method’s datings align well with CNN’s high-probability dates, though CNN’s inclusion of low-probability dates suggests possible errors or wider uncertainty. Significant discrepancies arise when the manual method’s datings diverge from CNN’s predictions, indicating potential limitations or inaccuracies in the CNN model. For cases where the manual method’s datings align closely with CNN’s high probabilities, moderate and low probabilities show some acceptable variance but remain generally consistent. Both methods generally align for early manuscripts, with CNN’s high-probability dates falling

of Java, Madura, Bali and Lombok, Brill (2017) 405–441.
[8] J. Droese, J. Karolewski, Manuscript Albums and Their Cultural Contexts: Collectors, Objects, and Practices, De Gruyter (2024).
[9] T. Tvalavadze, et al., Automated Dating of Galaktion Tabidze’s Handwritten Texts, Advances in Computer Science for Engineering and Education VI, LNDECT 181 (2023). doi: 10.1007/978-3-031-36118-0_23.
[10] M. Iavich, M. Ninidze, Advancements in Dating Undated Manuscripts through Dual Methodologies, 29th International Conference "Information Society and University Studies" – IVUS 2024 (2024).
[11] G. Tabidze, Works in Fifteen Volumes 5 (2017) 250–251.
[12] F. Wahlberg, T. Wilkinson, A. Brun, Historical Manuscript Production Date Estimation Using Deep Convolutional Neural Networks, 15th International Conference on Frontiers in Handwriting Recognition (ICFHR), IEEE (2016).
[13] A. Hamid, et al., Deep Learning Based Approach for Historical Manuscript Dating, International Conference on Document Analysis and Recognition (ICDAR), IEEE (2019).
[14] M. Boudraa, A.
Bennour, Combination of Local within the manual method’s range, but significant Features and Deep Learning to Historical Manuscripts differences are observed where CNN suggests dates outside Dating, International Conference on Intelligent the manual range, indicating potential issues with the CNN Systems and Pattern Recognition (2023). model’s accuracy. The manual method provides consistent [15] V. Yugay, et al., Stylistic Classification of Cuneiform and reliable dating, while the CNN method introduces Signs Using Convolutional Neural Networks, IT- variability and highlights areas for further refinement to Information Technology 0 (2024). improve accuracy. [16] A. Wang, et al., Repvit: Revisiting Mobile CNN from VIT Perspective, IEEE/CVF Conference on Computer Acknowledgments Vision and Pattern Recognition (2024). [17] D. Bhatt, et al., CNN Variants for Computer Vision: This work was supported by Shota Rustaveli National History, Architecture, Application, Challenges and Science Foundation of Georgia under grant [No. FR-21- Future Scope, Electronics 10(20) (2021). 7997] Graphematic research and methodology of dating [18] P. Verma, G. Foomani, Improvement in OCR manuscripts. Technologies in Postal Industry Using CNN-RNN Architecture: Literature review, Int. J. Machine References Learning Comput. 12(5) (2022). [1] K. Nesměrák, I. Němcová, Dating of Historical [19] V. Kharchenko, I. Chyrka, Detection of Airplanes on Manuscripts Using Spectrometric Methods: a mini- the Ground Using YOLO Neural Network, review, Analytical Letters 45(4) (2012) 330–344. International Conference on Mathematical Methods [2] E. Omayio, S. Indu, J. Panda, Historical Manuscript in Electromagnetic Theory, MMET (2018) 294–297. Dating: Traditional and Current Trends, Multimedia Tools and Applications 81(22) (2022) 31573–31602. [3] L. MacKinney, Medical Illustrations in Medieval Manuscripts, Univ of California Press (2023). [4] D. 
Antons, et al., The Application of Text Mining Methods in Innovation Research: Current State, Evolution Patterns, and Development Priorities, R&D Management 50(3) (2020) 329–351. [5] A. Hamid, et al., Historical Manuscript Dating Using Textural Measures, International Conference on Frontiers of Information Technology (FIT), IEEE (2018). [6] V. Dearing, Manual of Textual Analysis, Univ of California Press (2023). [7] D. Van der Meij, Other Information on Dating and Ownership, Indonesian Manuscripts from the Islands 130
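
The comparison of the two methods in the Conclusions can be made concrete by counting, for each manuscript, how many of the three CNN years fall inside the period assigned by the signature method. The following is a minimal sketch of such a count, not the authors' evaluation procedure. The names `manual_range`, `cnn_top_three`, `hits_in_range`, and `compare` are hypothetical; the four sample rows are copied from Table 1 and Table 2.

```python
from typing import Dict, List, Tuple

# Manual (signature) periods as inclusive (start, end) year ranges,
# copied from four rows of Table 1.
manual_range: Dict[str, Tuple[int, int]] = {
    "დ-651-11": (1910, 1915),
    "ა-45": (1910, 1915),
    "ა-1323": (1910, 1915),
    "დ-471-19": (1916, 1916),
}

# CNN top-three years for the same manuscripts, copied from Table 2.
cnn_top_three: Dict[str, List[int]] = {
    "დ-651-11": [1910, 1911, 1912],
    "ა-45": [1912, 1914, 1956],
    "ა-1323": [1955, 1957, 1940],
    "დ-471-19": [1925, 1926, 1922],
}


def hits_in_range(years: List[int], start: int, end: int) -> int:
    """Count how many predicted years fall inside the inclusive range."""
    return sum(start <= year <= end for year in years)


def compare(manual: Dict[str, Tuple[int, int]],
            cnn: Dict[str, List[int]]) -> Dict[str, int]:
    """Return the hit count per manuscript for titles present in both tables."""
    return {
        title: hits_in_range(cnn[title], *manual[title])
        for title in manual
        if title in cnn
    }


agreement = compare(manual_range, cnn_top_three)
for title, hits in agreement.items():
    print(f"{title}: {hits} of 3 CNN years inside the manual range")
```

A count of 3 corresponds to full agreement, and a count of 0 to the significant discrepancies described in the Conclusions. The class list in the pseudocode omits 1916 and 1917, so for manuscripts dated to those years by signature a count of 0 is expected and should not be read as evidence against the manual dating.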