<!DOCTYPE article PUBLIC "-//NLM//DTD JATS (Z39.96) Journal Archiving and Interchange DTD v1.0 20120330//EN" "JATS-archivearticle1.dtd">
<article xmlns:xlink="http://www.w3.org/1999/xlink">
  <front>
    <journal-meta>
      <journal-title-group>
        <journal-title>Symposium on NLP for Social Good, April</journal-title>
      </journal-title-group>
      <issn pub-type="ppub">1613-0073</issn>
    </journal-meta>
    <article-meta>
      <title-group>
        <article-title>Multi-modal Academic Document Segmentation Dataset for Smart Question Bank Management</article-title>
      </title-group>
      <contrib-group>
        <contrib contrib-type="author">
          <string-name>Utathya Aich</string-name>
        </contrib>
        <contrib contrib-type="author">
          <string-name>Swarnendu Ghosh</string-name>
          <email>drghosh90@gmail.com</email>
          <xref ref-type="aff" rid="aff0">0</xref>
        </contrib>
        <contrib contrib-type="author">
          <string-name>Tulika Saha</string-name>
          <email>sahatulika15@gmail.com</email>
          <xref ref-type="aff" rid="aff1">1</xref>
        </contrib>
        <contrib contrib-type="author">
<string-name>CNH Industrial ITC</string-name>
        </contrib>
        <contrib contrib-type="author">
          <string-name>India</string-name>
        </contrib>
        <contrib contrib-type="editor">
          <string-name>Document Image Analysis, Multi-modal Document Processing, Text Classification, Deep Learning</string-name>
        </contrib>
        <aff id="aff0">
          <label>0</label>
          <institution>Institute of Engineering &amp; Management, (University of Engineering &amp; Management)</institution>
          ,
          <addr-line>Kolkata</addr-line>
          ,
          <country country="IN">India</country>
        </aff>
        <aff id="aff1">
          <label>1</label>
          <institution>University of Liverpool</institution>
          ,
          <country country="UK">United Kingdom</country>
        </aff>
      </contrib-group>
      <pub-date>
        <year>2024</year>
      </pub-date>
      <volume>2</volume>
      <fpage>5</fpage>
      <lpage>26</lpage>
      <abstract>
        <p>In today's world, most major academic institutes and organizations conduct competitive exams to assess the eligibility of students for admission or recruitment. Due to the rising competition among participants, traditional methods are no longer optimal for getting ahead in the race, and AI-enabled tutoring has become essential for such exams. One such area of implementation is a smart question bank management system. Though large volumes of questions from competitive exams exist in physical form, they are hard for systems to process visually, as they consist of several types of text and non-text elements such as numbers, equations and images alongside textual paragraphs. For this purpose, we propose MADS, a multi-modal academic document segmentation dataset consisting of images of documents containing heterogeneous questions from competitive exams such as GMAT, GRE, GATE, SAT and UGC-NET. These documents consist of textual paragraphs along with numbers, images and equations. The dataset comes with bounding box annotations in two popular formats, YOLO and PASCAL-VOC, to aid the development of efficient document segmentation algorithms. Additionally, benchmarks are provided for state-of-the-art deep learning implementations such as Faster R-CNN and YOLO-v8. From an application point of view, the proposed dataset supports identifying different objects in an image so that it can later be used for semantic relationship and question answering applications, enhancing comprehension and personalized learning experiences and thus supporting the goal of providing quality education.</p>
      </abstract>
    </article-meta>
  </front>
  <body>
    <sec id="sec-2">
      <title>1. Introduction</title>
      <p>
        Competitive examinations are one of the most commonly used tools for academic performance
assessment. These are generally conducted for selection of candidates suitable for a specific
branch of study or work. Multiple such exams have become popular at both the national and
international levels. Due to this increase in competition, students and teachers are
finding it hard to optimize the preparation process using traditional methods, which often leads
to depression amongst them [
        <xref ref-type="bibr" rid="ref1">1</xref>
        ]. While e-documents are more suitable for automated systems,
it is hard to find organized question banks or materials available in electronic format. Hard
copies of question banks are available, but they are difficult to process directly as text,
as they contain a mixture of text, equations, images, numbers and so on. One of the major
challenges with such documents containing a mixture of several mediums is localizing and
segmenting the appropriate textual and non-textual elements. All these components have
text-like properties and can confuse standard OCR techniques. Solutions are even more
scarce when it comes to queries containing multi-modal data. This becomes especially
prominent for document images, which do not represent data as a sequence of Unicode characters
but as pixels. To implement a truly multi-modal question answering system, it is essential
to segment these various components from complex documents before such advanced image
processing tools can be used. For this purpose, we propose “MADS”, a multi-modal
academic document segmentation dataset. For this specific work, we primarily focus on
questions of competitive exams of national and international levels such as GMAT, UGC-NET,
GRE, GATE and SAT. This covers a large variety of examinations catering to students of
various fields. The images in these documents contain a mixture of equations, diagrams and
numbers embedded within the body of the questions, along with multiple options to choose
from. The proposed dataset comes with bounding box annotations corresponding to four
classes, namely equations, diagrams, numbers and text, offering a transformative resource that
aligns with the Sustainable Development Goal of Quality Education. By meticulously annotating
various elements such as text, images, equations, and numbers within question papers, this
dataset lays the groundwork for advancing educational research and technology applications.
Leveraging this dataset enables the development of innovative tools and algorithms aimed
at enhancing teaching methodologies, personalized learning experiences, and educational
accessibility. Through the identification of text, images, and equations, educational materials
can be optimized for accessibility features such as text-to-speech conversion and alternative
formats for students with disabilities. This ensures that all learners, including those with visual
impairments or learning disabilities, can access educational content on an equal basis. The
availability of the proposed dataset allows for the development of intelligent tutoring systems
and question-answering algorithms that promote deeper understanding of educational concepts.
Active participation and sustained engagement in the learning process can be achieved through
immediate feedback and adaptive learning pathways.
      </p>
      <p>Contributions: The key contributions of this work are as follows: (i) to establish the
problem statement for multi-modal academic document image segmentation and its future
applications; (ii) to provide a challenging dataset of multi-modal document images consisting
of questions from various types of competitive examinations; (iii) to provide the necessary
annotations for document image segmentation into four classes, namely equations, numbers, images,
and text; and (iv) to provide benchmarks using state-of-the-art detection algorithms.</p>
    </sec>
    <sec id="sec-3">
      <title>2. Related Work</title>
      <p>There have been previous approaches to managing question banks and exam protocols through
AI-based technologies [2]. However, most of these approaches deal with already existing electronic
question banks [3]. There has not been much work that can automatically process the
existing large volumes of question banks available in the printed medium in the form of
previous year question papers, study materials, educational magazines, and so on. However,
there have been several applications of computer vision on multi-modal documents from other
domains [4, 5, 6]. Some of these approaches primarily focus on text and non-text separation in
various scenarios [7, 8, 9]. In terms of multi-modal text datasets, there are applications in multiple
areas that share a similar set of challenges with our proposed domain. The Tobacco-3482 [10] dataset
consists of document images belonging to 10 different classes such as forms, letters, resumes,
memos, and so on. The RVL-CDIP dataset [11] consists of 400,000 grayscale images in
16 classes, with 25,000 images per class. Multi-label classification has been performed on
academic papers to extract components such as titles and keywords [12]. Moreover, some
multi-modal document image datasets that deal with mathematical equations [13] or geometry
[14] problems have also been explored. In terms of exam-related problems, there are some
similar works done in specific subject groups such as the social or natural sciences [15] or medical
entrance exams [16]. These methods include implementations that address multilingual
Q&amp;A problems as well as multiple-choice questions. However, after a thorough survey,
it is evident that there is a lack of datasets that operate in unrestricted domains and provide
fundamental annotations regarding the multi-modal contents. Furthermore, in the proposed
dataset we provide samples which do not have Unicode representations, thus making it
equivalent to digitally scanned print media.</p>
    </sec>
    <sec id="sec-4">
      <title>3. Dataset</title>
      <p>Due to the unavailability of a multi-modal question bank dataset through which one can segregate
the different textual and non-textual elements of a given question through document segmentation,
we propose “MADS” and discuss its creation below. The sample dataset is made publicly available
under a Creative Commons (CC) license by the authors1.</p>
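      <p>Since the annotations ship in both YOLO and PASCAL-VOC formats, a conversion between the two can be sketched as below. This is an illustrative snippet, not part of the released toolkit; it assumes the standard conventions of normalized (cx, cy, w, h) boxes for YOLO and absolute (xmin, ymin, xmax, ymax) pixel corners for PASCAL-VOC.</p>

```python
def yolo_to_voc(box, img_w, img_h):
    # YOLO stores (cx, cy, w, h), all normalized to [0, 1] by image size.
    cx, cy, w, h = box
    return ((cx - w / 2) * img_w, (cy - h / 2) * img_h,
            (cx + w / 2) * img_w, (cy + h / 2) * img_h)

def voc_to_yolo(box, img_w, img_h):
    # PASCAL-VOC stores absolute pixel corners (xmin, ymin, xmax, ymax).
    xmin, ymin, xmax, ymax = box
    return ((xmin + xmax) / (2 * img_w), (ymin + ymax) / (2 * img_h),
            (xmax - xmin) / img_w, (ymax - ymin) / img_h)
```

      <p>The two functions are exact inverses, so annotations can be round-tripped between the formats without loss.</p>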
      <sec id="sec-4-1">
        <title>3.1. Data Collection</title>
        <p>As mentioned above, managing question banks can be a tedious process, as the questions do
not just contain free text but also different forms of representation such as equations, labels, figures, etc.,
mostly found in competitive examinations. As a result, we chose different competitive exams
which are relevant to the global community as our base to collect questions of different forms.
Our dataset sources are GRE, GATE, SAT, UGC-NET and GMAT, which are well-known
competitive exams for pursuing higher education and are very popular amongst students. We
collected official and sample questions from these exams, openly available on the web, by extracting
pages from downloaded PDFs corresponding to mock questions and converting them into image
format (.png). We utilised the official question banks for the GATE and UGC-NET examinations
and mock questions for the remaining ones. The collected data samples or questions are in the
form of images consisting of equations, numbers, images and text. These questions are publicly
available on the main exam websites as previous year questions. Some questions are taken from
mock questions available for free on different mock exam websites [17, 18, 19, 20, 21, 22, 23,
24, 25, 26]. A sample raw question is shown in Figure 1.
1https://github.com/MADS-dataset/MADS_Dataset_official</p>
      </sec>
      <sec id="sec-4-2">
        <title>3.2. Data Annotation</title>
        <p>Next, the task was to annotate the images (typically questions) to extract the relevant
information they contain: question text, image, label and equation. All the sample
questions were uploaded to an open-source annotation tool, Label-Studio2, for creating
bounding boxes. Three annotators from the authors’ affiliation were asked to draw bounding boxes for
these samples through this tool. The annotators were briefed on and shown a demonstration of the task,
and were initially asked to annotate 10 samples each for the four categories present in the
images. These samples were then checked by the authors and any errors were resolved. The
annotators were then provided with all the remaining samples, divided equally amongst
the three for annotation. On average, there was at least one bounding box each
for the image and text classes in every sample of the dataset. To create the gold-standard annotated
dataset, we required the Intersection over Union (IoU) score [27] between the annotated
boxes to be at least 80% and, in addition, the Cohen’s kappa score to be greater than 90% for the
acceptance of a bounding box with its class label. This Cohen’s kappa score is the agreement
between the labels assigned by the annotators and the authors verifying the annotations.
</p>
      </sec>
      <sec id="sec-4-3">
        <title>3.3. MADS</title>
        <p>
MADS now comprises 230 question samples annotated for the presence of four categories of
information, namely question text, image, label and equation, with the help of bounding boxes.
An annotated sample from MADS is shown in Figure 1. As is visible, it contains a mixture of
equation, text, image and number, and it is challenging for machines to identify these parts
in an image. Some questions contain numeric, textual and equation elements on the
same line. Some include both an image and an equation in the same place. In some samples the
questions are in a two-column format, which makes the regions more challenging to segment.
It is indeed difficult to identify and differentiate amongst these, and through MADS
we aim to tackle such diverse situations. The largest contribution to the dataset comes from
GATE questions, which form 32.3% of the whole, followed by UGC-NET, GMAT, GRE and SAT. The
distribution of the dataset is shown in Figure 2a. It has been observed that the dataset exhibits
a predominance of text, comprising 5,536 bounding boxes, which is 75.5% of the annotations.
The lowest number of bounding boxes belongs to images, with 191 boxes, or 3% of the
dataset. The class-based statistics are depicted in Figure 2b.</p>
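        <p>The agreement criterion used for the gold standard in Section 3.2 can be sketched as follows: a minimal implementation of Cohen's kappa over the class labels assigned to the same boxes by an annotator and a verifier. The label lists in the usage note are hypothetical.</p>

```python
from collections import Counter

def cohens_kappa(labels_a, labels_b):
    # Observed agreement: fraction of items on which the two raters agree.
    n = len(labels_a)
    p_o = sum(a == b for a, b in zip(labels_a, labels_b)) / n
    # Chance agreement, computed from each rater's label marginals.
    freq_a, freq_b = Counter(labels_a), Counter(labels_b)
    p_e = sum((freq_a[c] / n) * (freq_b[c] / n) for c in set(freq_a) | set(freq_b))
    return (p_o - p_e) / (1 - p_e)
```

        <p>A box annotation would then be accepted when its IoU with the verified box is at least 80% and the kappa over the labels exceeds 0.9.</p>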
      </sec>
    </sec>
    <sec id="sec-5">
      <title>4. Methodology</title>
      <p>The aim of MADS is to facilitate the easy training of models to identify different categories
in a given image of a question. The dataset can facilitate identifying different objects in an
image, which can later be used for semantic relationship and question answering tasks. A model
trained on MADS should then be able to identify and segregate the different information present in
a question for smart question bank management and facilitate future research directions in
this area. In this section, we aim to benchmark MADS using different state-of-the-art vision
models for detecting the bounding boxes.</p>
      <sec id="sec-5-1">
        <title>4.1. Benchmark Setup</title>
        <p>We benchmark MADS using two state-of-the-art vision models as follows:</p>
        <p>• YOLO-v83: YOLO-v8 is an advancement of the YOLO [28] model, developed
by Ultralytics. It has a high rate of accuracy on the COCO dataset4. It is an anchor-free
model, which means it predicts the center of an object rather than the offset from a known
anchor box. This makes the model more robust to noise and occlusions than other available models.
The model uses a new backbone network called the Panoptic Feature Extractor (PFE), a new loss
function called CIoU loss, and a new training strategy called SimOTA.</p>
        <p>• Faster R-CNN [29]: Faster R-CNN was developed by Ren et al. Compared to its
predecessors, R-CNN and Fast R-CNN, it introduces a Region Proposal Network that shares
features with the detection network, making it a two-stage detector. Faster R-CNN does not need
much disk storage compared to R-CNN, as it does not cache the extracted features.</p>
        <p>The pre-trained YOLO-v8 is fine-tuned and Faster R-CNN is trained on MADS to benchmark
the dataset using state-of-the-art vision models for the task of detecting useful information in
the form of bounding boxes.</p>
      </sec>
      <sec id="sec-5-2">
        <title>4.2. Implementation Details</title>
        <p>MADS is divided into train and test sets with a ratio of 85:15. We conducted each experiment five
times and report the average of the results for the different models. The vanilla YOLO-v85 medium
model, which has 25.9 million parameters, is fine-tuned on MADS. A vanilla Faster R-CNN
model6 with a ResNet-50 backbone is trained on MADS. All parameters are set to
their default values: the learning rate is 0.001 and the batch size is 64. The number of anchors
is set to 3. As there are 4 classes for detection in MADS, we have 4 output neurons.
The confidence threshold is set to 0.25 by default. YOLO-v8 uses LeakyReLU as its activation
function. These parameters might be tuned in future to obtain better performance. We
used two evaluation metrics, Intersection over Union (IoU) and mean Average Precision (mAP),
to benchmark the performance of the models.</p>
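        <p>A minimal sketch of the evaluation protocol described above: an 85:15 split of the 230 samples and metric averaging over five runs. The seed handling is illustrative, not the authors' exact procedure.</p>

```python
import random
from statistics import mean

def split_mads(samples, train_frac=0.85, seed=0):
    # Reproducible shuffle, then an 85:15 train/test cut.
    items = list(samples)
    random.Random(seed).shuffle(items)
    cut = int(train_frac * len(items))
    return items[:cut], items[cut:]

def average_over_runs(metric_per_run):
    # The final reported number is the mean over the independent runs.
    return mean(metric_per_run)
```
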
        <p>Evaluation Metrics. The metrics IoU and mAP are explained as follows:
• IoU Score: This metric is commonly used to evaluate the performance of object detection
algorithms. It measures the overlap between the predicted bounding box and the ground truth.
The IoU is calculated using the following formula:</p>
        <p>IoU = Area_of_Overlap / Area_of_Union (1)
where Area_of_Overlap is the area common to both the predicted and ground truth regions
and Area_of_Union is the total area covered by both the predicted and ground truth regions.
Our experiments are evaluated at the same IoU thresholds used in COCO: the predicted
annotations are evaluated at IoU thresholds of 0.5 and 0.9 respectively.</p>
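        <p>Formula (1) translates directly into code. A minimal sketch, assuming axis-aligned boxes given as (xmin, ymin, xmax, ymax):</p>

```python
def iou(box_a, box_b):
    ax1, ay1, ax2, ay2 = box_a
    bx1, by1, bx2, by2 = box_b
    # Width/height of the overlap rectangle; zero when the boxes are disjoint.
    iw = max(0.0, min(ax2, bx2) - max(ax1, bx1))
    ih = max(0.0, min(ay2, by2) - max(ay1, by1))
    inter = iw * ih
    union = ((ax2 - ax1) * (ay2 - ay1)
             + (bx2 - bx1) * (by2 - by1)
             - inter)  # Area_of_Union = A + B - Area_of_Overlap
    return inter / union if union > 0 else 0.0
```

        <p>A predicted box counts as correct when this value clears the chosen threshold (0.5 or 0.9 in our experiments).</p>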
        <p>• Mean Average Precision (mAP): Mean Average Precision is a commonly used metric to
evaluate the performance of object detection or information retrieval systems. It provides a
single scalar value at each IoU threshold. We first find the average precision of each class and then
average over all the classes to obtain the mAP.
3https://docs.ultralytics.com/
4https://cocodataset.org/#home
5https://github.com/ultralytics/ultralytics?tab=readme-ov-file
6https://pypi.org/project/detecto/</p>
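        <p>The mAP computation can be sketched as follows: per-class AP accumulates the precision at each true positive in the confidence-ranked detection list, and mAP averages over the classes. The matching of detections to ground truth (at the chosen IoU threshold) is assumed to have been done already.</p>

```python
def average_precision(detections, num_gt):
    # detections: (confidence, is_true_positive) pairs for one class.
    ranked = sorted(detections, key=lambda d: -d[0])
    tp = fp = 0
    ap = 0.0
    for _, is_tp in ranked:
        if is_tp:
            tp += 1
            ap += (tp / (tp + fp)) / num_gt  # precision at each recall step
        else:
            fp += 1
    return ap

def mean_average_precision(ap_per_class):
    # mAP: plain mean of the per-class AP values.
    return sum(ap_per_class) / len(ap_per_class)
```
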
      </sec>
    </sec>
    <sec id="sec-6">
      <title>5. Results and Analysis</title>
      <p>We take the average over all experimental runs for each model to obtain the final
results on the MADS dataset. Based on the predicted IoU score, we use thresholds of 50%
and 90% to record the mAP score. At these thresholds, we compute different metrics such
as accuracy, precision and recall of the models in order to determine their performance. Table
1 depicts the accuracy of the YOLO-v8 and Faster R-CNN models trained on MADS. As observed,
YOLO-v8 performs better than Faster R-CNN by a significant margin of about 15% in
terms of accuracy when the IoU threshold is 50%. Similarly, when the IoU threshold is set to
90%, YOLO-v8 shows about a 3% improvement with respect to Faster R-CNN. On average, the
YOLO-v8 model showed a standard deviation of ±0.5 and ±1 at IoU thresholds of 50% and
90% respectively for overall accuracy, while Faster R-CNN showed standard deviations
of ±0.7 and ±2.6 at the same thresholds.</p>
      <p>Average Precision per class (from Table 1):
Model (IoU threshold) | Equation | Image | Number
Faster R-CNN (50%) | 47.6% | 35.9% | 64.7%
YOLO-v8 (50%) | 73.4% | 80.5% | 86.6%
Faster R-CNN (90%) | 63.8% | 97.5% | 95.3%
YOLO-v8 (90%) | 98.3% | 97.5% | 50.8%</p>
      <p>Table 1 also benchmarks the precision and recall of each class for the
different models at the 50% and 90% thresholds. Experimentally, the class-level
precision of Faster R-CNN at an IoU threshold of 50% has a standard deviation of ±3.6 for
equation, ±2.2 for image, ±1.9 for number and ±0.9 for text. For YOLO-v8 at an IoU threshold
of 50%, the class-level precision showed a standard deviation of ±1.5 for equation, ±1.1 for image, ±1.08
for number and ±1.2 for text. YOLO-v8 performs better than Faster R-CNN with a narrow
improvement of about 3% when the IoU threshold is 90%. At an IoU threshold of 90%,
YOLO-v8 has a standard deviation of ±2.1 for equation, ±1.9 for image, ±2.9
for number and ±2.01 for text, whereas Faster R-CNN showed a standard deviation of ±3.7 for
equation, ±2.4 for image, ±4.7 for number and ±3.5 for text. YOLO-v8's superior
performance can be attributed to the fact that Faster R-CNN uses a two-stage detector
while YOLO-v8 uses a single-shot detector. This gives YOLO-v8 the advantage of
looking through the whole image at once, whereas Faster R-CNN uses regions to localize the
objects within the image. We also report the precision and recall for individual class labels. The
mAP score of Faster R-CNN is 59.6% at IoU50 and 88.37% at IoU90, while YOLO-v8 has a
mAP score of 84.25% at IoU50 and 86.15% at IoU90. It is observed that the text
tag seems to be the easiest to identify, as the dataset has the highest
number of text tag annotations. A sample prediction for Figure 1 from the YOLO-v8 model is
shown in Figure 3.</p>
      <p>With the increase in IoU threshold from 50% to 90%, it is observed that the models
classify the different tags more reliably. When the threshold is set to 50%, more bounding
boxes are identified, and some of them are mis-classified. The YOLO-v8 model fails to
classify number tags, despite an increase in precision for the other tags, when the threshold is raised
from 50% to 90%. Here, Faster R-CNN outperforms YOLO-v8 in identifying number tags at an
IoU threshold of 90%. Though YOLO-v8 performs better than Faster R-CNN in almost every
scenario, challenges do exist. Both algorithms face difficulty in telling equations and
images apart when they are mixed. Isolating such instances while preserving
their semantic relationships poses a considerable challenge. Some challenging image snippets
are shown in Figure 4. The models tend to find it difficult to segregate equations and images.
These issues can be further mitigated by fine-tuning the hyper-parameters. The size of the dataset
also needs to be scaled up (which is an ongoing effort) to achieve better performance.</p>
    </sec>
    <sec id="sec-7">
      <title>6. Conclusion and Future Work</title>
      <p>In this paper, we established a novel problem statement for multi-modal academic document
image segmentation and steered the discussion towards its future applications. Due to the
unavailability of any existing dataset relevant to the task, we propose a dataset, namely MADS,
consisting of questions from various types of competitive examinations, together with gold-standard
annotations to extract information from these questions through the task of bounding box
detection. We benchmark MADS with the help of several state-of-the-art vision models. The
dataset exhibits a predominance of text elements compared to the other object classes, revealing
a bias in the performance of the base algorithms towards text detection. Challenges arise when
labels are annotated within the bounding boxes of text. In the case of text, characters are distributed
in a horizontal and vertical format, so meaningful segments can be enclosed in a rectangular
bounding box. To address this bias, fine-tuning strategies can be implemented to improve
the accuracy for the other class labels. This presents an intriguing area for future research, as
overcoming these complexities would contribute significantly to the advancement of the field.
The primary goal of releasing this dataset is to spur a domain of automated teaching and
learning methods to aid students appearing for such competitive exams. In its first iteration,
this dataset provides the opportunity to digitize existing question banks and annotate them
in the process. At this point the dataset primarily focuses on the segregation of text, equations,
figures, and numbers. Finer segregation may be incorporated in future versions of the
dataset. Future iterations will focus on increasing the volume of the dataset and broadening
the domain, embedding multi-modal questions for processing in large language models and
vision language models, integrating GPT-based services to retrieve solutions to questions,
personalized mock test generation, and so on. We anticipate that this dataset will drive novel
research contributions and applications in the field of smart question bank management and
education in general.</p>
    </sec>
    <sec id="sec-8">
      <title>Acknowledgement</title>
      <p>Dr. Swarnendu Ghosh is thankful for the infrastructure support from IEM Centre of Excellence
for Data Science and the Innovation &amp; Entrepreneurship Development Cell, IEM Kolkata.</p>
    </sec>
    <sec id="sec-9">
      <title>References</title>
      <p>[2] G. Kurdi, J. Leo, B. Parsia, U. Sattler, S. Al-Emari, A systematic review of automatic question
generation for educational purposes, International Journal of Artificial Intelligence in
Education 30 (2020) 121–204.
[3] G. Nalawade, R. Ramesh, Automatic generation of question paper from user entered
specifications using a semantically tagged question repository, in: 2016 IEEE Eighth
International Conference on Technology for Education (T4E), IEEE, 2016, pp. 148–151.
[4] S. Bakkali, Z. Ming, M. Coustaty, M. Rusiñol, Visual and textual deep feature fusion for
document image classification, in: Proceedings of the IEEE/CVF conference on computer
vision and pattern recognition workshops, 2020, pp. 562–563.
[5] R. K. Srihari, Z. Zhang, A. Rao, Intelligent indexing and semantic retrieval of multimodal
documents, Information Retrieval 2 (2000) 245–275.
[6] J. Bateman, Multimodality and genre: A foundation for the systematic analysis of
multimodal documents, Springer, 2008.
[7] X. Bai, B. Shi, C. Zhang, X. Cai, L. Qi, Text/non-text image classification in the wild with
convolutional neural networks, Pattern Recognition 66 (2017) 437–446.
[8] K. Dutta, M. Bal, A. Basak, S. Ghosh, N. Das, M. Kundu, M. Nasipuri, Multi scale mirror
connection based encoder decoder network for text localization, Pattern Recognition
Letters 135 (2020) 64–71.
[9] L. Unsworth, Image/text relations and intersemiosis: Towards multimodal text description
for multiliteracies education, in: Proceedings of the 33rd IFSC: International Systemic
Functional Congress, Pontificia Universidade Catolica de Sao Paulo, 2007.
[10] J. Kumar, P. Ye, D. Doermann, Learning document structure for retrieval and classification,
in: Proceedings of the 21st International Conference on Pattern Recognition (ICPR2012),
IEEE, 2012, pp. 1558–1561.
[11] A. W. Harley, A. Ufkes, K. G. Derpanis, Evaluation of deep convolutional nets for document
image classification and retrieval, in: 2015 13th International Conference on Document
Analysis and Recognition (ICDAR), IEEE, 2015, pp. 991–995.
[12] G. Mustafa, M. Usman, L. Yu, M. T. Afzal, M. Sulaiman, A. Shahid, Multi-label classification
of research articles using word2vec and identification of similarity threshold, Scientific
Reports 11 (2021) 21900.
[13] D. Hendrycks, C. Burns, S. Kadavath, A. Arora, S. Basart, E. Tang, D. Song, J.
Steinhardt, Measuring mathematical problem solving with the math dataset, arXiv preprint
arXiv:2103.03874 (2021).
[14] M. Seo, H. Hajishirzi, A. Farhadi, O. Etzioni, C. Malcolm, Solving geometry problems:
Combining text and diagram interpretation, in: Proceedings of the 2015 conference on
empirical methods in natural language processing, 2015, pp. 1466–1476.
[15] M. Hardalov, T. Mihaylov, D. Zlatkova, Y. Dinkov, I. Koychev, P. Nakov, EXAMS: A
multi-subject high school examinations dataset for cross-lingual and multilingual question
answering, arXiv preprint arXiv:2011.03080 (2020).
[16] A. Pal, L. K. Umapathi, M. Sankarasubbu, MedMCQA: A large-scale multi-subject
multi-choice dataset for medical domain question answering, in: Conference on Health, Inference,
and Learning, PMLR, 2022, pp. 248–260.
[17] GMAT, Gmat sample question paper 2023 with 100 q and a | eduaims, https://eduaims.in/
gmat-sample-paper-pdf/, Sample Questions.
[18] Hank Walker, https://people.engr.tamu.edu/d-walker/index.html, Practice Problems.
[19] GRE, Gre_practicebook_2004.pdf, https://www.prepscholar.com/gre/blog/wp-content/
uploads/sites/3/2016/09/GRE_practicebook_2004.pdf, Practice Problems.
[20] GRE, Gr9768.pdf, https://wmich.edu/mathclub/files/GR9768.pdf, Practice Problems.
[21] M. PREP, 5 lb. book, https://dl.icdst.org/pdfs/files1/eceb4737c3836a94ef7ba0b88ae5510b.pdf, Practice Problems.
[22] S. S. Questions, Sat study guide 2020 - practice test 9, https://satsuite.collegeboard.org/
media/pdf/sat-practice-test-9.pdf, Sample Questions.
[23] S. S. Questions, Sat study guide 2020 - practice test 10, https://satsuite.collegeboard.org/
media/pdf/sat-practice-test-10.pdf, Sample Questions.
[24] S. S. Questions, Sat study guide 2020 - practice test 3, https://satsuite.collegeboard.org/
media/pdf/sat-practice-test-3.pdf, Sample Questions.
[25] UGC NET, University Grants Commission - NET, https://www.ugcnetonline.in/previous_
question_papers.php, Official question papers.
[26] GATE, GATE 2022 official site, https://gate.iitkgp.ac.in/old_question_papers.html, Official question papers.
[27] Jaccard index - wikipedia, https://en.wikipedia.org/wiki/Jaccard_index, IOU Similarity.
[28] J. Redmon, S. K. Divvala, R. B. Girshick, A. Farhadi, You only look once: Unified,
real-time object detection, CoRR abs/1506.02640 (2015). URL: https://www.cv-foundation.org/
openaccess/content_cvpr_2016/papers/Redmon_You_Only_Look_CVPR_2016_paper.pdf.
[29] S. Ren, K. He, R. Girshick, J. Sun, Faster r-cnn: Towards real-time object detection with
region proposal networks, Advances in neural information processing systems 28 (2015).</p>
    </sec>
  </body>
  <back>
    <ref-list>
      <ref id="ref1">
        <mixed-citation>
          [1]
          <string-name>
            <given-names>A.</given-names>
            <surname>Shrivastava</surname>
          </string-name>
          ,
          <string-name>
            <given-names>D.</given-names>
            <surname>Rajan</surname>
          </string-name>
          ,
          <article-title>Assessment of depression, anxiety and stress among students preparing for various competitive exams</article-title>
          ,
          <source>International Journal of Healthcare Sciences</source>
          <volume>6</volume>
          (
          <year>2018</year>
          )
          <fpage>50</fpage>
          -
          <lpage>72</lpage>
          .
        </mixed-citation>
      </ref>
    </ref-list>
  </back>
</article>