<!DOCTYPE article PUBLIC "-//NLM//DTD JATS (Z39.96) Journal Archiving and Interchange DTD v1.0 20120330//EN" "JATS-archivearticle1.dtd">
<article xmlns:xlink="http://www.w3.org/1999/xlink">
  <front>
    <journal-meta />
    <article-meta>
      <title-group>
        <article-title>SVNIT_CSE: Building a question answering system for Hindi using word-embedding</article-title>
      </title-group>
      <contrib-group>
        <contrib contrib-type="author">
          <string-name>Ankur Jariwala</string-name>
          <xref ref-type="aff" rid="aff0">0</xref>
        </contrib>
        <contrib contrib-type="author">
          <string-name>Siba Sankar Sahu</string-name>
          <email>sibasankar@coed.svnit.ac.in</email>
          <xref ref-type="aff" rid="aff0">0</xref>
        </contrib>
        <aff id="aff0">
          <label>0</label>
          <institution>Department of Computer Science and Engineering, Sardar Vallabhbhai National Institute of Technology</institution>
          ,
          <addr-line>Surat</addr-line>
          ,
          <country country="IN">India</country>
        </aff>
      </contrib-group>
      <pub-date>
        <year>2026</year>
      </pub-date>
      <abstract>
<p>Question answering (QA) is an important application in the text analysis domain that enables machines to provide precise and factual answers. The rich morphology and diverse syntactic structures of Indian languages present both challenges and opportunities for the development of effective QA systems. As part of the FIRE 2025 VATIKA shared task, our SVNIT_CSE team explored different word embedding models for a Hindi QA system. From the evaluation results, we found that the FastText embeddings with cosine similarity approach outperforms other methods, providing a BLEU score of 89.5 and an F1 score of 0.9180 on test data I. Similarly, it provides a BLEU score of 43.7 and an F1 score of 0.433 on test data II. The word embedding model contributes to building scalable, robust, and inclusive QA systems that can support access to information for millions of Indian language speakers.</p>
      </abstract>
      <kwd-group>
        <kwd>Low resource languages</kwd>
        <kwd>Natural language processing</kwd>
        <kwd>Question answering system</kwd>
        <kwd>Word embeddings</kwd>
      </kwd-group>
    </article-meta>
  </front>
  <body>
    <sec id="sec-1">
      <title>1. Introduction</title>
<p>Question answering (QA) is an application of information retrieval (IR) and natural language processing (NLP) that answers user-generated natural language queries. In general, QA systems are divided into two types based on the source of knowledge. The first type is extractive, or closed domain; it uses span extraction from the available context to answer questions. The second type is generative, or open domain, which utilizes multiple large-scale data sources, such as the Web, documents, and journals, to answer questions. Because people want quick and relevant information, QA systems find application in different fields such as healthcare, education, business, and daily life through digital assistants.</p>
<p>In a traditional search engine, the user gets a list of documents, websites, or references based on simple keyword matching; a QA system, however, provides concise, accurate, and contextually relevant answers by understanding the user’s query. The most important feature of a QA system is its ability to understand natural language queries and provide relevant information from structured or unstructured sources. QA systems also differ substantially from generative-AI-based conversational tools: large language models like ChatGPT operate in a much broader scope, keep track of context across multiple turns, and perform in a more natural, open-ended manner, whereas a QA system provides answers within a specific domain.</p>
      <p>
        Advanced search engines, virtual assistants, and conversational AI systems are built for high-resource
languages such as English and other European languages [
        <xref ref-type="bibr" rid="ref1">1</xref>
        ] [
        <xref ref-type="bibr" rid="ref2">2</xref>
        ]. However, such AI systems are far less available for
low-resource languages. Developing AI systems for low-resource languages closes the
digital divide and makes it easier for people to access digital services, healthcare, and education. The
development of QA systems for low-resource languages is therefore of immense importance in making information
access more inclusive and equitable. In this study, we explore a QA system for a low-resource Indian
language.
      </p>
      <p>CEUR Workshop Proceedings (ISSN 1613-0073)</p>
      <sec id="sec-1-1">
        <title>1.1. Task Description</title>
        <p>
          FIRE (Forum for Information Retrieval Evaluation, https://fire.irsi.org.in/) organized a shared task on a QA system for
Varanasi tourism (VATIKA, https://sites.google.com/view/vatika-2025/). The VATIKA [
          <xref ref-type="bibr" rid="ref3">3</xref>
          ] dataset comprises data from ten different tourism
domains: Ganga aarti, cruise, food court, public toilet, kund, museum,
general, ashram, temple, and travel. Each domain includes detailed paragraph-level Hindi contexts
followed by multiple QA pairs. The dataset is divided into four main parts, i.e., train, validation, test
data I, and test data II. Each QA pair in the training data comprises a unique ID, a question, and an answer.
Examples of Hindi QA pairs from the training data and the test data are shown in Fig. 1 and Fig. 2, respectively. For test data II,
participants generate the predicted answer for each query. The statistics of the VATIKA dataset are
shown in Table 1.
        </p>
        <p>[Fig. 1: a training example in SQuAD-like JSON, with a Hindi context about Bhagirath Kund and QA pairs (ids kund_636 and kund_637) whose answer fields are filled in.]</p>
        <p>[Fig. 2: a test example in the same JSON format, with a Hindi context about Adikeshav Kund and QA pairs (ids kund_001–kund_003) whose answer fields are empty.]</p>
        <p>The paper is organized as follows. Section 2 presents previous research conducted in the QA domain.
The preprocessing steps and the model architecture are presented in Section 3. The experimental results
and their analysis are described in Section 4. Finally, we conclude with directions for future work in
Section 5.</p>
      </sec>
    </sec>
    <sec id="sec-2">
      <title>2. Related Work</title>
      <p>Early research work on QA systems relied heavily on human effort, both in data curation and annotation.
Researchers often used Wikipedia articles, newswire text, and encyclopedias as sources of factual
knowledge because few large-scale machine-readable datasets were available. In recent
years, several methods have been explored, such as machine learning, deep learning, and
transformer-based models, to build QA systems. In this section, we look at some existing studies on QA
systems.</p>
      <p>
        QA systems for English and other high-resource languages have come a long way [
        <xref ref-type="bibr" rid="ref4">4</xref>
        ] [
        <xref ref-type="bibr" rid="ref5">5</xref>
        ]. However,
little research has been conducted on low-resource Indian languages. Nanda et al. [
        <xref ref-type="bibr" rid="ref6">6</xref>
        ] implemented a
machine learning approach for a Hindi QA system. The approach includes accepting natural language
queries, feature extraction, and classification. They trained a Naive Bayes classifier to identify the
correct label class for a given query. They evaluated the performance on two test sets and achieved
accuracies of 92% and 88%. Hermjakob [
        <xref ref-type="bibr" rid="ref7">7</xref>
        ] combined the Penn Treebank training corpus with a question
treebank for question classification. The experimental results show that adding question
sentences to the training data increases accuracy from 65.5% to 97.3%. Zhang and Lee [
        <xref ref-type="bibr" rid="ref8">8</xref>
        ] evaluated five
machine learning algorithms: nearest neighbors, Naive Bayes, decision tree, sparse network of windows,
and support vector machine. They trained the learning algorithms on datasets of different sizes and
tested them on the TREC dataset. They found that the SVM algorithm outperformed the other four methods
in question classification.
      </p>
      <p>
        Abdel-Nabi et al. [
        <xref ref-type="bibr" rid="ref9">9</xref>
        ] report a survey of different deep learning models used to build QA systems. They
found that models such as convolutional neural networks, recurrent neural
networks, attention-based models, hybrid models, graph-based models, generative models, and reinforcement
learning-based models were used for different QA systems, and they evaluated the effectiveness of these
systems on a wide range of evaluation metrics. Tan et al. [
        <xref ref-type="bibr" rid="ref10">10</xref>
        ] used a BiLSTM model to generate
embeddings of the questions and answers. They used two datasets, TREC-QA and InsuranceQA,
to evaluate the QA system. QA-LSTM/CNN with an attention mechanism provides the best performance
on TREC-QA; similarly, QA-LSTM with attention offers the best performance on InsuranceQA. Tay et
al. [
        <xref ref-type="bibr" rid="ref11">11</xref>
        ] proposed a holographic dual LSTM (HD-LSTM) for QA systems. They found that extensive feature
engineering was not required to build a deep learning-based QA system. The MAP
score for HD-LSTM is 0.7042, outperforming the baseline one-layer LSTM with a MAP score of
0.6280; similarly, the HD-LSTM MRR score of 0.7733 outperforms the baseline one-layer LSTM
MRR score of 0.6960. Peng and Liu [
        <xref ref-type="bibr" rid="ref12">12</xref>
        ] presented an attention-based convolutional neural network
model for the QA system. The experimental results show that the attention-based convolutional neural
network performs better than the CNN and LSTM models.
      </p>
      <p>
        Several benchmark datasets have been created to test cross-lingual abilities, such as XQuAD (Artetxe
et al. [
        <xref ref-type="bibr" rid="ref13">13</xref>
        ]), MLQA (Lewis et al. [
        <xref ref-type="bibr" rid="ref14">14</xref>
        ]), and TyDi QA (Clark et al. [15]). These corpora use a translation-based
approach or parallel corpora to cover many languages. These resources enable a systematic
evaluation of cross-lingual transfer, where models are trained in high-resource languages and tested in
low-resource settings. The datasets also help advance research on multilingual QA. For Indian languages,
few annotated QA datasets are available on the Web. Hence, researchers have explored zero-shot
transfer [
        <xref ref-type="bibr" rid="ref14">14</xref>
        ], few-shot fine-tuning [16], and synthetic data generation [17] to build their own customized
datasets. These strategies are helpful, but the more complex morphology of Indian languages often
limits their effectiveness.
      </p>
      <p>To address these limitations, several Indian language datasets have been developed. The IndicSQuAD [18]
dataset was developed by translating the English SQuAD dataset into ten major Indian languages. The
resource is fine-tuned with both monolingual BERT and multilingual models such as MuRIL-BERT
(Khanuja et al. [19]). They found that language-specific BERT models outperform MuRIL-BERT across different Indian
languages. IndicQA is another resource that presents a standard for both extractive and abstractive QA in
different Indian languages [20]. They explored different LLMs for QA under two inference settings,
the translate test and the direct test, in different Indian languages. In the translate test, the input is
translated into English, and the output is translated back into the Indian language. In the direct test, both the
input and the output are in the native language. They found that the translate test provides better
performance than the direct test in all Indian languages.</p>
      <p>TREC: https://huggingface.co/datasets/CogComp/trec; TREC-QA: https://huggingface.co/datasets/lucadiliello/trecqa; InsuranceQA: https://huggingface.co/datasets/deccan-ai/insuranceQA-v2</p>
      <p>IndicBERT (Doddapaneni et al. [21]) is a lightweight transformer model built on the ALBERT
architecture and trained on IndicCorp, a large collection of monolingual Indian language corpora. IndicBERT
supports zero-shot transfer and cross-lingual generalization, allowing a model trained on one Indian
language to transfer knowledge to others. Compared with multilingual models such as mBERT [22] or
XLM-RoBERTa [23], IndicBERT requires fewer parameters while maintaining competitive performance,
making it more suitable for low-resource languages. Doddapaneni et al. [21] show that the
combination of IndicBERT and Samanantar provides a better average F1 score than MuRIL and mBERT across
different Indian languages. Samanantar [24] is the largest publicly available parallel corpora collection
for eleven Indian languages. Together, these datasets and models are the building blocks for furthering
QA research in Indian languages.</p>
    </sec>
    <sec id="sec-3">
      <title>3. Proposed Methodology</title>
      <p>We develop the QA system in three steps: data preprocessing, model design, and evaluation. In the
preprocessing step, the dataset is tokenized, segmented, and presented in a structured format. The QA
model is designed by combining traditional similarity-based methods with modern embedding-based approaches.
Finally, we evaluate the QA system using standard evaluation metrics.</p>
      <sec id="sec-3-1">
        <title>3.1. Data Preprocessing</title>
        <p>The dataset is presented in a SQuAD-like [25] JSON structure that contains a context, questions, and answers.
We implemented different preprocessing steps and presented the data in a structured tabular format.
Each QA pair is then represented in the following way.</p>
        <p>Let Q denote the question, C = {c1, c2, …, cn} the set of candidate sentences in the context, and
A_text the ground-truth answer text. The goal is to identify the sentence in context C that best answers the question
Q. For every method, the question and the context sentences are converted into a vector space, and their
similarity is measured.</p>
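        <p>The flattening step described above can be sketched as follows. The field names ("context", "qas", "id", "question", "answer") follow the VATIKA JSON examples in Fig. 1; the helper name and the sample data are illustrative, not the authors' actual code.</p>

```python
def flatten_squad_like(raw):
    """Flatten a SQuAD-like structure into tabular (id, question, answer, context) rows.

    `raw` is a list of entries, each holding a "context" string and a "qas" list
    of {"id", "question", "answer"} dicts, mirroring the VATIKA JSON layout.
    """
    rows = []
    for entry in raw:
        context = entry["context"]
        for qa in entry["qas"]:
            rows.append({
                "id": qa["id"],
                "question": qa["question"],
                "answer": qa.get("answer", ""),  # empty for test data II
                "context": context,
            })
    return rows

# Illustrative sample in the same shape as the dataset
sample = [{
    "context": "Sentence one. Sentence two.",
    "qas": [{"id": "kund_636", "question": "Q?", "answer": "Sentence one."}],
}]
rows = flatten_squad_like(sample)
```

Each row then pairs one question with its full context, ready for sentence segmentation and vectorization.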
      </sec>
      <sec id="sec-3-2">
        <title>3.2. Model Architecture</title>
        <p>Word embeddings are compact, low-dimensional vector representations of words that capture the syntactic
and semantic characteristics of words within a continuous space. Embeddings place words with similar
meanings near each other in the vector space. This property makes them very effective for
NLP tasks such as named entity recognition [26] and sentiment analysis [27]. In this study, we
explore different word embedding models to capture the semantic similarity between question and answer
and develop an efficient QA system.</p>
        <sec id="sec-3-2-1">
          <title>3.2.1. FastText embeddings with cosine similarity approach</title>
          <p>We use pre-trained FastText [28] embeddings (https://fasttext.cc/docs/en/crawl-vectors.html) to represent words in a dense vector space. We tokenize
the input text, and the embedding method maps each word to a 300-dimensional embedding vector.
We average the embeddings of all words in a sentence to obtain sentence-level representations for both
the context and the question. The averaging produces fixed-size sentence vectors that capture the
semantic meaning of the text. For each question–answer pair, we represent the context and the question
as separate vectors and compute the cosine similarity between the question vector and the sentence
vectors of the context. The sentence with the highest cosine similarity score is selected as the predicted
answer. The approach assumes that sentences semantically similar to the question in the embedding space are
more likely to contain the correct answer span.</p>
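          <p>A minimal sketch of this pipeline, using tiny 4-dimensional toy vectors in place of the 300-dimensional pre-trained FastText embeddings; the vocabulary, dimensions, and sentences below are all illustrative.</p>

```python
import numpy as np

def sentence_vector(tokens, emb, dim=4):
    """Mean of the word vectors of the known tokens; zeros if none are known."""
    vecs = [emb[t] for t in tokens if t in emb]
    if not vecs:
        return np.zeros(dim)
    return np.mean(vecs, axis=0)

def cosine(a, b):
    na, nb = np.linalg.norm(a), np.linalg.norm(b)
    if na == 0 or nb == 0:
        return 0.0
    return float(np.dot(a, b) / (na * nb))

def best_sentence(question, sentences, emb, dim=4):
    """Return the context sentence most cosine-similar to the question."""
    q_vec = sentence_vector(question.split(), emb, dim)
    scores = [cosine(q_vec, sentence_vector(s.split(), emb, dim)) for s in sentences]
    return sentences[int(np.argmax(scores))]

# Toy 4-dim embeddings standing in for the 300-dim FastText vectors
emb = {
    "river":    np.array([1.0, 0.0, 0.0, 0.0]),
    "distance": np.array([0.0, 1.0, 0.0, 0.0]),
    "km":       np.array([0.0, 0.9, 0.1, 0.0]),
    "temple":   np.array([0.0, 0.0, 1.0, 0.0]),
}
context = ["the temple is old", "the distance is 14 km"]
print(best_sentence("distance km", context, emb))  # → "the distance is 14 km"
```

With real FastText vectors the flow is identical; only the lookup table and the dimensionality change.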
        </sec>
        <sec id="sec-3-2-2">
          <title>3.2.2. FastText embeddings with machine learning approach</title>
          <p>We also explore pre-trained FastText embeddings to represent text in a continuous vector space. Each
word is mapped to a 300-dimensional embedding. Sentence representations for both the context and the
question are obtained by averaging the embeddings of their constituent words. For every
question–answer pair, we calculate a context vector and a question vector, which are then concatenated to construct the final
feature representation. The annotated answer positions in the dataset are used to create labels for
the training samples. These concatenated vectors are used to train a ridge regression model, and the
system learns to map the input representation to the correct answer spans. The performance of the
model is evaluated using the predicted and actual answers.</p>
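          <p>One plausible reading of this setup can be sketched with a closed-form ridge regression over concatenated feature vectors. The real system uses 300-dimensional FastText vectors and labels derived from the annotated answer positions; the synthetic features and labels below are purely illustrative.</p>

```python
import numpy as np

def ridge_fit(X, y, alpha=1.0):
    """Closed-form ridge regression: w = (X^T X + alpha*I)^{-1} X^T y."""
    d = X.shape[1]
    return np.linalg.solve(X.T @ X + alpha * np.eye(d), X.T @ y)

def ridge_predict(X, w):
    return X @ w

# Toy features: each row stands for a concatenated [question_vec ; candidate_vec]
# pair, labelled 1.0 if the candidate sentence contains the answer, else 0.0.
rng = np.random.default_rng(0)
X_pos = rng.normal(1.0, 0.1, size=(20, 8))   # matching question/sentence pairs
X_neg = rng.normal(-1.0, 0.1, size=(20, 8))  # non-matching pairs
X = np.vstack([X_pos, X_neg])
y = np.array([1.0] * 20 + [0.0] * 20)

w = ridge_fit(X, y, alpha=0.1)
scores = ridge_predict(X, w)
# At test time, the candidate sentence with the highest score is returned.
```

The regularization term alpha keeps the solution stable when the 600-dimensional concatenated features are correlated, which averaged embeddings typically are.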
        </sec>
        <sec id="sec-3-2-3">
          <title>3.2.3. Word2Vec Embeddings</title>
          <p>We investigate pre-trained Word2Vec [29] embeddings to obtain distributed representations of words.
A dense vector of fixed dimension represents each token in the dataset, regardless of whether it
originates from the context or the question. Word2Vec captures semantic similarity by placing words
with similar meanings closer together in the embedding space, which is especially beneficial for handling
the diverse Hindi vocabulary. The system uses these embeddings to turn both context passages and
questions into a machine-readable form that preserves the meaning of the words. To build sentence-level
representations, we average the Word2Vec embeddings of all the tokens in a given
text span. This gives us vectors of the same size for both the question and each sentence in the context.
The idea is that the aggregated embedding captures the main meaning of the text span, allowing the
question and context to be compared in the same vector space. We then use cosine similarity to measure the
similarity between the sentence and question vectors in the embedding space. The parameters used
for the Word2Vec model are presented in Table 2.</p>
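          <p>The neighborhood property described above can be illustrated with toy Word2Vec-style vectors; the actual model uses higher-dimensional embeddings trained with the hyper-parameters of Table 2, so both the vocabulary and the values below are made up for illustration.</p>

```python
import numpy as np

# Toy Word2Vec-style vectors: semantically related words get nearby vectors.
vocab = {
    "ghat":   np.array([0.9, 0.1, 0.0]),
    "river":  np.array([0.8, 0.2, 0.1]),
    "ticket": np.array([0.0, 0.1, 0.9]),
}

def nearest(word, vocab):
    """Return the other vocabulary word whose vector is cosine-closest to `word`."""
    v = vocab[word]
    best, best_sim = None, -2.0
    for w, u in vocab.items():
        if w == word:
            continue
        sim = float(np.dot(v, u) / (np.linalg.norm(v) * np.linalg.norm(u)))
        if sim > best_sim:
            best, best_sim = w, sim
    return best

print(nearest("ghat", vocab))  # → "river": the semantically closer word
```

The same nearest-neighbor comparison, applied between a question vector and averaged sentence vectors, is what drives answer selection in the Word2Vec pipeline.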
        </sec>
      </sec>
    </sec>
    <sec id="sec-4">
      <title>4. Results and Analysis</title>
      <p>In the QA system, we explore different embedding-based approaches to extract answers from the
context. To evaluate the effectiveness of the system, we use evaluation metrics such as the BLEU score,
the ROUGE score, and the F1 score. The evaluation scores of the different embedding models are presented in
Tables 3-8, and the effectiveness of the different word embedding models is shown graphically in
Figures 3a-5b. From the evaluation results, we found that the FastText method with cosine similarity
(CS) outperforms the other methods on both test data I and II in terms of BLEU and F1 score. Higher BLEU
and F1 scores indicate that FastText embeddings are better at producing exact matches and subword-level
accuracy. The Word2Vec model provides the best ROUGE scores on test data I, as shown in Table 5, whereas
FastText with machine learning offers the best ROUGE scores on test data II, as shown in Table 6. A better
ROUGE score indicates that FastText with the ML model captures broader contextual and structural
similarity. In Table 4, for FastText with cosine similarity (CS), BLEU scores start relatively high at the
unigram level but decrease for higher n-grams, reaching very low values at BLEU-4. This demonstrates
that the ground-truth answer is present in the system-generated output but not in the same sequential order. From
the analysis of the results, we found that FastText with CS provides optimal performance across different
evaluation metrics and achieves competitive performance on the shared-task leaderboard, making it the
most suitable of the evaluated models for the Hindi QA system.</p>
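      <p>A sketch of a SQuAD-style token-level F1, one common way such scores are computed for extractive QA; the official shared-task scorer may normalise text differently, so this mirrors the reported metric only in spirit.</p>

```python
from collections import Counter

def token_f1(prediction, ground_truth):
    """Token-overlap F1 between a predicted and a gold answer string.

    Precision = overlap / |prediction tokens|, recall = overlap / |gold tokens|,
    counting repeated tokens via multiset intersection.
    """
    pred_tokens = prediction.split()
    gold_tokens = ground_truth.split()
    common = Counter(pred_tokens) & Counter(gold_tokens)
    num_same = sum(common.values())
    if num_same == 0:
        return 0.0
    precision = num_same / len(pred_tokens)
    recall = num_same / len(gold_tokens)
    return 2 * precision * recall / (precision + recall)

print(token_f1("the distance is 14 km", "distance is 14 km"))  # ≈ 0.889
```

Unlike BLEU-4, which penalizes out-of-order n-grams, this F1 rewards any token overlap, which is why a system can score low on BLEU-4 yet high on F1 when the right words appear in a different order.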
    </sec>
    <sec id="sec-5">
      <title>5. Conclusion</title>
      <p>Question answering is an important downstream task in the text analysis domain. In this study,
we explored various embedding-based methodologies for developing a QA system for Hindi. From the
evaluation results, we found that FastText with cosine similarity outperformed the other methods, achieving
the highest BLEU and F1 scores on the different test data. In general, the results show that simpler
similarity-based methods provide better performance for Hindi QA than the traditional machine learning
method. In the future, we can explore transformer-based models to build the Hindi QA system and
improve its robustness in real-world applications.</p>
      <p>Declaration on Generative AI: During the preparation of this work, the author(s) used ChatGPT and QuillBot in order to: spelling check, paraphrase and reword. After using these tools/services, the author(s) reviewed and edited the content as needed and take(s) full responsibility for the publication’s content.</p>
      <p>58th Annual Meeting of the Association for Computational Linguistics, ACL 2020, Online, July 5-10, 2020, Association for Computational Linguistics, 2020, pp. 7315–7330.</p>
      <p>[15] J. H. Clark, J. Palomaki, V. Nikolaev, E. Choi, D. Garrette, M. Collins, T. Kwiatkowski, TyDi QA: A benchmark for information-seeking question answering in typologically diverse languages, Trans. Assoc. Comput. Linguistics 8 (2020) 454–470.</p>
      <p>[16] J. Hu, S. Ruder, A. Siddhant, G. Neubig, O. Firat, M. Johnson, XTREME: A massively multilingual multi-task benchmark for evaluating cross-lingual generalisation, in: Proceedings of the 37th International Conference on Machine Learning, ICML 2020, 13-18 July 2020, Virtual Event, volume 119 of Proceedings of Machine Learning Research, PMLR, 2020, pp. 4411–4421.</p>
      <p>[17] R. Puri, R. Spring, M. Shoeybi, M. Patwary, B. Catanzaro, Training question answering models from synthetic data, in: B. Webber, T. Cohn, Y. He, Y. Liu (Eds.), Proceedings of the 2020 Conference on Empirical Methods in Natural Language Processing, EMNLP 2020, Online, November 16-20, 2020, Association for Computational Linguistics, 2020, pp. 5811–5826.</p>
      <p>[18] S. Endait, R. Ghatage, A. Kulkarni, R. Patil, R. Joshi, IndicSQuAD: A comprehensive multilingual question answering dataset for Indic languages, CoRR abs/2505.03688 (2025).</p>
      <p>[19] S. Khanuja, D. Bansal, S. Mehtani, S. Khosla, A. Dey, B. Gopalan, D. K. Margam, P. Aggarwal, R. T. Nagipogu, S. Dave, S. Gupta, S. C. B. Gali, V. Subramanian, P. P. Talukdar, MuRIL: Multilingual representations for Indian languages, CoRR abs/2103.10730 (2021).</p>
      <p>[20] A. K. Singh, V. Kumar, R. Murthy, J. Sen, A. R. Mittal, G. Ramakrishnan, INDIC QA BENCHMARK: A multilingual benchmark to evaluate question answering capability of LLMs for Indic languages, in: L. Chiruzzo, A. Ritter, L. Wang (Eds.), Findings of the Association for Computational Linguistics: NAACL 2025, Albuquerque, New Mexico, USA, April 29 - May 4, 2025, Association for Computational Linguistics, 2025, pp. 2607–2626.</p>
      <p>[21] S. Doddapaneni, R. Aralikatte, G. Ramesh, S. Goyal, M. M. Khapra, A. Kunchukuttan, P. Kumar, Towards leaving no Indic language behind: Building monolingual corpora, benchmark and models for Indic languages, in: A. Rogers, J. L. Boyd-Graber, N. Okazaki (Eds.), Proceedings of the 61st Annual Meeting of the Association for Computational Linguistics (Volume 1: Long Papers), ACL 2023, Toronto, Canada, July 9-14, 2023, Association for Computational Linguistics, 2023, pp. 12402–12426.</p>
      <p>[22] S. Wu, M. Dredze, Beto, bentz, becas: The surprising cross-lingual effectiveness of BERT, in: K. Inui, J. Jiang, V. Ng, X. Wan (Eds.), Proceedings of the 2019 Conference on Empirical Methods in Natural Language Processing and the 9th International Joint Conference on Natural Language Processing, EMNLP-IJCNLP 2019, Hong Kong, China, November 3-7, 2019, Association for Computational Linguistics, 2019, pp. 833–844.</p>
      <p>[23] A. Conneau, K. Khandelwal, N. Goyal, V. Chaudhary, G. Wenzek, F. Guzmán, E. Grave, M. Ott, L. Zettlemoyer, V. Stoyanov, Unsupervised cross-lingual representation learning at scale, in: D. Jurafsky, J. Chai, N. Schluter, J. R. Tetreault (Eds.), Proceedings of the 58th Annual Meeting of the Association for Computational Linguistics, ACL 2020, Online, July 5-10, 2020, Association for Computational Linguistics, 2020, pp. 8440–8451.</p>
      <p>[24] G. Ramesh, S. Doddapaneni, A. Bheemaraj, M. Jobanputra, R. AK, A. Sharma, S. Sahoo, H. Diddee, M. J, D. Kakwani, N. Kumar, A. Pradeep, S. Nagaraj, D. Kumar, V. Raghavan, A. Kunchukuttan, P. Kumar, M. S. Khapra, Samanantar: The largest publicly available parallel corpora collection for 11 Indic languages, Trans. Assoc. Comput. Linguistics 10 (2022) 145–162.</p>
      <p>[25] P. Rajpurkar, J. Zhang, K. Lopyrev, P. Liang, SQuAD: 100,000+ questions for machine comprehension of text, in: J. Su, X. Carreras, K. Duh (Eds.), Proceedings of the 2016 Conference on Empirical Methods in Natural Language Processing, EMNLP 2016, Austin, Texas, USA, November 1-4, 2016, The Association for Computational Linguistics, 2016, pp. 2383–2392.</p>
      <p>[26] A. Das, D. Ganguly, U. Garain, Named entity recognition with word embeddings and Wikipedia categories for a low-resource language, ACM Trans. Asian Low Resour. Lang. Inf. Process. 16 (2017) 18:1–18:19.</p>
      <p>[27] T. P. Adewumi, F. Liwicki, M. Liwicki, Word2Vec: Optimal hyper-parameters and their impact on NLP downstream tasks, CoRR abs/2003.11645 (2020).</p>
      <p>[28] P. Bojanowski, E. Grave, A. Joulin, T. Mikolov, Enriching word vectors with subword information, Trans. Assoc. Comput. Linguistics 5 (2017) 135–146.</p>
      <p>[29] T. Mikolov, K. Chen, G. Corrado, J. Dean, Efficient estimation of word representations in vector space, in: Y. Bengio, Y. LeCun (Eds.), 1st International Conference on Learning Representations, ICLR 2013, Scottsdale, Arizona, USA, May 2-4, 2013, Workshop Track Proceedings, 2013.</p>
    </sec>
  </body>
  <back>
    <ref-list>
      <ref id="ref1">
        <mixed-citation>
          [1]
          <string-name>
            <given-names>S.</given-names>
            <surname>Liaw</surname>
          </string-name>
          ,
          <string-name>
            <given-names>H.</given-names>
            <surname>Huang</surname>
          </string-name>
          ,
          <article-title>An investigation of user attitudes toward search engines as an information retrieval tool</article-title>
          ,
          <source>Comput. Hum. Behav</source>
          .
          <volume>19</volume>
          (
          <year>2003</year>
          )
          <fpage>751</fpage>
          -
          <lpage>765</lpage>
          .
        </mixed-citation>
      </ref>
      <ref id="ref2">
        <mixed-citation>
          [2]
          <string-name>
            <given-names>C. V.</given-names>
            <surname>Gysel</surname>
          </string-name>
          ,
          <article-title>Modeling spoken information queries for virtual assistants: Open problems, challenges and opportunities</article-title>
          , in: H.
          <string-name>
            <surname>Chen</surname>
            ,
            <given-names>W. E.</given-names>
          </string-name>
          <string-name>
            <surname>Duh</surname>
            ,
            <given-names>H.</given-names>
          </string-name>
          <string-name>
            <surname>Huang</surname>
            ,
            <given-names>M. P.</given-names>
          </string-name>
          <string-name>
            <surname>Kato</surname>
            ,
            <given-names>J.</given-names>
          </string-name>
          <string-name>
            <surname>Mothe</surname>
            ,
            <given-names>B.</given-names>
          </string-name>
          Poblete (Eds.),
          <source>Proceedings of the 46th International ACM SIGIR Conference on Research and Development in Information Retrieval</source>
          ,
          <string-name>
            <surname>SIGIR</surname>
          </string-name>
          <year>2023</year>
          , Taipei, Taiwan,
          <source>July 23-27</source>
          ,
          <year>2023</year>
          , ACM,
          <year>2023</year>
          , pp.
          <fpage>3335</fpage>
          -
          <lpage>3338</lpage>
          .
        </mixed-citation>
      </ref>
      <ref id="ref3">
        <mixed-citation>
          [3]
          <string-name>
            <given-names>P.</given-names>
            <surname>Gatla</surname>
          </string-name>
          , Anushka,
          <string-name>
            <given-names>N.</given-names>
            <surname>Kanwar</surname>
          </string-name>
          , G. Sahoo,
          <string-name>
            <given-names>R. K.</given-names>
            <surname>Mundotiya</surname>
          </string-name>
          ,
          <article-title>Tourism question answer system in Indian language using domain-adapted foundation models</article-title>
          , arXiv preprint (
          <year>2025</year>
          ).
        </mixed-citation>
      </ref>
      <ref id="ref4">
        <mixed-citation>
          [4]
          <string-name>
            <given-names>R. F.</given-names>
            <surname>Simmons</surname>
          </string-name>
          ,
          <article-title>Answering english questions by computer: a survey</article-title>
          ,
          <source>Commun. ACM</source>
          <volume>8</volume>
          (
          <year>1965</year>
          )
          <fpage>53</fpage>
          -
          <lpage>70</lpage>
          .
        </mixed-citation>
      </ref>
      <ref id="ref5">
        <mixed-citation>
          [5]
          <string-name>
            <given-names>O.</given-names>
            <surname>Kolomiyets</surname>
          </string-name>
          ,
          <string-name>
            <given-names>M.</given-names>
            <surname>Moens</surname>
          </string-name>
          ,
          <article-title>A survey on question answering technology from an information retrieval perspective</article-title>
          ,
          <source>Inf. Sci.</source>
          <volume>181</volume>
          (
          <year>2011</year>
          )
          <fpage>5412</fpage>
          -
          <lpage>5434</lpage>
          .
        </mixed-citation>
      </ref>
      <ref id="ref6">
        <mixed-citation>
          [6]
          <string-name>
            <given-names>G.</given-names>
            <surname>Nanda</surname>
          </string-name>
          ,
          <string-name>
            <given-names>M.</given-names>
            <surname>Dua</surname>
          </string-name>
          ,
          <string-name>
            <given-names>K.</given-names>
            <surname>Singla</surname>
          </string-name>
          ,
          <article-title>A Hindi question answering system using machine learning approach</article-title>
          , in:
          <source>2016 International Conference on Computational Techniques in Information and Communication Technologies (ICCTICT)</source>
          ,
          <year>2016</year>
          , pp.
          <fpage>311</fpage>
          -
          <lpage>314</lpage>
          .
        </mixed-citation>
      </ref>
      <ref id="ref7">
        <mixed-citation>
          [7]
          <string-name>
            <given-names>U.</given-names>
            <surname>Hermjakob</surname>
          </string-name>
          ,
          <article-title>Parsing and question classification for question answering</article-title>
          , in:
          <source>Proceedings of the ACL 2001 Workshop on Open-Domain Question Answering</source>
          ,
          <year>2001</year>
          .
        </mixed-citation>
      </ref>
      <ref id="ref8">
        <mixed-citation>
          [8]
          <string-name>
            <given-names>D.</given-names>
            <surname>Zhang</surname>
          </string-name>
          ,
          <string-name>
            <given-names>W. S.</given-names>
            <surname>Lee</surname>
          </string-name>
          ,
          <article-title>Question classification using support vector machines</article-title>
          , in:
          <source>Proceedings of the 26th Annual International ACM SIGIR Conference on Research and Development in Information Retrieval</source>
          , SIGIR '03, Association for Computing Machinery, New York, NY, USA,
          <year>2003</year>
          , pp.
          <fpage>26</fpage>
          -
          <lpage>32</lpage>
          .
        </mixed-citation>
      </ref>
      <ref id="ref9">
        <mixed-citation>
          [9]
          <string-name>
            <given-names>H.</given-names>
            <surname>Abdel-Nabi</surname>
          </string-name>
          ,
          <string-name>
            <given-names>A.</given-names>
            <surname>Awajan</surname>
          </string-name>
          ,
          <string-name>
            <given-names>M. Z.</given-names>
            <surname>Ali</surname>
          </string-name>
          ,
          <article-title>Deep learning-based question answering: a survey</article-title>
          ,
          <source>Knowledge and Information Systems</source>
          <volume>65</volume>
          (
          <year>2023</year>
          )
          <fpage>1399</fpage>
          -
          <lpage>1485</lpage>
          .
        </mixed-citation>
      </ref>
      <ref id="ref10">
        <mixed-citation>
          [10]
          <string-name>
            <given-names>M.</given-names>
            <surname>Tan</surname>
          </string-name>
          ,
          <string-name>
            <given-names>C.</given-names>
            <surname>dos Santos</surname>
          </string-name>
          ,
          <string-name>
            <given-names>B.</given-names>
            <surname>Xiang</surname>
          </string-name>
          ,
          <string-name>
            <given-names>B.</given-names>
            <surname>Zhou</surname>
          </string-name>
          ,
          <article-title>LSTM-based deep learning models for non-factoid answer selection</article-title>
          ,
          <year>2016</year>
          .
        </mixed-citation>
      </ref>
      <ref id="ref11">
        <mixed-citation>
          [11]
          <string-name>
            <given-names>Y.</given-names>
            <surname>Tay</surname>
          </string-name>
          ,
          <string-name>
            <given-names>M. C.</given-names>
            <surname>Phan</surname>
          </string-name>
          ,
          <string-name>
            <given-names>L. A.</given-names>
            <surname>Tuan</surname>
          </string-name>
          ,
          <string-name>
            <given-names>S. C.</given-names>
            <surname>Hui</surname>
          </string-name>
          ,
          <article-title>Learning to rank question answer pairs with holographic dual LSTM architecture</article-title>
          , in:
          <source>Proceedings of the 40th International ACM SIGIR Conference on Research and Development in Information Retrieval</source>
          , SIGIR '17, ACM,
          <year>2017</year>
          , pp.
          <fpage>695</fpage>
          -
          <lpage>704</lpage>
          .
        </mixed-citation>
      </ref>
      <ref id="ref12">
        <mixed-citation>
          [12]
          <string-name>
            <given-names>Y.</given-names>
            <surname>Peng</surname>
          </string-name>
          ,
          <string-name>
            <given-names>B.</given-names>
            <surname>Liu</surname>
          </string-name>
          ,
          <article-title>Attention-based neural network for short-text question answering</article-title>
          , in:
          <source>Proceedings of the 2018 2nd International Conference on Deep Learning Technologies</source>
          , ICDLT '18, Association for Computing Machinery, New York, NY, USA,
          <year>2018</year>
          , pp.
          <fpage>21</fpage>
          -
          <lpage>26</lpage>
          .
        </mixed-citation>
      </ref>
      <ref id="ref13">
        <mixed-citation>
          [13]
          <string-name>
            <given-names>M.</given-names>
            <surname>Artetxe</surname>
          </string-name>
          ,
          <string-name>
            <given-names>S.</given-names>
            <surname>Ruder</surname>
          </string-name>
          ,
          <string-name>
            <given-names>D.</given-names>
            <surname>Yogatama</surname>
          </string-name>
          ,
          <article-title>On the cross-lingual transferability of monolingual representations</article-title>
          , in:
          <string-name>
            <given-names>D.</given-names>
            <surname>Jurafsky</surname>
          </string-name>
          ,
          <string-name>
            <given-names>J.</given-names>
            <surname>Chai</surname>
          </string-name>
          ,
          <string-name>
            <given-names>N.</given-names>
            <surname>Schluter</surname>
          </string-name>
          ,
          <string-name>
            <given-names>J. R.</given-names>
            <surname>Tetreault</surname>
          </string-name>
          (Eds.),
          <source>Proceedings of the 58th Annual Meeting of the Association for Computational Linguistics, ACL 2020, Online, July 5-10, 2020</source>
          , Association for Computational Linguistics,
          <year>2020</year>
          , pp.
          <fpage>4623</fpage>
          -
          <lpage>4637</lpage>
          .
        </mixed-citation>
      </ref>
      <ref id="ref14">
        <mixed-citation>
          [14]
          <string-name>
            <given-names>P.</given-names>
            <surname>Lewis</surname>
          </string-name>
          ,
          <string-name>
            <given-names>B.</given-names>
            <surname>Oguz</surname>
          </string-name>
          ,
          <string-name>
            <given-names>R.</given-names>
            <surname>Rinott</surname>
          </string-name>
          ,
          <string-name>
            <given-names>S.</given-names>
            <surname>Riedel</surname>
          </string-name>
          ,
          <string-name>
            <given-names>H.</given-names>
            <surname>Schwenk</surname>
          </string-name>
          ,
          <article-title>MLQA: evaluating cross-lingual extractive question answering</article-title>
          , in:
          <string-name>
            <given-names>D.</given-names>
            <surname>Jurafsky</surname>
          </string-name>
          ,
          <string-name>
            <given-names>J.</given-names>
            <surname>Chai</surname>
          </string-name>
          ,
          <string-name>
            <given-names>N.</given-names>
            <surname>Schluter</surname>
          </string-name>
          ,
          <string-name>
            <given-names>J. R.</given-names>
            <surname>Tetreault</surname>
          </string-name>
          (Eds.),
          <source>Proceedings of the</source>
        </mixed-citation>
      </ref>
    </ref-list>
  </back>
</article>