Development of an Intelligent Search Engine using GPT model for GrantsForScience platform

Oleksandr M. Khimich1, Serhii V. Yershov1, Elena A. Nikolaevskaya1 and Pavlo S. Yershov1

1 V.M Glushkov Institute of Cybernetics of NAS of Ukraine, Academician Glushkov Avenue, 40, Kyiv, 03187, Ukraine

Abstract
Grants serve as a primary source of funding for many scientific research projects. The idea of developing a GrantsForScience platform using advanced AI technologies is proposed to simplify grant search and increase its efficiency. The paper describes the concept, architecture, and implementation of a microservice designed for the intelligent search of scientific grants. It highlights the limitations of traditional manual grant search methods and explains the benefits of an automated approach. The technical facets of the implementation, particularly the use of GPT for analyzing scientific publications, are discussed in detail.

Keywords
Intelligent search, scientific grants, microservice, GPT, automation, parsing, API (Application Programming Interface), scientific publications

1. Introduction
In the modern world, research and innovation projects play a key role in developing new technologies, improving the quality of life and solving global problems. At the same time, researchers often face significant difficulties in finding funding and partners to implement their projects. On the other hand, companies and investors are looking for opportunities to collaborate with scientific institutions to develop innovations and implement new technologies.

Grants serve as a primary source of funding for many scientific research projects. They cover expenses for equipment, materials, researchers' salaries, and other costs associated with conducting research. Timely and efficient grant searching is critical for the successful execution of scientific projects. Securing grants ensures that researchers have the necessary resources to pursue innovative and impactful studies.
Manual grant search involves browsing numerous websites, databases, and other sources of information. This process is time-consuming and often ineffective, as researchers may miss important opportunities due to the sheer volume of information and limited time for processing it. Additionally, manual search methods lack the ability to comprehensively analyze and cross-reference data from multiple sources, leading to potential oversights. Therefore, the authors came up with the idea of developing the GrantsForScience platform, which uses advanced artificial intelligence technologies to simplify these processes and increase their efficiency.

Publications [1]-[5] are among the most influential in the field of artificial intelligence and GPT models. They played a key role in the development of natural language processing technologies and led to the creation of powerful language models. The development of large language models has revolutionized natural language processing [6], [7], [8]. These models have shown great potential in solving various natural language processing (NLP) tasks, from natural language understanding (NLU) to generation tasks, even paving the way for artificial general intelligence (AGI). Research and practical implementations of NLP technologies based on artificial intelligence, generative AI and the concept of complex networks, aimed at creating semantic networks, are presented in the monograph [9].

ProfIT AI 2024: 4th International Workshop of IT-professionals on Artificial Intelligence (ProfIT AI 2024), September 25–27, 2024, Cambridge, MA, USA
khimich505@gmail.com (O.M. Khimich); sershv@ukr.net (S.V. Yershov); elena_nea@ukr.net (E.A. Nikolaevskaya); yershov.pavel.wsk@gmail.com (P.S. Yershov)
0000-0002-8103-4223 (O.M. Khimich); 0000-0002-9895-777X (S.V. Yershov); 0000-0002-5145-0189 (E.A. Nikolaevskaya); 0000-0002-9072-7996 (P.S. Yershov)
© 2024 Copyright for this paper by its authors. Use permitted under Creative Commons License Attribution 4.0 International (CC BY 4.0).
CEUR Workshop Proceedings (CEUR-WS.org), ISSN 1613-0073

There are currently many developments in the field of artificial intelligence. The best known are the products of OpenAI [10], such as GPT [11]. GPT is a natural language processing technology based on a transformer architecture that learns from a large amount of text data and is capable of generating high-quality texts. These models are trained on huge data sets and learn to understand the syntax and semantics of the language, which, in particular, makes them powerful tools for building semantic networks and domain models.

The ChatGPT model [12] is built on top of the OpenAI GPT-3 [13], GPT-3.5 [14] and GPT-4 [15] family of large language models. The chatbot was fine-tuned using both supervised learning and reinforcement learning.

Other notable projects using GPT include, among others:
- GitHub Copilot [16] (using the OpenAI Codex model, a descendant of GPT-3, configured for code generation);
- Copy.ai and Jasper.ai [17] (content generation for marketing purposes);
- Algolia [18] (improving search engine capabilities), SearchGPT [10] (a prototype of new AI search features).

Meta [19] recently opened access to its new model Llama 3.1 405B [20], which according to many tests surpasses such giants as GPT-4, Claude 3.5 Sonnet [21] and Google Gemini Pro [22]. The authors plan to study this new model and compare it with the GPT model.

The concept, architecture, implementation, and technical facets (particularly the use of GPT for analyzing scientific publications) of a microservice designed for the intelligent search of scientific grants are now considered in more detail.

2. Intelligent search model
Intelligent search employs artificial intelligence (AI) and machine learning (ML) techniques to analyze large volumes of data and find relevant information. In the context of scientific grants, intelligent search can automatically process information about researchers, their publications, and relevant scientific fields to identify the most suitable grants. This approach not only saves time but also enhances the accuracy of the search results by considering a wide range of parameters and data sources.

Intelligent grant search offers several key advantages:
• Automation: Reduces the time and effort required for grant searching, allowing researchers to focus on their core activities.
• Intelligence: Utilizes AI to find the most relevant grants based on the analysis of scientific publications, thereby increasing the precision of search results.
• Result Quality: Provides accurate and relevant search results by considering the specific research areas and interests of the scientist.
• Scalability: The system can be expanded to search a large number of sources, accommodating the growing needs of the research community.
• Continuous Updates: Keeps researchers informed about new funding opportunities as they become available, ensuring that they do not miss out on potential grants.

2.1. Architecture of the Microservice for Intelligent Search
Microservice architecture involves developing individual services that perform specific tasks and can interact with each other via APIs. This modular approach allows for easy scaling of the system, adding new features, and maintaining high availability and fault tolerance. Microservices can be independently developed, deployed, and managed, which enhances the flexibility and resilience of the overall system.

Grant searching is performed based on input data such as ScopusID [23], ORCID [24], first name, and last name of the researcher.
These identifiers allow the system to gather comprehensive information about the researcher’s publications and scientific contributions, which are critical for matching the researcher with relevant grants.

The search results (output data) are provided in a JSON response with a parameterized list of found grants. Each grant entry includes the title, description, link, source name, and metadata. This structured format facilitates easy integration with other systems and applications that the researchers might be using.

The architecture of the microservice includes the following components (Figure 1):
• API Endpoints:
  a. GET /status: Health check of the service to ensure it is operational.
  b. POST /run_search: Initiates the grant search task based on the provided input data.
  c. GET /job_result/{job_uuid}: Retrieves the results of the grant search task using a unique job identifier.
• Database: Stores information about users, their requests, and search results, ensuring data persistence and reliability.
• Grant Search Mechanism: Comprises a parser, an analyzer, and a searcher, each responsible for specific tasks in the grant search process.

Figure 1: Architecture diagram of intelligent search microservice

The grant search mechanism consists of:
• Parser. The parser searches for titles and abstracts of the scientist's publications in various sources, such as Scopus. The result of the parser's work is a list of found publication titles. This step is crucial for gathering the data needed to analyze the researcher's areas of expertise and interests.
• Analyzer. The analyzer determines the parameters of the publications found in the previous step. It classifies the publications into three lists: research subjects, scientific fields, and research directions. This classification is essential for accurately matching the researcher with relevant grants.
• Searcher. The searcher conducts the grant search in open sources, such as the EU Funding & Tenders Portal [25].
It uses the parameters of the publications, determined by the analyzer, to find grants that align with the researcher’s work. This component ensures that the search results are highly relevant and tailored to the researcher’s needs.

2.2. Prototype of GPT model for GrantsForScience
The prototype of the intelligent search microservice for scientific grants consists of a limited set of key components that work together to facilitate the search and retrieval of relevant grant opportunities. Figure 2 shows the key elements of the intelligent search microservice that are implemented in the prototype (highlighted in green and yellow).

Figure 2: Architecture diagram of intelligent search microservice

It consists of:
• The 3 API endpoints described above.
• Parser using the Scopus Search API [26].
• Analyzer.
• Searcher using the EU Funding & Tenders Portal.

2.3. Usage of GPT model
The analyzer uses GPT-3.5-turbo to generate a JSON object with search parameters based on the titles of scientific articles. GPT-3.5-turbo was chosen for its balance of cost and capabilities. The model has been trained on a diverse range of internet text, which allows it to handle various tasks, including text summarization, translation, and content generation. Interaction with GPT occurs via an HTTP API, with a temperature setting of 0.5, determined experimentally.

The temperature parameter in GPT controls the randomness of the output. A lower temperature (close to 0) makes the model's output more deterministic and focused, while a higher temperature (closer to 1) allows for more randomness and creativity. For the grant search analyzer, a temperature of 0.5 was chosen to balance the creativity and diversity of the generated responses against their coherence and relevance to the input data.
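The effect of temperature can be illustrated with a small numerical sketch. It assumes the common formulation in which token scores (logits) are divided by the temperature before a softmax is applied; the paper does not describe how the OpenAI API applies the parameter internally, and the function name and the three example scores are hypothetical.

```python
import math


def softmax_with_temperature(logits, temperature):
    """Turn raw token scores into probabilities, scaled by a temperature."""
    if temperature <= 0:
        raise ValueError("temperature must be positive")
    scaled = [score / temperature for score in logits]
    peak = max(scaled)  # subtract the maximum for numerical stability
    exps = [math.exp(s - peak) for s in scaled]
    total = sum(exps)
    return [e / total for e in exps]


# Hypothetical scores for three candidate tokens (not taken from the paper).
logits = [2.0, 1.0, 0.5]

for t in (0.2, 0.5, 1.0):
    probs = softmax_with_temperature(logits, t)
    print(f"temperature={t}: {[round(p, 3) for p in probs]}")
```

With these scores, the probability assigned to the top-scoring token grows as the temperature falls, which corresponds to the more deterministic and focused behavior described above; a value of 0.5 sits between the two extremes.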
The analyzer uses the GPT API in assistant mode, sending the following message, which contains the list of article names:

    Fill in {"science_branches": [], "research_areas": [], "research_subjects": []} JSON based on a list of scientific articles names given: {list_of_names}

GPT returns the JSON filled with values (Figure 3). The part of the program code that analyzes article names using GPT is given below:

    ...
    list_of_names = ", ".join(names)
    # Doubled braces {{ }} produce literal JSON braces inside the f-string
    message = f'Fill in {{"science_branches": [], "research_areas": [], "research_subjects": []}} '\
              f'JSON based on a list of scientific articles names given: {list_of_names}'
    result = self._ask_gpt(message)
    ...

    def _ask_gpt(self, message, temperature=0.5):
        logger.info(f"Asking GPT-3.5:\n {message}")
        try:
            # JSON mode: the model is constrained to return a JSON object
            gpt_result = self.client.chat.completions.create(
                model="gpt-3.5-turbo-0125",
                temperature=temperature,
                response_format={"type": "json_object"},
                messages=[
                    {"role": "system", "content": "You are a helpful assistant designed to output JSON list of strings"},
                    {"role": "user", "content": message}
                ]
            )
        except Exception as e:
            # Keep the original exception as the cause for easier debugging
            raise GPTAPIError(f"GPT API error: \n{e}") from e
        ...

2.4. Software Implementation
The microservice is implemented as a Docker Compose application [27] with the following containers:
• web (Python 3.10, Flask [28], marshmallow, requests): API engine, service orchestration.
• worker (Celery [29], Redis): asynchronous execution of grant search tasks.
• redis (Redis [30]): database for storing intermediate results and task queues.
• dashboard (Celery, Flower): task monitoring and debugging.

The application is hosted at DigitalOcean [31] and is IP-restricted. The Postman [32] HTTP client is used to test the API endpoints.

2.5. Prototype evaluation
The prototype was tested using real researcher profile data. In the example provided in this article, the ORCID and ScopusID of an author were used (Figure 3).
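The job invocation can also be scripted rather than issued from Postman. The sketch below is not the authors' client. It assumes that POST /run_search returns the job_uuid, that the request body uses the same field names as the "person" object in the job result, and that an unfinished job reports a status other than "Completed successfully" (the "In progress" string is hypothetical). A hypothetical in-memory transport stands in for the HTTP layer so that the example runs without access to the IP-restricted service.

```python
import time
from typing import Callable, Optional

# A transport sends one request and returns the decoded JSON body.
# It is a hypothetical abstraction, not part of the described service.
Transport = Callable[[str, str, Optional[dict]], dict]

DONE_STATUS = "Completed successfully"  # status string from the job result


def run_grant_search(transport: Transport, person: dict,
                     poll_interval: float = 2.0, max_polls: int = 30) -> dict:
    """Start a grant search job and poll until it reports completion."""
    job = transport("POST", "/run_search", person)
    job_uuid = job["job_uuid"]  # assumed to be present in the POST response
    for _ in range(max_polls):
        result = transport("GET", f"/job_result/{job_uuid}", None)
        if result.get("status") == DONE_STATUS:
            return result
        time.sleep(poll_interval)
    raise TimeoutError(f"job {job_uuid} did not complete after {max_polls} polls")


class FakeService:
    """In-memory stand-in for the microservice; completes on the second poll."""

    def __init__(self) -> None:
        self.polls = 0

    def __call__(self, method: str, path: str, payload: Optional[dict]) -> dict:
        if method == "POST" and path == "/run_search":
            return {"async": True, "job_uuid": "demo-job"}
        if method == "GET" and path == "/job_result/demo-job":
            self.polls += 1
            if self.polls < 2:
                # Hypothetical status for a job that is still running
                return {"job_uuid": "demo-job", "status": "In progress"}
            return {"job_uuid": "demo-job", "status": DONE_STATUS,
                    "result": {"grants": {}}}
        raise ValueError(f"unexpected request: {method} {path}")


service = FakeService()
person = {"orcid": "0000-0002-5145-0189", "scopus_id": "6503942582"}
response = run_grant_search(service, person, poll_interval=0.0)
print(response["status"], "after", service.polls, "polls")
```

In a real client the transport would wrap an HTTP library call against the service's base URL. Keeping it injectable lets the polling logic be exercised offline, which suits a search task that runs asynchronously in a Celery worker.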
Figure 3: Invocation of grant search job execution for a researcher profile data using Postman

For the given researcher, the prototype returned 90 relevant grants found on the EU Funding & Tenders Portal: 10 for each of the 9 keywords that the Analyzer derived from the 12 articles found in Scopus.

The job result response contains the results of the Searcher, that is, the list of grants (“grants”) that match the keywords defined by the Analyzer for the articles found by the Parser. The output of the intermediate steps is included as well (“articles”, “articles_keywords”). The response JSON is given below; repeating parts are shortened by “…”.

    {
      "async": true,
      "job_uuid": "90d2bda8-fb2e-4787-a934-f55b44fcda82",
      "result": {
        "articles": {
          "ScopusParserV1": [
            {
              "authors": [ "Nikolaevskaya E.A." ],
              "date": "2009-11-01",
              "id": "2-s2.0-72449169485",
              "source": "ScopusParserV1",
              "summary": null,
              "title": "Program-algorithmic methods to improve the accuracy of computer solutions",
              "url": "https://api.elsevier.com/content/abstract/scopus_id/72449169485",
              "uuid": "33287c74-bb14-46a1-af98-7c96ceb3a19c",
              "year": "2009"
            },
            ……………………………………
          ]
        },
        "articles_keywords": {
          "research_areas": [ "Numerical Methods", "High Performance Computing", "Algorithms" ],
          "research_subjects": [ "Parallel Computing", "Numerical Linear Algebra", "Computational Mathematics" ],
          "science_branches": [ "Numerical Analysis", "Computer Science", "Mathematics" ]
        },
        "career": { ……………………………………},
        "errors": {},
        "grants": {
          "EUCommission": {
            "Algorithms": [
              {
                "amount": [ null ],
                "currency": null,
                "end_date": null,
                "identifier": "HOP_ON_PROJECT101080142",
                "match_by_key": "Algorithms",
                "meta": {
                  "apiVersion": "2.120",
                  "database": "SEDIA",
                  "language": "en",
                  "programmePeriod": null
                },
                "source": "EuropeanCommissionGrantSearcher",
                "start_date": "2022-11-01T01:00:00.000+0100",
                "summary": "Efficient QUantum ALgorithms for IndusTrY",
                "title": "Efficient QUantum ALgorithms for IndusTrY",
                "url": "https://ec.europa.eu/info/funding-tenders/opportunities/horizon/hop-on/101080142",
                "uuid": "ca8e2e5a-8120-4abc-b264-3e0a84d62897"
              },
              ……………………………………
            ]
          }
        },
        "input_keywords": [],
        "person": {
          "first_name": "Олена",
          "last_name": "Ніколаєвська",
          "middle_name": "Анатоліївна",
          "orcid": "0000-0002-5145-0189",
          "scopus_id": "6503942582"
        }
      },
      "status": "Completed successfully"
    }

2.6. Applied Usage
The microservice is intended to be used as part of the infrastructure for a system that searches for grants from numerous open sources on the Internet. The system will:
• Allow organizations and scientists to create accounts and fill out profiles.
• Enable on-demand searches for grants based on the profiles.
• Let companies (organizations) post tasks and search for performers among registered users (scientists and organizations).
• Send daily notifications about new grants in the sources, ensuring that researchers are always up to date with the latest opportunities.

The system will feature a user-friendly interface where researchers can input their profile information and receive personalized grant recommendations. It will also provide dashboards for monitoring search results and managing profiles. The microservice can be integrated with other research management systems, allowing seamless data exchange and enhancing the overall efficiency of research administration.

3. Conclusions
The paper proposes principles, an architecture and technologies for building microservices for intelligent search with artificial intelligence technologies such as GPT, which make it possible to create a scalable, reliable and efficient search system, GrantsForScience. Intelligent search of scientific grants can significantly improve the efficiency and accuracy of the search for research funding. This approach not only saves researchers time and effort, but also helps ensure that they do not miss valuable funding opportunities.
The next steps toward a comprehensive grant search system are:
• Expanding the functionality: covering more sources and providing more detailed search results.
• Scaling the search: adding more parsers and search engines to process more data and improve the accuracy of the search.
• Caching mechanisms: implementing caching to reduce the number of queries and speed up the search process.
• Unit testing and technical updates: continuously improving the system through thorough testing and regular updates.

References
[1] A. Vaswani, N. Shazeer, N. Parmar, J. Uszkoreit, L. Jones, A. N. Gomez, Ł. Kaiser, I. Polosukhin, Attention is All You Need, NeurIPS (2017). doi:10.48550/arXiv.1706.03762.
[2] J. Devlin, M. Chang, K. Lee, K. Toutanova, BERT: Pre-training of Deep Bidirectional Transformers for Language Understanding, NAACL-HLT (2019). doi:10.48550/arXiv.1810.04805.
[3] T. B. Brown, B. Mann, N. Ryder, M. Subbiah, J. D. Kaplan, P. Dhariwal, A. Neelakantan, P. Shyam, G. Sastry, A. Askell, S. Agarwal, A. Herbert-Voss, G. Krueger, T. Henighan, R. Child, A. Ramesh, D. M. Ziegler, J. Wu, C. Winter, C. Hesse, M. Chen, E. Sigler, M. Litwin, S. Gray, B. Chess, J. Clark, C. Berner, S. McCandlish, A. Radford, I. Sutskever, D. Amodei, Language Models are Few-Shot Learners, NeurIPS (2020). doi:10.48550/arXiv.2005.14165.
[4] A. Radford, J. Wu, R. Child, D. Luan, D. Amodei, I. Sutskever, Language Models are Unsupervised Multitask Learners, OpenAI Blog (2019). URL: https://cdn.openai.com/better-language-models/language_models_are_unsupervised_multitask_learners.pdf.
[5] A. Radford, K. Narasimhan, T. Salimans, I. Sutskever, Improving Language Understanding by Generative Pre-Training, OpenAI Blog (2018). URL: https://openai.com/research/language-understanding-generative-pre-training.
[6] Bernard J. Jansen, Soon-gyo Jung, Joni Salminen. Employing large language models in survey research. Natural Language Processing Journal. Volume 4, September (2023). doi:10.1016/j.nlp.2023.100020
[7] Wayne Xin Zhao, Kun Zhou, Junyi Li, Tianyi Tang, Xiaolei Wang, Yupeng Hou, Yingqian Min, Beichen Zhang, Junjie Zhang, Zican Dong, et al. A survey of large language models. arXiv preprint arXiv:2303.18223, 2023
[8] Ce Zhou, Qian Li, Chen Li, Jun Yu, Yixin Liu, Guangzhou Wang, Kai Zhang, Cheng Ji, Qiben Yan, Lifang He, et al. A comprehensive survey on pretrained foundation models: A history from bert to chatgpt. arXiv preprint arXiv:2302.09419, 2023
[9] Dmytro Lande, Leonard Strashnoy. GPT Semantic Networking: A Dream of the Semantic Web – The Time is Now. Kyiv: Engineering, 2023. 168 p. ISBN 978-966-2344-94-3
[10] OpenAI, URL: https://openai.com/
[11] Aymen El Amri. The art and science of developing intelligent apps with OpenAI GPT-3, DALL·E 2, CLIP, and Whisper - Suitable for learners of all levels. Kindle Edition, 2023. 378 p.
[12] ChatGPT, URL: https://openai.com/chatgpt/
[13] OpenAI GPT-3, URL: https://openai.com/index/gpt-3-apps/
[14] OpenAI GPT-3.5, URL: https://openai.com/index/gpt-3-5-turbo-fine-tuning-and-api-updates/
[15] OpenAI GPT-4, URL: https://openai.com/index/gpt-4/
[16] GitHub Copilot, URL: https://github.com/features/copilot
[17] Jasper.ai, URL: https://www.jasper.ai/comparison/jasper-vs-chatgpt
[18] Algolia, URL: https://www.algolia.com/doc/
[19] Meta, URL: https://about.meta.com/company-info/
[20] Llama 3.1 405B, URL: https://ai.meta.com/blog/meta-llama-3-1/
[21] Claude 3.5 Sonnet, URL: https://www.anthropic.com/news/claude-3-5-sonnet
[22] Google Gemini Pro, URL: https://deepmind.google/technologies/gemini/
[23] Scopus Database, URL: https://www.scopus.com/
[24] ORCID, URL: https://info.orcid.org/what-is-orcid/
[25] EU Funding & Tenders Portal, URL: https://ec.europa.eu/info/funding-tenders/opportunities/portal
[26] Scopus Search API, URL: https://dev.elsevier.com/documentation/ScopusSearchAPI.wadl
[27] Docker Compose Documentation, URL: https://docs.docker.com/compose/
[28] Flask - Python Web Framework, URL: https://flask.palletsprojects.com/en/2.0.x/
[29] Celery - Distributed Task Queue, URL: https://docs.celeryproject.org/en/stable/
[30] Redis Documentation, URL: https://redis.io/about/
[31] DigitalOcean Hosting, URL: https://www.digitalocean.com/
[32] Postman HTTP Client, URL: https://www.postman.com/