<?xml version="1.0" encoding="UTF-8"?>
<TEI xml:space="preserve" xmlns="http://www.tei-c.org/ns/1.0" 
xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance" 
xsi:schemaLocation="http://www.tei-c.org/ns/1.0 https://raw.githubusercontent.com/kermitt2/grobid/master/grobid-home/schemas/xsd/Grobid.xsd"
 xmlns:xlink="http://www.w3.org/1999/xlink">
	<teiHeader xml:lang="en">
		<fileDesc>
			<titleStmt>
				<title level="a" type="main">Database and Workflow Optimizations for Spatial-Geometric Queries in GeoMine</title>
			</titleStmt>
			<publicationStmt>
				<publisher/>
				<availability status="unknown"><licence/></availability>
			</publicationStmt>
			<sourceDesc>
				<biblStruct>
					<analytic>
						<author>
							<persName><forename type="first">Martin</forename><surname>Poppinga</surname></persName>
							<email>martin.poppinga@uni-hamburg.de</email>
							<affiliation key="aff0">
								<orgName type="department">Fachbereich Informatik</orgName>
								<orgName type="institution">Universität Hamburg</orgName>
								<address>
									<postCode>22527</postCode>
									<settlement>Hamburg</settlement>
									<country key="DE">Germany</country>
								</address>
							</affiliation>
							<affiliation key="aff1">
								<orgName type="department">ZBH -Center for Bioinformatics</orgName>
								<orgName type="institution">Universität Hamburg</orgName>
								<address>
									<postCode>20146</postCode>
									<settlement>Hamburg</settlement>
									<country key="DE">Germany</country>
								</address>
							</affiliation>
						</author>
						<author>
							<persName><forename type="first">Joel</forename><surname>Graef</surname></persName>
							<email>graef@zbh.uni-hamburg.de</email>
							<affiliation key="aff1">
								<orgName type="department">ZBH -Center for Bioinformatics</orgName>
								<orgName type="institution">Universität Hamburg</orgName>
								<address>
									<postCode>20146</postCode>
									<settlement>Hamburg</settlement>
									<country key="DE">Germany</country>
								</address>
							</affiliation>
						</author>
						<author>
							<persName><forename type="first">Konrad</forename><surname>Diedrich</surname></persName>
							<email>diedrich@zbh.uni-hamburg.de</email>
							<affiliation key="aff1">
								<orgName type="department">ZBH -Center for Bioinformatics</orgName>
								<orgName type="institution">Universität Hamburg</orgName>
								<address>
									<postCode>20146</postCode>
									<settlement>Hamburg</settlement>
									<country key="DE">Germany</country>
								</address>
							</affiliation>
						</author>
						<author>
							<persName><forename type="first">Matthias</forename><surname>Rarey</surname></persName>
							<email>rarey@zbh.uni-hamburg.de</email>
							<affiliation key="aff1">
								<orgName type="department">ZBH -Center for Bioinformatics</orgName>
								<orgName type="institution">Universität Hamburg</orgName>
								<address>
									<postCode>20146</postCode>
									<settlement>Hamburg</settlement>
									<country key="DE">Germany</country>
								</address>
							</affiliation>
						</author>
						<author>
							<persName><forename type="first">Norbert</forename><surname>Ritter</surname></persName>
							<email>norbert.ritter@uni-hamburg.de</email>
							<affiliation key="aff0">
								<orgName type="department">Fachbereich Informatik</orgName>
								<orgName type="institution">Universität Hamburg</orgName>
								<address>
									<postCode>22527</postCode>
									<settlement>Hamburg</settlement>
									<country key="DE">Germany</country>
								</address>
							</affiliation>
						</author>
						<title level="a" type="main">Database and Workflow Optimizations for Spatial-Geometric Queries in GeoMine</title>
					</analytic>
					<monogr>
						<idno type="ISSN">1613-0073</idno>
					</monogr>
					<idno type="MD5">9EE112CB474391DA49A7FFF186DFF6FA</idno>
				</biblStruct>
			</sourceDesc>
		</fileDesc>
		<encodingDesc>
			<appInfo>
				<application version="0.7.2" ident="GROBID" when="2025-04-23T16:20+0000">
					<desc>GROBID - A machine learning software for extracting information from scholarly documents</desc>
					<ref target="https://github.com/kermitt2/grobid"/>
				</application>
			</appInfo>
		</encodingDesc>
		<profileDesc>
			<textClass>
				<keywords>
					<term>Database optimization, query optimization, data management, databases for bioinformatics</term>
					<term>0000-0001-8529-8376 (M. Poppinga)</term>
					<term>0000-0001-8327-4936 (J. Graef)</term>
					<term>0000-0001-8171-0888 (K. Diedrich)</term>
					<term>0000-0002-9553-6531 (M. Rarey)</term>
					<term>0000-0002-1502-1395 (N. Ritter)</term>
				</keywords>
			</textClass>
			<abstract>
<div xmlns="http://www.tei-c.org/ns/1.0"><p>Addressing computational problems in science often involves customized algorithmic approaches, which can lead to overlooking well-established solutions in data management and storage. When scientific datasets grow, these customized approaches may struggle to query data efficiently. Effective data management is essential for ensuring accurate and fast analysis of scientific data. Describing changes in the GeoMine software, this paper highlights the potential for improvements in data-driven science.</p><p>GeoMine enables spatial-geometric searches in three-dimensional molecular space, facilitating tasks such as pharmaceutical drug discovery by finding similar geometric patterns in protein-ligand complexes. The original GeoMine application utilized a relational database solely for fundamental data storage and combined it with a tailored algorithmic pattern-matching strategy, leaving room for improvements. This work presents a technical overview of database and workflow optimizations in GeoMine to handle the increasing data size. Our improvements focus on moving the main computational tasks from the application level to the database system and optimizing the database utilization. A new query design, better utilization of indexes, and optimizations in textual queries led to a 15x speedup in our experiments, reducing the mean runtime of queries to under 8 seconds.</p><p>The presented improvements are essential for GeoMine to be offered as a service-oriented web application. The success of these improvements highlights the significance of database optimization in science, demonstrating the potential and necessity of proper data management.</p></div>
			</abstract>
		</profileDesc>
	</teiHeader>
	<text xml:lang="en">
		<body>
<div xmlns="http://www.tei-c.org/ns/1.0"><head n="1.">Introduction</head><p>Mining huge datasets is a central task in research. Analyzing molecular interactions between proteins and small organic molecules is essential for understanding disease treatments and advancing medical research. This includes searching for spatial similarities and geometric arrangements, which can provide vital insights into the functional aspects of proteins. Results can be used for further research, for example, in pharmaceutical drug discovery or biotechnology <ref type="bibr" target="#b0">[1]</ref>. With the growth of accessible datasets, searching for patterns in this data becomes increasingly challenging <ref type="bibr" target="#b1">[2,</ref><ref type="bibr" target="#b2">3]</ref>. Besides the continuous growth of available experimental data, machine-learning-based structure predictions add millions of new structural models <ref type="bibr" target="#b3">[4]</ref>.</p><p>GeoMine <ref type="bibr" target="#b2">[3]</ref> is an application enabling a visually guided geometric pattern search of molecular data in three-dimensional space. It is embedded in the proteins.plus<ref type="foot" target="#foot_0">1</ref> server <ref type="bibr" target="#b4">[5]</ref>, a collection of different web-based tools for various tasks in protein-based research. The server is a free service based on publicly available datasets handling over half a million page requests per year. The back end of GeoMine was derived in prior work from the PELIKAN application developed in the same group <ref type="bibr" target="#b5">[6]</ref>, which utilized a custom algorithmic approach for query processing. 
With the Protein Data Bank (PDB) <ref type="bibr" target="#b6">[7]</ref> as a fast-growing dataset underlying GeoMine and the shift from a desktop application to a server-based approach, GeoMine required an overhaul of the original query workflow to maintain the ability to provide results quickly.</p><p>With this work, we investigate the potential of adopting a database-driven architecture, focusing on the database as the main part of query execution and reducing application-side processing. We were able to reduce the mean runtime in our experiments from about 2 minutes per query to less than 8 seconds, utilizing changes in the workflow and database optimizations. As we present in this work, a substantial performance enhancement has been achieved by shifting to a more database-centric method.</p><p>The paper is organized as follows: Section 2 provides an overview of the field of work, the data structure, and the query design; Section 3 details the improvements made to the query workflow and database optimizations; Section 4 presents and discusses the experimental results; Section 5 concludes the paper and outlines future work.</p></div>
<div xmlns="http://www.tei-c.org/ns/1.0"><head n="2.">Background and Related Work</head></div>
<div xmlns="http://www.tei-c.org/ns/1.0"><head n="2.1.">Data Management and Storage</head><p>Data management in scientific research involves the systematic collection, organization, storage, and sharing of data to facilitate its reusability and ensure the reproducibility of research findings. In the context of our work, which focuses on querying structured datasets, the storage aspect is particularly important. In the scientific domain, many existing applications are designed for single-user usage, often locally storing data in various formats or utilizing object stores with limited retrieval possibilities <ref type="bibr" target="#b7">[8,</ref><ref type="bibr" target="#b8">9]</ref>. For structured data, Relational Database Management Systems (RDBMS) are the most commonly used systems, providing robust and efficient solutions. Commonly, embedded systems are used, such as SQLite <ref type="bibr" target="#b9">[10]</ref> for applications with small or medium-sized datasets or DuckDB <ref type="bibr" target="#b8">[9]</ref> for analytical workloads. For Online Transaction Processing (OLTP) workloads, which require fast query performance and regular updates, server-based RDBMS are a popular choice. Large analytical queries are often served by designated Online Analytical Processing (OLAP) systems such as data warehouses, which are often proprietary solutions. For handling large-scale semi-structured datasets, NoSQL systems are frequently used, with columnar and graph databases being popular for analytical queries. The choice of data management and storage solutions is crucial to ensure efficient processing, reduced resource consumption, and accurate and fast analysis of scientific data.</p></div>
<div xmlns="http://www.tei-c.org/ns/1.0"><head>PostgreSQL</head><p>GeoMine utilizes PostgreSQL <ref type="bibr" target="#b10">[11]</ref>, a robust and widely accessible open-source database management system. As multiple users can access a web-based application such as GeoMine at the same time, the ability of a client-server-based database system to handle multiple queries efficiently in parallel is required. PostgreSQL's widespread adoption <ref type="bibr" target="#b11">[12]</ref> enables cloud-agnostic hosting on every major platform since most cloud platforms offer PostgreSQL solutions or other PostgreSQL-compatible scalable databases. Additionally, setting up on-premise or local instances is straightforward. PostgreSQL is suited for OLTP and also OLAP workloads <ref type="bibr" target="#b12">[13]</ref>. Given the potential complexity of the designed queries, the workloads considered here fall into the area of OLAP. However, given the use case of an interactive search mask for a web service, fast responses are a requirement. PostgreSQL's efficient query planning and extensibility for additional approaches (e.g., PostGIS <ref type="bibr" target="#b13">[14]</ref> for spatial data or Citus <ref type="bibr" target="#b14">[15]</ref> for distributed and columnar storage) make it a suitable foundation for GeoMine's use case.</p></div>
<div xmlns="http://www.tei-c.org/ns/1.0"><head n="2.2.">Protein-Ligand Interactions and Binding Pockets</head><p>Protein-ligand interactions are of particular interest in biomolecular and pharmaceutical research. Ligands are small molecules that can interact and bind to the generally much larger proteins. Protein complexes can contain multiple pockets of varying sizes, partly containing ligands. Drug molecules used as pharmaceuticals are generally designed to target specific proteins. Researchers can gain valuable insights by investigating specific three-dimensional structures and searching for potential candidates to bind with these proteins.</p></div>
<div xmlns="http://www.tei-c.org/ns/1.0"><head>Protein Data Bank</head><p>The PDB <ref type="bibr" target="#b6">[7,</ref><ref type="bibr" target="#b1">2]</ref>, established in 1971, is a comprehensive repository of 3D structural data of proteins and nucleic acids. The structural information is primarily obtained through experimental methods, predominantly X-ray crystallography, from research facilities worldwide <ref type="bibr" target="#b1">[2]</ref>. As a freely available resource, the PDB has become vital for research in various fields by providing atomic-scale structural insights for drug design and understanding biological processes, containing more than 200,000 structures as of April 2023. Further, with the advent of Computed Structure Models, which are protein structure predictions, for example, by AlphaFold2 <ref type="bibr" target="#b3">[4]</ref>, additional datasets with about 1,000,000 structures are available now <ref type="bibr" target="#b1">[2]</ref>.</p></div>
<div xmlns="http://www.tei-c.org/ns/1.0"><head n="2.3.">GeoMine</head><p>Discovering similar structures across distinct complexes or finding molecules that bind to a specific pocket of interest is a major task in medical research. GeoMine is able to construct comprehensive databases derived from the PDB and supports exploring these databases with a web-based search interface <ref type="bibr" target="#b2">[3]</ref>. The preprocessing and database creation procedures employ components of the NAOMI library <ref type="bibr" target="#b15">[16]</ref>. For example, pockets are classified in a complex preprocessing pipeline when constructing the database <ref type="bibr" target="#b2">[3]</ref>. Central components are the DoGSite algorithm <ref type="bibr" target="#b16">[17]</ref>, which identifies empty binding pockets within protein structures, and the calculation of interactions <ref type="bibr" target="#b17">[18]</ref>.</p><p>The central part of the search and its key feature is the ability to specify geometric properties, for instance, distances and angles between any points, such as atoms. Further, point properties can be specified, such as an atom's chemical element and interactions between points. This way, precise structural motifs (structural patterns) in protein-ligand complexes can be searched. While GeoMine's predecessor PELIKAN was a single-user application based on an integrated SQLite <ref type="bibr" target="#b9">[10]</ref> database, the GeoMine back end is aimed at a server-focused architecture. In the initial development of GeoMine <ref type="bibr" target="#b2">[3]</ref>, the query execution capabilities of PELIKAN were extended for new functionality but were not changed in structure to adapt to the new architecture.</p><p>Database Design. For our experiments in Section 4, we used a PostgreSQL 15 database created with the PDB dataset from October 2022. For querying the dataset, the database can be considered read-only. 
The database requires approximately 165 GB of disk space.</p><p>For the geometric search, we focus on two tables. The first table, the point table, comprises all atoms and other definable points, such as the center of aromatic rings. It contains 340,716,693 searchable entries. These points are distributed across 1,382,853 distinct pockets, which serve as containers for groups of points. The largest pocket identified in our dataset contains 20,306 points, while the smallest pocket only holds 9 points. Each entry in the point table has a unique identifier, references the containing pocket, and contains various other fields with properties per point. Some properties, such as the accessible surface area of an atom, are floating point numbers. Other attributes, such as the chemical element, contain only a few distinct values, represented as integers or short strings.</p><p>The second table, the interaction table, stores pre-calculated interactions <ref type="bibr" target="#b17">[18]</ref>. These interactions represent noteworthy connections between two points, for example, hydrogen bonds. 13,018,225 point pairs are stored here.</p><p>Query Creation. When creating a query, users can specify multiple constraints. The most fundamental categories encompass Textual and Numerical Searches, wherein metadata filters at the protein structure or pocket level can be defined. Users can directly pre-select several structures or create various filters, such as the minimum number of particular chemical elements or a certain molecular weight range for the ligand. GeoMine also enables filtering using patterns that describe a local environment, written as strings in the chemical substructure language SMARTS <ref type="bibr" target="#b18">[19]</ref>.</p><p>The central search element and origin of GeoMine's name are geometry-based searches. 
To build the query, users may interactively select points in the web front end <ref type="bibr" target="#b20">[21]</ref> (see Figure <ref type="figure" target="#fig_0">1</ref>), utilizing an arbitrary PDB file as a template structure, or define them without a template.</p><p>Users may select an arbitrary number of points, which can be filtered based on different properties. Moreover, the specification of distance ranges between two points and angles between specified distances is possible. Further, interactions between points, as stored in the interaction table, can be added to the query. Together, they form an atomic substructure that is searched for. Each pocket can be examined individually as the interactions between one ligand and an individual pocket in a protein are of interest.</p></div>
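<div xmlns="http://www.tei-c.org/ns/1.0"><p>The geometric constraints described above (distance ranges between two points and angles between two distances that share a point) can be illustrated with a minimal sketch. The function names and the representation of points as (x, y, z) tuples are assumptions made for illustration only and do not reflect the GeoMine code base.</p>
```python
import math


def distance(a, b):
    """Euclidean distance between two 3D points given as (x, y, z) tuples."""
    return math.dist(a, b)


def angle_deg(a, vertex, b):
    """Angle in degrees at `vertex` between the segments vertex-a and vertex-b."""
    v1 = [p - q for p, q in zip(a, vertex)]
    v2 = [p - q for p, q in zip(b, vertex)]
    dot = sum(p * q for p, q in zip(v1, v2))
    cos_theta = dot / (math.hypot(*v1) * math.hypot(*v2))
    # Clamp against floating-point drift before calling acos.
    return math.degrees(math.acos(max(-1.0, min(1.0, cos_theta))))


def in_range(value, low, high):
    """True if value lies within the closed interval [low, high]."""
    return high >= value >= low


# Hypothetical three-point pattern: two distance ranges and one angle range.
p0, p1, p2 = (0.0, 0.0, 0.0), (3.0, 0.0, 0.0), (0.0, 4.0, 0.0)
matches = (
    in_range(distance(p0, p1), 2.5, 3.5)
    and in_range(distance(p0, p2), 3.5, 4.5)
    and in_range(angle_deg(p1, p0, p2), 80.0, 100.0)
)
```
<p>In this sketch, a candidate triple of points matches only if every distance and angle falls into its user-defined range, which mirrors how several individually unspecific constraints combine into a specific pattern.</p></div>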
<div xmlns="http://www.tei-c.org/ns/1.0"><head>Query Execution</head><p>The initial approach for query execution was first described for the predecessor tool PELIKAN by Inhester et al. <ref type="bibr" target="#b5">[6]</ref>. The most significant runtime enhancement in developing the original GeoMine approach (utilizing a PostgreSQL database instead of SQLite) did not change the workflow of the searching process. The approach remained mostly algorithm-focused, with all major computational steps performed within the application (see Figure <ref type="figure" target="#fig_2">2a</ref>), as the original PELIKAN software was designed to be a standalone desktop application. In the original approach of GeoMine <ref type="bibr" target="#b2">[3]</ref>, four major steps were performed strictly sequentially for each query to filter the potential results:</p><p>1. Textual and Numerical Constraints - A filter eliminates all proteins and pockets that do not meet specified properties or do not correspond to a given restrictive SMARTS filter. This step yields a list of all matching proteins and their pockets. 2. Obtaining all point pairs - For each point pair in the query, all possible results are returned, and distances, as well as interaction constraints, are checked. 3. Clique detection - An algorithm reconstructs the coherent component graph for all obtained point pairs and checks all defined angle constraints. 4. SMARTS filtering - Less restrictive SMARTS filters for points are applied to the now-generated results.</p><p>Steps two and three of the query processing presented particular challenges. All point and point pair constraints were queried individually in the database. Since a single constraint for a point pair is often not very specific, it leads to large intermediate results. Only by chaining several constraints is the number of points sufficiently reduced. 
The need to cross-verify each point with all matching points in its pocket demanded significant computational resources, especially if the filters for the points were unspecific. The list of potential pockets needed to be recreated for each pair, as only pockets that contained results in prior pair subqueries remained in the search space. This caused the search to be strictly sequential and required the serialization and deserialization of long pocket-ID lists for the SQL WHERE clauses. As the application and database system are separate processes or running on separate servers, the required repeated transfer of these lists also affected the performance. Because some point-to-point constraints were specific (less frequent in the dataset) and others were unspecific (frequent in the dataset), a hand-crafted scoring function was utilized to estimate the best ordering of queries, starting with the most specific queries to reduce the search space early <ref type="bibr" target="#b5">[6]</ref>. Although this improved the join order in many cases, it had the disadvantage of preventing the database system from executing classical optimizations, such as parallelism and join order optimization.</p><p>Further, an additional algorithm was required since the results from the preceding steps consisted only of point pairs. The Bron-Kerbosch algorithm <ref type="bibr" target="#b21">[22]</ref>, a graph-based backtracking algorithm for clique detection, was used. This algorithm recursively verified whether all discovered point pairs constituted a complete graph and checked for angle constraints. This demanded substantial computational effort, taking several hours on large potential result sets.</p></div>
<div xmlns="http://www.tei-c.org/ns/1.0"><head n="3.">Optimizations</head><p>This research aims to achieve optimal performance and ease of setup across various environments. Alongside the contributions of this work, the application has transitioned to a containerized setup for cloud environments. The optimizations presented in this work are essential for facilitating the deployment of a scalable application. In this section, we distinguish between the original approach in GeoMine <ref type="bibr" target="#b2">[3]</ref> and the improved approach we present in this work. The results returned for each query remained identical.</p></div>
<div xmlns="http://www.tei-c.org/ns/1.0"><head n="3.1.">Optimizing SQL Queries</head><p>The most significant change from the original approach was the redesign of the SQL query generation. Sequential processing of each constraint within a query led to severely limited query-level parallelism and long processing times as described in Section 2.3. Therefore, all SQL queries are now designed to make use of PostgreSQL's internal planning and optimization. In contrast to the original approach, where each point-to-point constraint was queried separately, a single comprehensive query containing all attributes and constraints for geometrical patterns is now constructed, see Figure <ref type="figure" target="#fig_2">2b</ref>. This reduces overhead by eliminating the need to repeatedly serialize extensive lists of pocket IDs or create temporary tables. To achieve this, the point table joins itself as often as points were specified in the query, usually 5-15 times. As a match occurs inside a single pocket, we only need to join points within the same pocket. With information about the distribution of properties like the chemical element, the RDBMS can estimate which part of the query restricts the search space the most and improve the join order. The original approach required running the checks on all points within all remaining pockets, not being able to skip points that were not matched in earlier subqueries. Intermediate results now remain within the database system and do not require serialization for application transfer. Additionally, merging all constraints (points, distances, and interactions) into one query eliminates the need for clique detection, as the output of the RDBMS is a connected and valid result.</p><p>Among all the geometric properties, only the angle checking between point pairs remains a separate step in the application, as this increases the complexity of the query without showing the benefits of an early reduced search space in our tests. 
Textual and numerical filters remain in a separate query to allow prior filtering, as SMARTS patterns require in-application processing. Allowing the RDBMS to determine the join order and the parallel execution resulted in a significant speedup of benchmark queries. The results are detailed in Section 4.</p></div>
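<div xmlns="http://www.tei-c.org/ns/1.0"><p>The idea of one comprehensive query, in which the point table joins itself once per query point and all joins are restricted to the same pocket, can be sketched as follows. This is a minimal illustration under stated assumptions: the table and column names (point, pocket_id, element, x, y, z) are hypothetical, only element filters and distance ranges are covered, and SQLite from the Python standard library stands in for PostgreSQL so that the sketch is runnable without a server. It is not the query generator used in GeoMine.</p>
```python
import sqlite3


def build_pattern_query(point_filters, distance_constraints):
    """Build one self-join query with one alias of the point table per query point.

    point_filters: list of element symbols, one per query point (at least one).
    distance_constraints: list of (i, j, min_dist, max_dist) tuples.
    Returns (sql, params).
    """
    n = len(point_filters)
    select = ", ".join(f"p{i}.id" for i in range(n))
    joins = ["point AS p0"]
    for i in range(1, n):
        # A match lies inside a single pocket, so only same-pocket points are joined.
        joins.append(f"JOIN point AS p{i} ON p{i}.pocket_id = p0.pocket_id")
    where, params = [], []
    for i, element in enumerate(point_filters):
        where.append(f"p{i}.element = ?")
        params.append(element)
    for i in range(n):
        for j in range(i + 1, n):
            # Two query points must never map to the same stored point.
            where.append(f"p{i}.id != p{j}.id")
    for i, j, low, high in distance_constraints:
        # Squared distances are compared so that no square root is needed in SQL.
        where.append(
            f"((p{i}.x - p{j}.x) * (p{i}.x - p{j}.x)"
            f" + (p{i}.y - p{j}.y) * (p{i}.y - p{j}.y)"
            f" + (p{i}.z - p{j}.z) * (p{i}.z - p{j}.z)) BETWEEN ? AND ?"
        )
        params += [low * low, high * high]
    sql = f"SELECT {select} FROM " + " ".join(joins) + " WHERE " + " AND ".join(where)
    return sql, params


def demo():
    """Run the generated query on a tiny in-memory table with two pockets."""
    con = sqlite3.connect(":memory:")
    con.execute(
        "CREATE TABLE point (id INTEGER PRIMARY KEY, pocket_id INTEGER,"
        " element TEXT, x REAL, y REAL, z REAL)"
    )
    con.executemany(
        "INSERT INTO point VALUES (?, ?, ?, ?, ?, ?)",
        [
            (1, 1, "N", 0.0, 0.0, 0.0),
            (2, 1, "O", 3.0, 0.0, 0.0),  # same pocket, distance 3.0: matches
            (3, 1, "O", 9.0, 0.0, 0.0),  # same pocket, but too far away
            (4, 2, "O", 3.0, 0.0, 0.0),  # right distance, but another pocket
        ],
    )
    sql, params = build_pattern_query(["N", "O"], [(0, 1, 2.5, 3.5)])
    return con.execute(sql, params).fetchall()
```
<p>Because all constraints are part of one statement, the database system is free to choose the join order and no pocket-ID lists have to be transferred between application and database. In the toy data above, only the pair of points 1 and 2 satisfies both the same-pocket join and the distance range.</p></div>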
<div xmlns="http://www.tei-c.org/ns/1.0"><head n="3.2.">Enhanced Utilization of PostgreSQL Indexes</head><p>In the original approach, a single extensive index structure was created, covering 15 out of 17 table columns. Although PostgreSQL allows for the construction of multi-column indexes with a large number of attributes, these structures are only effective in certain situations due to their size and depending on the used attributes. However, using multiple single-column indexes and allowing PostgreSQL to combine them as recommended in the documentation <ref type="bibr" target="#b22">[23]</ref> did not achieve the desired performance improvement.</p><p>Only the combination of several attributes could substantially reduce the number of yielded points. The best-found solution for our workload was a balanced compromise between index size and utilization, including only the most frequently used columns in a multi-column index. We identified two separate cases for index usage. Firstly, the earliest scheduled subquery focused solely on the attributes, disregarding their pocket, in cases without textual and numerical filters. Secondly, an index for subsequent subqueries was needed to filter for pocket IDs required for the join. In almost all instances, the optimizer chose to filter for the pocket ID in the second subquery. In some instances, a parallel index scan was performed. Filtering by the pocket ID reduced the search space best in these cases since the most restrictive subquery had already been executed as the first scheduled subquery. Therefore, we introduced a second index with the pocket identifier positioned first in the index. For both structures, we utilized PostgreSQL's default B-Tree index as other index structures did not seem beneficial in our tests. As pockets usually contain only a few hundred points, spatial indexes, like R-trees provided by PostGIS <ref type="bibr" target="#b13">[14]</ref>, did not provide the desired benefits. 
Filtering points and calculating all distances performed better in our tests than spatial operations due to the overhead of utilizing a spatial column. Index creation only needed a few minutes, but additional indexes for specific queries would no longer fit into the filesystem read cache and would therefore reduce performance.</p></div>
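<div xmlns="http://www.tei-c.org/ns/1.0"><p>The two index cases described above, one index led by frequently filtered attributes and a second one with the pocket identifier positioned first, can be sketched as two multi-column index definitions. The column selection (element, point_type) and all names are assumptions for illustration; the paper does not list the indexed columns. The statements use plain CREATE INDEX syntax, and SQLite from the Python standard library is used here only to check that they are accepted.</p>
```python
import sqlite3

# Case 1: first scheduled subquery, filtering on attributes regardless of pocket.
ATTRIBUTE_INDEX = (
    "CREATE INDEX point_attr_idx ON point (element, point_type, pocket_id)"
)
# Case 2: subsequent subqueries, restricted to the pockets found so far.
POCKET_INDEX = (
    "CREATE INDEX point_pocket_idx ON point (pocket_id, element, point_type)"
)


def create_indexes(con):
    """Create a hypothetical point table with both indexes; return index names."""
    con.execute(
        "CREATE TABLE point (id INTEGER PRIMARY KEY, pocket_id INTEGER,"
        " element TEXT, point_type TEXT, x REAL, y REAL, z REAL)"
    )
    con.execute(ATTRIBUTE_INDEX)
    con.execute(POCKET_INDEX)
    rows = con.execute(
        "SELECT name FROM sqlite_master WHERE type = 'index' ORDER BY name"
    ).fetchall()
    return [name for (name,) in rows]


index_names = create_indexes(sqlite3.connect(":memory:"))
```
<p>The column order matters for B-tree indexes: an index serves a filter best when the filtered columns form its leading part, which is why the pocket identifier leads the second index.</p></div>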
<div xmlns="http://www.tei-c.org/ns/1.0"><head n="3.3.">Improving Text Search</head><p>The initial step of the workflow involves filtering structures based on textual and numerical attributes. These filters target various properties, the most important being the PDB identifiers used to select a pre-defined or user-defined subset of protein structures. A short alphanumeric code identifies each structure.</p><p>Previously, an SQL ILIKE (case-insensitive match) statement with a wildcard match at the beginning and end of the string was executed to check for the desired properties. For the PDB codes, we made two changes. First, we discard the wildcards in the query unless explicitly desired, which enables the utilization of a search index. Second, as the codes are not case-sensitive, we replace ILIKE with LIKE, which performs a case-sensitive match and results in a substantial speedup, as demonstrated in Section 4.</p></div>
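<div xmlns="http://www.tei-c.org/ns/1.0"><p>The difference between the previous and the improved PDB code filter can be sketched by comparing the generated predicates. The column name pdb_code, the function names, the lower-case normalization of stored codes, and the %s placeholder style are assumptions made for illustration; the sketch only builds SQL fragments and parameters and does not contact a database.</p>
```python
def original_predicate(code):
    """Previous form: case-insensitive match with wildcards on both sides."""
    return "pdb_code ILIKE %s", f"%{code}%"


def pdb_code_predicate(code, wildcard=False):
    """Improved form: case-sensitive LIKE, wildcards only on explicit request.

    Assumption: codes are stored in lower case, so the input is normalized
    once in the application instead of on every compared row.
    """
    normalized = code.strip().lower()
    if wildcard:
        return "pdb_code LIKE %s", f"%{normalized}%"
    # Without wildcards, the pattern is a plain exact match.
    return "pdb_code LIKE %s", normalized


old_sql, old_param = original_predicate("1ABC")
new_sql, new_param = pdb_code_predicate(" 1ABC ")
```
<p>For the hypothetical input 1ABC, the previous form yields the pattern %1ABC% with ILIKE, whereas the improved form yields the exact pattern 1abc with LIKE.</p></div>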
<div xmlns="http://www.tei-c.org/ns/1.0"><head n="4.">Evaluation and Discussion</head></div>
<div xmlns="http://www.tei-c.org/ns/1.0"><head n="4.1.">Methods</head><p>To evaluate the impact of each modification suggested for GeoMine, several experiments were derived from the original GeoMine approach ex01 (see Table <ref type="table" target="#tab_0">1</ref>). Experiments ex02 to ex05 each contain only one of the improvements, ex06 contains all improvements, while experiments ex07 to ex10 contain all except one. This way, we show which change impacts the performance most, as different improvements benefit from each other.</p><p>For evaluating the performance across different workloads, we used a set of nine queries already used in previous work <ref type="bibr" target="#b2">[3]</ref>, designed to highlight available features, show examples for common applications, and estimate the runtime of different patterns common in practical GeoMine applications. They returned between 2 and 7117 results.</p><p>We used a PostgreSQL 15 database system. All data was stored on an SSD. Unless otherwise specified, a dedicated server with 400 GiB RAM and 80 cores was used (PostgreSQL with 128 GB shared buffers and 16 parallel workers). Podman <ref type="bibr" target="#b23">[24]</ref> was used to deploy the system. Each experiment was repeated five times. The GeoMine application was executed on the same node as the PostgreSQL database. We configured PostgreSQL to utilize less memory than available, as GeoMine required a high amount of working memory for some workloads. Additionally, we conducted tests on commodity systems by employing two setups (small/medium) using virtual servers. Both setups stored data on SSDs and were equipped with 12 cores and 24 GB RAM (small) and 18 cores and 48 GB RAM (medium), respectively.</p></div>
<div xmlns="http://www.tei-c.org/ns/1.0"><head n="4.2.">Results</head><p>Figure <ref type="figure" target="#fig_3">3</ref> shows the mean runtime of the nine test queries for each experiment as depicted in Table <ref type="table" target="#tab_0">1</ref>. Each change led to better performance, with the highest performance gain occurring when all changes were applied together. The required time for performing all nine queries decreased from 1033 s with the original approach (ex01) to 68 s with all improvements (ex06).</p><formula xml:id="formula_0">[Plot residue of Figure 3. Axis title: Sum of Mean Runtime per Query in Seconds; x-axis: Experiment (ex01 to ex10); legend: Query]</formula><p>The new query design (ex04) had the most substantial impact on performance, particularly visible in the long-running queries. Also, the transition from the ILIKE to the LIKE statement notably reduced runtime. The performance gain is most noticeable on the medium-running queries containing a long list of PDB IDs for a preselection. Experiments ex02 and ex03 (the new index and no wildcards in the PDB ID selection) showed only a small improvement. However, experiment ex09, which contains all changes except the wildcard improvement, shows that it has an impact on the overall runtime, presumably benefiting from the switch to the LIKE statement. The changes in index structures showed less impact than expected, demonstrating that PostgreSQL can handle indexes with an inflated number of columns. However, the performance was drastically worse if no index was used or index structures did not combine multiple attributes. For instance, combining one index per attribute led to an increase of the sum of the mean runtimes from 68 s (ex06) to 134 s.</p><p>Unspecific Queries. Some of the used queries include a protein filter to reduce the number of searched pockets. 
When these filters were removed and the whole dataset was searched, the original approach exceeded its set limits (more than 100 GB RAM or 1 h of runtime) on some of these queries and on other queries with less restrictive geometric filters. With the improved approach, some queries with extensive intermediate results could be computed for the first time, often within minutes.</p><p>Alternative Setups. As large database instances are not always accessible, for example due to cost constraints in cloud environments, we also ran our experiments on two smaller virtual servers. As shown in Figure <ref type="figure" target="#fig_5">4a</ref>, the performance gains were also visible on these smaller server instances. These tests were performed on shared hardware, so they show only a general trend rather than precise comparative data. However, they demonstrate the feasibility of processing on shared virtual servers. Additionally, we observed a substantial speedup when moving from PostgreSQL 10 to PostgreSQL 15, as displayed in Figure <ref type="figure" target="#fig_5">4b</ref>. Combined with our improvements, we achieved a speedup factor of 32.</p></div>
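<div xmlns="http://www.tei-c.org/ns/1.0"><p>The aggregate used in the results (the sum over all queries of the mean runtime across repetitions) and the resulting speedup can be computed as in the following minimal sketch. The helper names are hypothetical, and the per-query timings in the usage example are made-up placeholders rather than measured values; only the totals of 1033 sec and 68 sec are taken from the text.</p>
<p><code lang="python"><![CDATA[
```python
from statistics import mean


def sum_of_mean_runtimes(runs_per_query):
    """Sum, over all queries, the mean runtime of that query's repetitions."""
    return sum(mean(runs) for runs in runs_per_query.values())


def speedup(baseline_seconds, improved_seconds):
    """Ratio of the baseline runtime to the improved runtime."""
    return baseline_seconds / improved_seconds


# Placeholder timings in seconds for two hypothetical queries, five runs each.
example_runs = {
    "query_a": [2.0, 2.0, 2.0, 2.0, 2.0],
    "query_b": [10.0, 12.0, 11.0, 9.0, 8.0],
}
print(sum_of_mean_runtimes(example_runs))  # 2.0 + 10.0 = 12.0

# Totals reported in the text: 1033 sec for ex01 and 68 sec for ex06.
print(round(speedup(1033, 68), 1))  # 15.2
```
]]></code></p>
<p>With the reported totals, the ratio is roughly 15.2 for the database and query changes alone, before the additional gain from the newer PostgreSQL version.</p></div>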
<div xmlns="http://www.tei-c.org/ns/1.0"><head n="5.">Conclusion and Future Work</head><p>GeoMine is a unique application for geometric searches in large collections of protein-ligand complexes with high relevance for life-science research. We showed that a large speedup in query processing can be achieved by moving major parts of the processing from custom-written logic inside the software to a PostgreSQL database system. Additionally, different database optimizations contributed further performance gains. Overall, these achievements are critical for the practical use of the system as the dataset grows. Some queries could be executed for the first time on our setup due to these changes. In this work, we focused on optimizations of the database and query design. We demonstrated the substantial benefits of database optimizations in scientific applications, achieving a fifteen-fold speedup in GeoMine. Coupled with a halving of the runtime through the use of a newer PostgreSQL version, we reduced the average runtime from minutes to seconds.</p><p>Looking ahead, we plan to explore additional database paradigms, such as distributed or column-based systems, and to establish schema changes for further optimizations. Caching intermediate results and determining the join order through extended statistics or machine learning may provide additional benefits. In this way, we aim to achieve even better performance for searching scientific data with a service-oriented web service.</p></div><figure xmlns="http://www.tei-c.org/ns/1.0" xml:id="fig_0"><head>Figure 1:</head><label>1</label><figDesc>Figure 1: GeoMine's three-dimensional view of a binding pocket based on the NGL viewer <ref type="bibr" target="#b19">[20]</ref>. Users can interactively select atoms and other points and specify distances and interactions between them to generate the query. 
Here, a pocket around a ligand (bold bonds) is shown, together with the surrounding atoms of the protein.</figDesc><graphic coords="5,140.97,89.91,165.00,124.53" type="bitmap" /></figure>
<figure xmlns="http://www.tei-c.org/ns/1.0" xml:id="fig_2"><head>Figure 2:</head><label>2</label><figDesc>Figure 2: The original and the improved processing workflow of GeoMine (simplified) for a given search.</figDesc></figure>
<figure xmlns="http://www.tei-c.org/ns/1.0" xml:id="fig_3"><head>Figure 3:</head><label>3</label><figDesc>Figure 3: The sum of mean runtimes in seconds for each experiment as described in Section 4.1. Each color represents one distinct query.</figDesc></figure>
<figure xmlns="http://www.tei-c.org/ns/1.0" xml:id="fig_4"><head></head><label></label><figDesc>PostgreSQL in Version 10 and 15</figDesc></figure>
<figure xmlns="http://www.tei-c.org/ns/1.0" xml:id="fig_5"><head>Figure 4:</head><label>4</label><figDesc>Figure 4: Mean runtimes of experiments ex01 and ex06 in alternative configurations.</figDesc></figure>
<figure xmlns="http://www.tei-c.org/ns/1.0" type="table" xml:id="tab_0"><head>Table 1</head><label>1</label><figDesc>Experiment overview showing the enabled improvements (marked x) per experiment, from the baseline (ex01) to all improvements (ex06).</figDesc><table>
<row><cell>Improvement</cell><cell>ex01</cell><cell>ex02</cell><cell>ex03</cell><cell>ex04</cell><cell>ex05</cell><cell>ex06</cell><cell>ex07</cell><cell>ex08</cell><cell>ex09</cell><cell>ex10</cell></row>
<row><cell>Index Improvement</cell><cell></cell><cell>x</cell><cell></cell><cell></cell><cell></cell><cell>x</cell><cell>x</cell><cell>x</cell><cell>x</cell><cell></cell></row>
<row><cell>No Wildcards</cell><cell></cell><cell></cell><cell>x</cell><cell></cell><cell></cell><cell>x</cell><cell>x</cell><cell>x</cell><cell></cell><cell>x</cell></row>
<row><cell>New Query Design</cell><cell></cell><cell></cell><cell></cell><cell>x</cell><cell></cell><cell>x</cell><cell>x</cell><cell></cell><cell>x</cell><cell>x</cell></row>
<row><cell>No ILIKE</cell><cell></cell><cell></cell><cell></cell><cell></cell><cell>x</cell><cell>x</cell><cell></cell><cell>x</cell><cell>x</cell><cell>x</cell></row>
</table></figure>
			<note xmlns="http://www.tei-c.org/ns/1.0" place="foot" n="1" xml:id="foot_0">https://proteins.plus</note>
		</body>
		<back>

			<div type="acknowledgement">
<div xmlns="http://www.tei-c.org/ns/1.0"><head>Acknowledgments</head><p>This work was supported by the German Federal Ministry of Education and Research as part of CompLS and de.NBI (031L0172 and 031L0105).</p></div>
			</div>

			<div type="references">

				<listBibl>

<biblStruct xml:id="b0">
	<analytic>
		<title level="a" type="main">Protein-ligand interaction databases: advanced tools to mine activity data and interactions on a structural level</title>
		<author>
			<persName><forename type="first">T</forename><surname>Inhester</surname></persName>
		</author>
		<author>
			<persName><forename type="first">M</forename><surname>Rarey</surname></persName>
		</author>
		<idno type="DOI">10.1002/wcms.1192</idno>
	</analytic>
	<monogr>
		<title level="j">WIREs Computational Molecular Science</title>
		<imprint>
			<biblScope unit="volume">4</biblScope>
			<biblScope unit="page" from="562" to="575" />
			<date type="published" when="2014">2014</date>
		</imprint>
	</monogr>
</biblStruct>

<biblStruct xml:id="b1">
	<analytic>
		<title level="a" type="main">RCSB Protein Data Bank (RCSB.org): delivery of experimentally-determined PDB structures alongside one million computed structure models of proteins from artificial intelligence/machine learning</title>
		<author>
			<persName><forename type="first">S</forename><forename type="middle">K</forename><surname>Burley</surname></persName>
		</author>
		<author>
			<persName><forename type="first">C</forename><surname>Bhikadiya</surname></persName>
		</author>
		<author>
			<persName><forename type="first">C</forename><surname>Bi</surname></persName>
		</author>
		<author>
			<persName><forename type="first">S</forename><surname>Bittrich</surname></persName>
		</author>
		<author>
			<persName><forename type="first">H</forename><surname>Chao</surname></persName>
		</author>
		<author>
			<persName><forename type="first">L</forename><surname>Chen</surname></persName>
		</author>
		<author>
			<persName><forename type="first">E</forename><forename type="middle">A</forename><surname>Craig</surname></persName>
		</author>
	</analytic>
	<monogr>
		<title level="j">Nucleic Acids Research</title>
		<imprint>
			<biblScope unit="volume">51</biblScope>
			<date type="published" when="2022">2022</date>
		</imprint>
	</monogr>
</biblStruct>

<biblStruct xml:id="b2">
	<analytic>
		<title level="a" type="main">Searching Geometric Patterns in Protein Binding Sites and Their Application to Data Mining in Protein Kinase Structures</title>
		<author>
			<persName><forename type="first">J</forename><surname>Graef</surname></persName>
		</author>
		<author>
			<persName><forename type="first">C</forename><surname>Ehrt</surname></persName>
		</author>
		<author>
			<persName><forename type="first">K</forename><surname>Diedrich</surname></persName>
		</author>
		<author>
			<persName><forename type="first">M</forename><surname>Poppinga</surname></persName>
		</author>
		<author>
			<persName><forename type="first">N</forename><surname>Ritter</surname></persName>
		</author>
		<author>
			<persName><forename type="first">M</forename><surname>Rarey</surname></persName>
		</author>
		<idno type="DOI">10.1021/acs.jmedchem.1c01046</idno>
	</analytic>
	<monogr>
		<title level="j">Journal of Medicinal Chemistry</title>
		<imprint>
			<biblScope unit="volume">65</biblScope>
			<biblScope unit="page" from="1384" to="1395" />
			<date type="published" when="2022">2022</date>
		</imprint>
	</monogr>
</biblStruct>

<biblStruct xml:id="b3">
	<analytic>
		<title level="a" type="main">Highly accurate protein structure prediction with AlphaFold</title>
		<author>
			<persName><forename type="first">J</forename><surname>Jumper</surname></persName>
		</author>
		<author>
			<persName><forename type="first">R</forename><surname>Evans</surname></persName>
		</author>
		<author>
			<persName><forename type="first">A</forename><surname>Pritzel</surname></persName>
		</author>
		<author>
			<persName><forename type="first">T</forename><surname>Green</surname></persName>
		</author>
		<author>
			<persName><forename type="first">M</forename><surname>Figurnov</surname></persName>
		</author>
		<author>
			<persName><forename type="first">O</forename><surname>Ronneberger</surname></persName>
		</author>
		<author>
			<persName><forename type="first">K</forename><surname>Tunyasuvunakool</surname></persName>
		</author>
		<author>
			<persName><forename type="first">R</forename><surname>Bates</surname></persName>
		</author>
		<author>
			<persName><forename type="first">A</forename><surname>Žídek</surname></persName>
		</author>
		<author>
			<persName><forename type="first">A</forename><surname>Potapenko</surname></persName>
		</author>
		<idno type="DOI">10.1038/s41586-021-03819-2</idno>
	</analytic>
	<monogr>
		<title level="j">Nature</title>
		<imprint>
			<biblScope unit="volume">596</biblScope>
			<biblScope unit="page" from="583" to="589" />
			<date type="published" when="2021">2021</date>
		</imprint>
	</monogr>
</biblStruct>

<biblStruct xml:id="b4">
	<analytic>
		<title level="a" type="main">Proteins plus: interactive analysis of protein-ligand binding interfaces</title>
		<author>
			<persName><forename type="first">K</forename><surname>Schöning-Stierand</surname></persName>
		</author>
		<author>
			<persName><forename type="first">K</forename><surname>Diedrich</surname></persName>
		</author>
		<author>
			<persName><forename type="first">R</forename><surname>Fährrolfes</surname></persName>
		</author>
		<author>
			<persName><forename type="first">F</forename><surname>Flachsenberg</surname></persName>
		</author>
		<author>
			<persName><forename type="first">A</forename><surname>Meyder</surname></persName>
		</author>
		<author>
			<persName><forename type="first">E</forename><surname>Nittinger</surname></persName>
		</author>
		<author>
			<persName><forename type="first">R</forename><surname>Steinegger</surname></persName>
		</author>
		<author>
			<persName><forename type="first">M</forename><surname>Rarey</surname></persName>
		</author>
		<idno type="DOI">10.1093/nar/gkaa235</idno>
	</analytic>
	<monogr>
		<title level="j">Nucleic Acids Research</title>
		<imprint>
			<biblScope unit="volume">48</biblScope>
			<biblScope unit="page" from="W48" to="W53" />
			<date type="published" when="2020">2020</date>
		</imprint>
	</monogr>
</biblStruct>

<biblStruct xml:id="b5">
	<monogr>
		<title level="m" type="main">Mining of Interaction Geometries in Collections of Protein Structures</title>
		<author>
			<persName><forename type="first">T</forename><surname>Inhester</surname></persName>
		</author>
		<imprint>
			<date type="published" when="2017">2017</date>
		</imprint>
		<respStmt>
			<orgName>Universität Hamburg</orgName>
		</respStmt>
	</monogr>
	<note type="report_type">Ph.D. thesis</note>
</biblStruct>

<biblStruct xml:id="b6">
	<analytic>
		<title level="a" type="main">The Protein Data Bank</title>
		<author>
			<persName><forename type="first">H</forename><forename type="middle">M</forename><surname>Berman</surname></persName>
		</author>
		<author>
			<persName><forename type="first">J</forename><surname>Westbrook</surname></persName>
		</author>
		<author>
			<persName><forename type="first">Z</forename><surname>Feng</surname></persName>
		</author>
		<author>
			<persName><forename type="first">G</forename><surname>Gilliland</surname></persName>
		</author>
		<author>
			<persName><forename type="first">T</forename><forename type="middle">N</forename><surname>Bhat</surname></persName>
		</author>
		<author>
			<persName><forename type="first">H</forename><surname>Weissig</surname></persName>
		</author>
		<author>
			<persName><forename type="first">I</forename><forename type="middle">N</forename><surname>Shindyalov</surname></persName>
		</author>
		<author>
			<persName><forename type="first">P</forename><forename type="middle">E</forename><surname>Bourne</surname></persName>
		</author>
		<idno type="DOI">10.1093/nar/28.1.235</idno>
	</analytic>
	<monogr>
		<title level="j">Nucleic Acids Research</title>
		<imprint>
			<biblScope unit="volume">28</biblScope>
			<biblScope unit="page" from="235" to="242" />
			<date type="published" when="2000">2000</date>
		</imprint>
	</monogr>
</biblStruct>

<biblStruct xml:id="b7">
	<analytic>
		<title level="a" type="main">Data sharing, management, use, and reuse: Practices and perceptions of scientists worldwide</title>
		<author>
			<persName><forename type="first">C</forename><surname>Tenopir</surname></persName>
		</author>
		<author>
			<persName><forename type="first">N</forename><forename type="middle">M</forename><surname>Rice</surname></persName>
		</author>
		<author>
			<persName><forename type="first">S</forename><surname>Allard</surname></persName>
		</author>
		<author>
			<persName><forename type="first">L</forename><surname>Baird</surname></persName>
		</author>
		<author>
			<persName><forename type="first">J</forename><surname>Borycz</surname></persName>
		</author>
		<author>
			<persName><forename type="first">L</forename><surname>Christian</surname></persName>
		</author>
		<author>
			<persName><forename type="first">B</forename><surname>Grant</surname></persName>
		</author>
		<author>
			<persName><forename type="first">R</forename><surname>Olendorf</surname></persName>
		</author>
		<author>
			<persName><forename type="first">R</forename><forename type="middle">J</forename><surname>Sandusky</surname></persName>
		</author>
	</analytic>
	<monogr>
		<title level="j">PLoS ONE</title>
		<imprint>
			<biblScope unit="volume">15</biblScope>
			<biblScope unit="page">e0229003</biblScope>
			<date type="published" when="2020">2020</date>
		</imprint>
	</monogr>
</biblStruct>

<biblStruct xml:id="b8">
	<analytic>
		<title level="a" type="main">Data management for data science - towards embedded analytics</title>
		<author>
			<persName><forename type="first">M</forename><surname>Raasveldt</surname></persName>
		</author>
		<author>
			<persName><forename type="first">H</forename><surname>Mühleisen</surname></persName>
		</author>
	</analytic>
	<monogr>
		<title level="m">CIDR</title>
				<imprint>
			<date type="published" when="2020">2020</date>
		</imprint>
	</monogr>
</biblStruct>

<biblStruct xml:id="b9">
	<monogr>
		<author>
			<persName><forename type="first">R</forename><forename type="middle">D</forename><surname>Hipp</surname></persName>
		</author>
		<ptr target="https://www.sqlite.org/" />
		<title level="m">SQLite</title>
				<imprint>
			<date type="published" when="2020">2020</date>
		</imprint>
	</monogr>
</biblStruct>

<biblStruct xml:id="b10">
	<monogr>
		<ptr target="https://www.postgresql.org" />
		<title level="m">The PostgreSQL Global Development Group, PostgreSQL: The world&apos;s most advanced open source relational database</title>
				<imprint>
			<date type="published" when="2023">2023</date>
		</imprint>
	</monogr>
</biblStruct>

<biblStruct xml:id="b11">
	<monogr>
		<ptr target="https://db-engines.com/en/ranking" />
		<title level="m">DB-Engines Ranking</title>
				<imprint>
			<date type="published" when="2023">2023</date>
		</imprint>
		<respStmt>
			<orgName>solid IT gmbh</orgName>
		</respStmt>
	</monogr>
</biblStruct>

<biblStruct xml:id="b12">
	<analytic>
		<title level="a" type="main">Database of the year: Postgres</title>
		<author>
			<persName><forename type="first">A</forename><surname>Conrad</surname></persName>
		</author>
		<idno type="DOI">10.1109/MS.2021.3089730</idno>
	</analytic>
	<monogr>
		<title level="j">IEEE Software</title>
		<imprint>
			<biblScope unit="volume">38</biblScope>
			<biblScope unit="page" from="130" to="132" />
			<date type="published" when="2021">2021</date>
		</imprint>
	</monogr>
</biblStruct>

<biblStruct xml:id="b13">
	<monogr>
		<ptr target="https://postgis.net" />
		<title level="m">The PostGIS Development Group, PostGIS</title>
		<imprint>
			<date type="published" when="2023">2023</date>
		</imprint>
	</monogr>
</biblStruct>

<biblStruct xml:id="b14">
	<analytic>
		<title level="a" type="main">Citus: Distributed PostgreSQL for data-intensive applications</title>
		<author>
			<persName><forename type="first">U</forename><surname>Cubukcu</surname></persName>
		</author>
		<author>
			<persName><forename type="first">O</forename><surname>Erdogan</surname></persName>
		</author>
		<author>
			<persName><forename type="first">S</forename><surname>Pathak</surname></persName>
		</author>
		<author>
			<persName><forename type="first">S</forename><surname>Sannakkayala</surname></persName>
		</author>
		<author>
			<persName><forename type="first">M</forename><surname>Slot</surname></persName>
		</author>
		<idno type="DOI">10.1145/3448016.3457551</idno>
	</analytic>
	<monogr>
		<title level="m">Proceedings of the 2021 International Conference on Management of Data, SIGMOD &apos;21</title>
				<meeting>the 2021 International Conference on Management of Data, SIGMOD &apos;21</meeting>
		<imprint>
			<date type="published" when="2021">2021</date>
			<biblScope unit="page" from="2490" to="2502" />
		</imprint>
	</monogr>
</biblStruct>

<biblStruct xml:id="b15">
	<analytic>
		<title level="a" type="main">NAOMI: On the almost trivial task of reading molecules from different file formats</title>
		<author>
			<persName><forename type="first">S</forename><surname>Urbaczek</surname></persName>
		</author>
		<author>
			<persName><forename type="first">A</forename><surname>Kolodzik</surname></persName>
		</author>
		<author>
			<persName><forename type="first">J</forename><forename type="middle">R</forename><surname>Fischer</surname></persName>
		</author>
		<author>
			<persName><forename type="first">T</forename><surname>Lippert</surname></persName>
		</author>
		<author>
			<persName><forename type="first">S</forename><surname>Heuser</surname></persName>
		</author>
		<author>
			<persName><forename type="first">I</forename><surname>Groth</surname></persName>
		</author>
		<author>
			<persName><forename type="first">T</forename><surname>Schulz-Gasch</surname></persName>
		</author>
		<author>
			<persName><forename type="first">M</forename><surname>Rarey</surname></persName>
		</author>
		<idno type="DOI">10.1021/ci200324e</idno>
	</analytic>
	<monogr>
		<title level="j">Journal of Chemical Information and Modeling</title>
		<imprint>
			<biblScope unit="volume">51</biblScope>
			<biblScope unit="page" from="3199" to="3207" />
			<date type="published" when="2011">2011</date>
		</imprint>
	</monogr>
</biblStruct>

<biblStruct xml:id="b16">
	<analytic>
		<title level="a" type="main">Binding site detection remastered: Enabling fast, robust, and reliable binding site detection and descriptor calculation with DoGSite3</title>
		<author>
			<persName><forename type="first">J</forename><surname>Graef</surname></persName>
		</author>
		<author>
			<persName><forename type="first">C</forename><surname>Ehrt</surname></persName>
		</author>
		<author>
			<persName><forename type="first">M</forename><surname>Rarey</surname></persName>
		</author>
		<idno type="DOI">10.1021/acs.jcim.3c00336</idno>
		<idno type="PMID">37130052</idno>
	</analytic>
	<monogr>
		<title level="j">Journal of Chemical Information and Modeling</title>
		<imprint>
			<biblScope unit="volume">63</biblScope>
			<biblScope unit="page" from="3128" to="3137" />
			<date type="published" when="2023">2023</date>
		</imprint>
	</monogr>
</biblStruct>

<biblStruct xml:id="b17">
	<analytic>
		<title level="a" type="main">Index-based searching of interaction patterns in large collections of protein-ligand interfaces</title>
		<author>
			<persName><forename type="first">T</forename><surname>Inhester</surname></persName>
		</author>
		<author>
			<persName><forename type="first">S</forename><surname>Bietz</surname></persName>
		</author>
		<author>
			<persName><forename type="first">M</forename><surname>Hilbig</surname></persName>
		</author>
		<author>
			<persName><forename type="first">R</forename><surname>Schmidt</surname></persName>
		</author>
		<author>
			<persName><forename type="first">M</forename><surname>Rarey</surname></persName>
		</author>
	</analytic>
	<monogr>
		<title level="j">Journal of Chemical Information and Modeling</title>
		<imprint>
			<biblScope unit="volume">57</biblScope>
			<biblScope unit="page" from="148" to="158" />
			<date type="published" when="2017">2017</date>
		</imprint>
	</monogr>
</biblStruct>

<biblStruct xml:id="b18">
	<monogr>
		<title level="m" type="main">Daylight Chemical Information Systems, Inc., SMARTS - a language for describing molecular patterns</title>
		<imprint>
			<date type="published" when="2007">2007</date>
		</imprint>
	</monogr>
</biblStruct>

<biblStruct xml:id="b19">
	<analytic>
		<title level="a" type="main">NGL viewer: web-based molecular graphics for large complexes</title>
		<author>
			<persName><forename type="first">A</forename><forename type="middle">S</forename><surname>Rose</surname></persName>
		</author>
		<author>
			<persName><forename type="first">A</forename><forename type="middle">R</forename><surname>Bradley</surname></persName>
		</author>
		<author>
			<persName><forename type="first">Y</forename><surname>Valasatava</surname></persName>
		</author>
		<author>
			<persName><forename type="first">J</forename><forename type="middle">M</forename><surname>Duarte</surname></persName>
		</author>
		<author>
			<persName><forename type="first">A</forename><surname>Prlić</surname></persName>
		</author>
		<author>
			<persName><forename type="first">P</forename><forename type="middle">W</forename><surname>Rose</surname></persName>
		</author>
		<idno type="DOI">10.1093/bioinformatics/bty419</idno>
	</analytic>
	<monogr>
		<title level="j">Bioinformatics</title>
		<imprint>
			<biblScope unit="volume">34</biblScope>
			<biblScope unit="page" from="3755" to="3758" />
			<date type="published" when="2018">2018</date>
		</imprint>
	</monogr>
</biblStruct>

<biblStruct xml:id="b20">
	<analytic>
		<title level="a" type="main">GeoMine: interactive pattern mining of protein-ligand interfaces in the Protein Data Bank</title>
		<author>
			<persName><forename type="first">K</forename><surname>Diedrich</surname></persName>
		</author>
		<author>
			<persName><forename type="first">J</forename><surname>Graef</surname></persName>
		</author>
		<author>
			<persName><forename type="first">K</forename><surname>Schöning-Stierand</surname></persName>
		</author>
		<author>
			<persName><forename type="first">M</forename><surname>Rarey</surname></persName>
		</author>
		<idno type="DOI">10.1093/bioinformatics/btaa693</idno>
	</analytic>
	<monogr>
		<title level="j">Bioinformatics</title>
		<imprint>
			<biblScope unit="volume">37</biblScope>
			<biblScope unit="page" from="424" to="425" />
			<date type="published" when="2020">2020</date>
		</imprint>
	</monogr>
</biblStruct>

<biblStruct xml:id="b21">
	<analytic>
		<title level="a" type="main">Algorithm 457: finding all cliques of an undirected graph</title>
		<author>
			<persName><forename type="first">C</forename><surname>Bron</surname></persName>
		</author>
		<author>
			<persName><forename type="first">J</forename><surname>Kerbosch</surname></persName>
		</author>
	</analytic>
	<monogr>
		<title level="j">Communications of the ACM</title>
		<imprint>
			<biblScope unit="volume">16</biblScope>
			<biblScope unit="page" from="575" to="577" />
			<date type="published" when="1973">1973</date>
		</imprint>
	</monogr>
</biblStruct>

<biblStruct xml:id="b22">
	<monogr>
		<ptr target="https://www.postgresql.org/docs/15/" />
		<title level="m">PostgreSQL 15</title>
				<imprint>
			<date type="published" when="2023">2023</date>
		</imprint>
	</monogr>
	<note>Documentation</note>
</biblStruct>

<biblStruct xml:id="b23">
	<monogr>
		<title level="m">Containers, Podman</title>
		<ptr target="https://podman.io/" />
		<imprint>
			<date type="published" when="2023">2023</date>
		</imprint>
	</monogr>
</biblStruct>

				</listBibl>
			</div>
		</back>
	</text>
</TEI>
