Astronomical Databases Challenges♣

© Oleg Bartunov
Sternberg Astronomical Institute, Moscow University
PostgreSQL Global Development Group
oleg@sai.msu.su

♣ The SAI RVO project is being developed in the framework of the Astronet project, supported by RFBR (Russian Foundation for Basic Research), grant 05-07-90225.

Proceedings of the Spring Young Researcher's Colloquium On Database and Information Systems SYRCoDIS, St.-Petersburg, Russia, 2007

Abstract

Modern astronomy is undergoing a major change driven by new possibilities enabled by technology development. Large-scale survey projects produce huge amounts of data, which need to be processed and organized in databases to provide access for the astronomical community. There are many problems in the current state of the art of accessing large astronomical databases and of organizing programmatic access to distributed and diverse data, which the Virtual Observatory initiative should eventually solve. One of the most important problems is the efficient execution of queries in very big databases. We expect petabyte databases within 5 years from several big projects, such as the Large Synoptic Survey Telescope (lsst.org) and PAN-STARRS (pan-starrs.ifa.hawaii.edu), which are planned to produce data at a rate of a petabyte per year.

Nowadays it is not unusual to work with billions of objects in a terabyte-sized database. Astronomy is the only science which has so many objects. These objects are intrinsically 2-dimensional and, what is more, they are located on the celestial sphere. This makes even basic queries, such as finding objects within a fixed radius of some point, difficult, and it makes standard algorithms useless for crossmatch queries. The challenge is to provide execution times of several seconds for simple queries such as a spatial query, and of several minutes for a crossmatch of catalogs.

Huge databases change the patterns of data access: it is impossible to download the data and do science locally. Users will query databases via VO (Virtual Observatory) services, so we need a flexible access policy for system resources (disk, memory, CPU usage) and a way to handle user quotas in databases.

Clustering algorithms, which tend to have N^2 or N^3 complexity, need to be improved to be applicable to petabyte databases.

Astronomical data are not static, and the scale and rate of change differ from catalog to catalog. We need version management to be able to reproduce scientific results. Current practice is to work with monolithic releases for big catalogs, but there are many rapidly changing catalogs which require version management at the row level.

The SAI RVO development group was organized at the Sternberg Astronomical Institute, Moscow University, in summer 2005 to meet the requirements of modern astronomy by developing unified access to astronomical data using generally adopted standards. The primary goal of the group is to develop a fully functional node of the Virtual Observatory in Russia and to facilitate the solution of typical astrophysical problems using VO technology.

We implemented an original spatial algorithm for 2-dimensional data with spherical attributes in the open-source database PostgreSQL, which allows us to work with databases of several terabytes. Our sky-indexing scheme, Q3C, is available for download from q3c.sourceforge.net. The total number of objects in our database is about 4 billion (4 × 10^9). Our hardware, an HP rx1620 entry-level server with dual Itanium2 processors, 8 GB RAM and MSA 20 storage, was kindly provided by HP Russia. We provide cone search and crossmatch queries via a standard web-based interface for interactive work and via web services for programmatic access (vo.astronet.ru). We developed uniform access to diverse catalogs with the help of a metadata catalog.

We are working on a VO registry, a searchable directory of VO services, with additional full-text search of an archive of astronomical papers (arxiv.org) to find information about astronomical objects. We developed a full-text search engine in PostgreSQL, which supports online index updates and user-pluggable methods for document parsing and lexeme processing.

We participate in the creation of a scalable data processing and storage center at Moscow University. We will use the data center to store scans of the SAI Glass Library, photographs of the sky taken over more than a hundred years. The largest plate is 30x30 cm, with a scan size of about 4 GB. There are about 60,000 plates of different sizes, and we estimate the total size at about 20 TB. Images will be accessed using SIAP (Simple Image Access Protocol); all image metadata will be stored in PostgreSQL and indexed using our Q3C sky-indexing scheme.
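
To illustrate why positions on the celestial sphere complicate even a fixed-radius search, here is a minimal Python sketch of a cone search and a nearest-neighbour crossmatch. It is a brute-force baseline, not the Q3C algorithm: it scans every object, which is exactly what a sky index is meant to avoid. The function names and the catalog layout (tuples of id, right ascension and declination in degrees) are assumptions made for this example.

```python
import math


def angular_separation_deg(ra1, dec1, ra2, dec2):
    """Great-circle distance in degrees between two sky positions (haversine formula)."""
    ra1, dec1, ra2, dec2 = map(math.radians, (ra1, dec1, ra2, dec2))
    sin_ddec = math.sin((dec2 - dec1) / 2.0)
    sin_dra = math.sin((ra2 - ra1) / 2.0)
    h = sin_ddec ** 2 + math.cos(dec1) * math.cos(dec2) * sin_dra ** 2
    # Clamp to 1.0 to guard against rounding slightly above the asin domain.
    return math.degrees(2.0 * math.asin(min(1.0, math.sqrt(h))))


def cone_search(catalog, ra0, dec0, radius_deg):
    """Return all (id, ra, dec) rows within radius_deg of (ra0, dec0); O(N) scan."""
    return [
        row for row in catalog
        if angular_separation_deg(row[1], row[2], ra0, dec0) <= radius_deg
    ]


def crossmatch(cat_a, cat_b, radius_deg):
    """Pair each object of cat_a with its nearest cat_b object within radius_deg; O(N*M)."""
    pairs = []
    for id_a, ra_a, dec_a in cat_a:
        best_id, best_dist = None, None
        for id_b, ra_b, dec_b in cat_b:
            dist = angular_separation_deg(ra_a, dec_a, ra_b, dec_b)
            if dist <= radius_deg and (best_dist is None or dist < best_dist):
                best_id, best_dist = id_b, dist
        if best_id is not None:
            pairs.append((id_a, best_id))
    return pairs
```

Two cases show why a plain rectangular range test on (ra, dec) fails: objects at right ascension 359.9 and 0.1 degrees on the equator are only 0.2 degrees apart, and two objects at declination 89.9 degrees with right ascensions 180 degrees apart are also only 0.2 degrees apart. The quadratic cost of the crossmatch loop is what makes this approach unusable for catalogs with billions of rows.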
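
Row-level version management, as opposed to monolithic catalog releases, can be sketched by giving every row a validity interval. The example below is one possible layout under stated assumptions, not the SAI RVO design: the table and column names, the integer release numbers, and the use of an in-memory SQLite database instead of PostgreSQL are all choices made for illustration.

```python
import sqlite3


def open_catalog():
    """Create an in-memory catalog in which every row carries a validity interval."""
    conn = sqlite3.connect(":memory:")
    conn.execute(
        """
        CREATE TABLE star_versions (
            obj_id     INTEGER NOT NULL,
            ra         REAL    NOT NULL,
            dec        REAL    NOT NULL,
            valid_from INTEGER NOT NULL,  -- first release in which this row is valid
            valid_to   INTEGER            -- release that superseded it; NULL = current
        )
        """
    )
    return conn


def put_object(conn, obj_id, ra, dec, release):
    """Insert a new version of one object, closing its previous version if any."""
    conn.execute(
        "UPDATE star_versions SET valid_to = ? "
        "WHERE obj_id = ? AND valid_to IS NULL",
        (release, obj_id),
    )
    conn.execute(
        "INSERT INTO star_versions VALUES (?, ?, ?, ?, NULL)",
        (obj_id, ra, dec, release),
    )


def catalog_as_of(conn, release):
    """Return the catalog rows exactly as they were at the given release."""
    return conn.execute(
        "SELECT obj_id, ra, dec FROM star_versions "
        "WHERE valid_from <= ? AND (valid_to IS NULL OR valid_to > ?) "
        "ORDER BY obj_id",
        (release, release),
    ).fetchall()
```

With this layout, correcting a single object touches two rows and leaves the rest of the catalog alone, while a query pinned to an earlier release number still returns the data that a published result was based on.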