=Paper=
{{Paper
|id=Vol-256/paper-1
|storemode=property
|title=Astronomical Databases Challenges
|pdfUrl=https://ceur-ws.org/Vol-256/invited_1.pdf
|volume=Vol-256
|dblpUrl=https://dblp.org/rec/conf/syrcodis/Bartunov07
}}
==Astronomical Databases Challenges==
Astronomical Databases Challenges♣
© Oleg Bartunov
Sternberg Astronomical Institute, Moscow University
PostgreSQL Global Development Group
oleg@sai.msu.su
===Annotation===

Modern astronomy is undergoing a big change due to the new possibilities enabled by technology development. Large-scale survey projects produce huge amounts of data, which need to be processed and organized in databases to provide access for the astronomical community. There are many problems in the current state of the art of accessing large astronomical databases and of organizing programmatic access to distributed and diverse data, problems which the Virtual Observatory initiative should eventually solve. One of the most important problems is the effective execution of queries over very big databases. We expect petabyte databases within 5 years from several big projects, like the Large Synoptic Survey Telescope (lsst.org) and PAN-STARRS (pan-starrs.ifa.hawaii.edu), which are planned to produce a petabyte of data per year.

Nowadays, it is not unusual to work with billions of objects in a terabyte-sized database. Astronomy is the only science which has so many objects. These objects are intrinsically 2-dimensional and, what is more, they are located on the celestial sphere, which makes even basic queries, such as finding the objects within a fixed radius of some point, difficult, and makes crossmatch queries using standard algorithms useless. The challenge is to provide execution times of several seconds for easy queries, like a spatial query, and of several minutes for a catalog crossmatch.

Huge databases change the patterns of data access: it is impossible to download the data and do science locally. Users will query databases via VO (Virtual Observatory) services, so we need a flexible access policy for system resources (disk, memory, CPU usage) and have to handle user quotas in databases.

Clustering algorithms, which tend to be of N^2 or N^3 complexity, need to be improved to be applicable to petabyte databases.

Astronomical data are not static, and the scale and rate of changes differ. We need version management to be able to reproduce scientific results. Current practice is to work with monolithic releases for big catalogs, but there are many rapidly changing catalogs which require version management at the row level.

The SAI RVO development group was organized at the Sternberg Astronomical Institute, Moscow University, in summer 2005 to meet the requirements of modern astronomy by developing unified access to astronomical data using generally adopted standards. The primary goal of the group is to develop a fully functional node of the Virtual Observatory in Russia and to facilitate the solution of typical astrophysical problems using VO technology.

We implemented an original spatial algorithm for 2-dimensional data with spherical attributes in the open-source database PostgreSQL, which allows us to work with databases of several terabytes. Our sky-indexing scheme Q3C is available for download from q3c.sourceforge.net. The total number of objects in our database is about 4 billion (4x10^9). Our hardware, an HP rx1620 entry-level server with dual Itanium2, 8 GB RAM, and MSA 20 storage, was kindly provided by HP Russia. We provide conesearch and crossmatch queries via a standard web-based interface for interactive work and via web services for programmatic access (vo.astronet.ru). We developed uniform access to the diverse catalogs with the help of a metadata catalog.

We are working on the development of a VO registry, a searchable directory of VO services, with additional full-text search over an archive of astronomical papers (arxiv.org) to find information about astronomical objects. We developed a full-text search engine in PostgreSQL, which supports online index updates and user-pluggable methods for document parsing and lexeme processing.

We participate in the creation of a scalable data processing and storage data center of Moscow University. We will use the data center to store scans of the SAI Glass Library: photographs of the sky taken over more than a hundred years. The largest plate is 30x30 cm, with a scan size of about 4 GB. There are about 60,000 plates of different sizes, and we estimate the total size at about 20 TB. Images will be accessed using SIAP (Simple Image Access Protocol), and all image metadata will be stored in PostgreSQL and indexed using our Q3C sky-indexing scheme.

♣ The SAI RVO project is being developed in the framework of the Astronet project, supported by RFBR (Russian Foundation for Basic Research), grant 05-07-90225.

Proceedings of the Spring Young Researcher's Colloquium On Database and Information Systems (SYRCoDIS), St.-Petersburg, Russia, 2007.
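The spherical geometry that makes "basic" spatial queries hard can be illustrated with a minimal cone-search sketch in Python. This is not the paper's Q3C algorithm, only a naive linear scan over a toy catalog; the object tuples and function names here are invented for illustration:

```python
import math

def angular_separation(ra1, dec1, ra2, dec2):
    """Angular distance in degrees between two points on the celestial
    sphere, via the haversine formula (numerically stable for small
    separations, unlike the plain spherical law of cosines)."""
    ra1, dec1, ra2, dec2 = map(math.radians, (ra1, dec1, ra2, dec2))
    a = (math.sin((dec2 - dec1) / 2) ** 2
         + math.cos(dec1) * math.cos(dec2) * math.sin((ra2 - ra1) / 2) ** 2)
    return math.degrees(2 * math.asin(min(1.0, math.sqrt(a))))

def cone_search(objects, ra0, dec0, radius):
    """Naive O(N) cone search; objects is a list of (id, ra, dec)."""
    return [oid for oid, ra, dec in objects
            if angular_separation(ra, dec, ra0, dec0) <= radius]

# Toy catalog: (359.9, 0) is only 0.1 deg from (0, 0) despite the
# RA wrap-around -- a plain Euclidean box predicate misses it.
catalog = [(1, 0.1, 0.0), (2, 359.9, 0.0), (3, 10.0, 0.0)]
print(cone_search(catalog, 0.0, 0.0, 0.5))  # -> [1, 2]
```

With Q3C installed in PostgreSQL, the indexed equivalent is roughly `SELECT * FROM cat WHERE q3c_radial_query(ra, dec, 0.0, 0.0, 0.5);` (function name per the Q3C documentation), which avoids scanning the whole table.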
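The text notes that all-pairs crossmatch algorithms are useless at survey scale. A hedged sketch of the standard remedy, again with invented names and not the actual Q3C join: pre-bin one catalog by declination so each probe touches only a few bands instead of the whole catalog. A real index such as Q3C partitions the sphere hierarchically and in both coordinates, but the divide-the-sky idea is the same:

```python
import math
from collections import defaultdict

def separation(ra1, dec1, ra2, dec2):
    """Haversine angular distance in degrees on the celestial sphere."""
    ra1, dec1, ra2, dec2 = map(math.radians, (ra1, dec1, ra2, dec2))
    a = (math.sin((dec2 - dec1) / 2) ** 2
         + math.cos(dec1) * math.cos(dec2) * math.sin((ra2 - ra1) / 2) ** 2)
    return math.degrees(2 * math.asin(min(1.0, math.sqrt(a))))

def crossmatch(cat_a, cat_b, radius):
    """Match each (id, ra, dec) of cat_a to the cat_b objects within
    `radius` degrees.  cat_b is binned into declination bands of width
    `radius`, so any match for a cat_a object can only sit in the
    object's own band or the two adjacent ones -- three probes instead
    of an O(N*M) all-pairs scan."""
    bands = defaultdict(list)
    for obj in cat_b:
        bands[math.floor(obj[2] / radius)].append(obj)
    matches = []
    for ida, ra, dec in cat_a:
        band = math.floor(dec / radius)
        for b in (band - 1, band, band + 1):
            for idb, rb, db in bands[b]:
                if separation(ra, dec, rb, db) <= radius:
                    matches.append((ida, idb))
    return matches
```

Declination bands alone still degrade near a densely populated band (every object in it must be tested), which is why production schemes like Q3C also split in the second coordinate.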