=Paper= {{Paper |id=Vol-256/paper-1 |storemode=property |title=Astronomical Databases Challenges |pdfUrl=https://ceur-ws.org/Vol-256/invited_1.pdf |volume=Vol-256 |dblpUrl=https://dblp.org/rec/conf/syrcodis/Bartunov07 }} ==Astronomical Databases Challenges== https://ceur-ws.org/Vol-256/invited_1.pdf
                       Astronomical Databases Challenges♣

                                                © Oleg Bartunov
                          Sternberg Astronomical Institute, Moscow University
                                 PostgreSQL Global Development Group

                                                oleg@sai.msu.su

♣ The SAI RVO project is being developed in the framework of the Astronet project, supported by RFBR (Russian Foundation for Basic Research), grant 05-07-90225.

Proceedings of the Spring Young Researcher's Colloquium On Database and Information Systems SYRCoDIS, St.-Petersburg, Russia, 2007

Annotation

Modern astronomy is undergoing a major change driven by the new possibilities that technological development has opened up. Large-scale survey projects produce huge amounts of data. These data must be processed and organized in databases to make them accessible to the astronomical community. The current state of the art has many problems, both in accessing large astronomical databases and in organizing programmatic access to distributed and diverse data. The Virtual Observatory initiative should eventually solve them. One of the most important problems is the efficient execution of queries in very large databases. We expect petabyte databases within 5 years from several big projects, such as the Large Synoptic Survey Telescope (lsst.org) and PAN-STARRS (pan-starrs.ifa.hawaii.edu), which plan to produce data at a rate of a petabyte per year.

Nowadays it is not unusual to work with billions of objects in a terabyte-sized database. Astronomy is the only science with so many objects. These objects are intrinsically two-dimensional and, moreover, they are located on the celestial sphere. This makes even basic queries difficult, such as finding all objects within a fixed radius of a given point, and it renders standard algorithms useless for crossmatch queries. The challenge is to achieve execution times of a few seconds for simple queries, such as spatial queries, and of several minutes for crossmatching catalogs.

Huge databases change the patterns of data access: it is impossible to download the data and do science locally. Users will query databases via VO (Virtual Observatory) services. We therefore need a flexible policy for access to system resources (disk, memory, CPU usage), and we must handle user quotas in databases.

Clustering algorithms, which tend to have N^2 or N^3 complexity, need to be improved before they can be applied to petabyte databases.

Astronomical data are not static, and the scale and rate of change vary. We need version management to be able to reproduce scientific results. Current practice is to work with monolithic releases of big catalogs, but many catalogs change rapidly and require version management at the row level.

The SAI RVO development group was organized at the Sternberg Astronomical Institute, Moscow University, in summer 2005. Its purpose is to meet the needs of modern astronomy by developing unified access to astronomical data using generally adopted standards. The primary goal of the group is to develop a fully functional Virtual Observatory node in Russia and to facilitate the solution of typical astrophysical problems using VO technology.

We implemented an original spatial algorithm for two-dimensional data with spherical attributes in the open-source database PostgreSQL, which allows us to work with databases of several terabytes. Our sky-indexing scheme Q3C is available for download from q3c.sourceforge.net. Our database holds about 4 billion (4×10^9) objects in total. Our hardware was kindly provided by HP Russia: an HP rx1620 entry-level server with dual Itanium2 processors, 8 GB RAM and MSA 20 storage. We provide cone search and crossmatch queries through a standard web interface for interactive work and through web services for programmatic access (vo.astronet.ru). We developed uniform access to diverse catalogs with the help of a metadata catalog.

We are developing a VO registry, a searchable directory of VO services. It also offers full-text search of the astronomical paper archive (arxiv.org) to find information about astronomical objects. We developed a full-text search engine in PostgreSQL that supports online index updates and user-pluggable methods for document parsing and lexeme processing.

We participate in the creation of a scalable data processing and storage center at Moscow University. We will use the data center to store scans of the SAI Glass Library, photographs of the sky spanning more than a hundred years. The largest plate is 30×30 cm, with a scan size of about 4 GB. There are about 60,000 plates of different sizes, and we estimate the total size at about 20 TB. Images will be accessed using SIAP (Simple Image Access Protocol). All image metadata will be stored in PostgreSQL and indexed using our Q3C sky-indexing scheme.
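The spherical-query problem described above (cone searches and catalog crossmatches on the celestial sphere) can be illustrated with a minimal Python sketch. It does not reproduce Q3C. It swaps in a simpler, generic technique instead: unit vectors are hashed into a 3-D grid whose cell side equals the chord length of the search radius, so crossmatch candidates come only from the 27 neighbouring cells. The function names, the `(id, ra, dec)` tuple layout and the use of degrees are assumptions made for illustration.

```python
import math
from collections import defaultdict


def radec_to_vec(ra_deg, dec_deg):
    """Convert equatorial coordinates in degrees to a 3-D unit vector."""
    ra, dec = math.radians(ra_deg), math.radians(dec_deg)
    return (math.cos(dec) * math.cos(ra),
            math.cos(dec) * math.sin(ra),
            math.sin(dec))


def ang_dist_deg(ra1, dec1, ra2, dec2):
    """Great-circle distance in degrees (haversine form, accurate at small separations)."""
    ra1, dec1, ra2, dec2 = map(math.radians, (ra1, dec1, ra2, dec2))
    h = (math.sin((dec2 - dec1) / 2) ** 2
         + math.cos(dec1) * math.cos(dec2) * math.sin((ra2 - ra1) / 2) ** 2)
    return math.degrees(2 * math.asin(min(1.0, math.sqrt(h))))


def cone_search(catalog, ra0, dec0, radius_deg):
    """Ids of objects within radius_deg of (ra0, dec0).

    Brute-force O(N) scan: this is exactly the cost a sky index is meant to avoid.
    """
    return [oid for oid, ra, dec in catalog
            if ang_dist_deg(ra, dec, ra0, dec0) <= radius_deg]


def _cell(vec, side):
    """Integer 3-D grid cell containing a unit vector (hypothetical helper)."""
    return tuple(math.floor(c / side) for c in vec)


def crossmatch(cat_a, cat_b, radius_deg):
    """Map each cat_a id to its nearest cat_b id within radius_deg.

    Points separated by angle theta are a chord 2*sin(theta/2) apart. With the
    cell side set to that chord, every candidate lies in one of the 27 cells
    around the query cell, so no RA wrap-around or pole special cases are needed.
    This grid hashing is a stand-in technique, not the Q3C scheme.
    """
    side = 2 * math.sin(math.radians(radius_deg) / 2)
    grid = defaultdict(list)
    for oid, ra, dec in cat_b:
        vec = radec_to_vec(ra, dec)
        grid[_cell(vec, side)].append((oid, vec))

    matches = {}
    for oid, ra, dec in cat_a:
        vec = radec_to_vec(ra, dec)
        cx, cy, cz = _cell(vec, side)
        best = None  # (squared chord distance, cat_b id)
        for dx in (-1, 0, 1):
            for dy in (-1, 0, 1):
                for dz in (-1, 0, 1):
                    for bid, bvec in grid.get((cx + dx, cy + dy, cz + dz), ()):
                        d2 = sum((p - q) ** 2 for p, q in zip(vec, bvec))
                        if d2 <= side ** 2 and (best is None or d2 < best[0]):
                            best = (d2, bid)
        if best is not None:
            matches[oid] = best[1]
    return matches
```

For example, `crossmatch(cat_a, cat_b, 1.0 / 3600)` performs a 1-arcsecond match. Working on unit vectors rather than raw (ra, dec) pairs sidesteps the 0°/360° wrap-around and the pole singularities that defeat planar indexes. A database implementation would presumably keep a bucket key like this in an indexed column rather than in an in-memory dictionary.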
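The abstract calls for row-level version management, as opposed to monolithic catalog releases, so that scientific results can be reproduced. A common way to achieve this is to give every row a validity interval, often described as valid-from/valid-to versioning. The following is a hypothetical in-memory sketch of that idea. The `RowVersion` and `VersionedCatalog` names, the single `mag` column and the integer version counter are all assumptions, and the abstract does not say how SAI implements versioning.

```python
from dataclasses import dataclass
from typing import Dict, List, Optional


@dataclass
class RowVersion:
    obj_id: int
    mag: float                      # hypothetical catalogued quantity (a magnitude)
    valid_from: int                 # first catalog version in which the row is visible
    valid_to: Optional[int] = None  # first version in which it is no longer visible; None = current


class VersionedCatalog:
    """Append-only table: an update closes the current row version instead of overwriting it."""

    def __init__(self) -> None:
        self.rows: List[RowVersion] = []
        self.version = 0

    def commit(self, changes: Dict[int, Optional[float]]) -> int:
        """Apply {obj_id: new value, or None to delete} as one new catalog version."""
        self.version += 1
        for obj_id, mag in changes.items():
            for row in self.rows:
                if row.obj_id == obj_id and row.valid_to is None:
                    row.valid_to = self.version  # retire the previous version of this row
            if mag is not None:
                self.rows.append(RowVersion(obj_id, mag, self.version))
        return self.version

    def as_of(self, version: int) -> Dict[int, float]:
        """Reconstruct the catalog exactly as it was at `version`."""
        return {r.obj_id: r.mag for r in self.rows
                if r.valid_from <= version and (r.valid_to is None or version < r.valid_to)}
```

Retired rows are never deleted, so a published result can record the catalog version it was computed against and later re-run `as_of(v)` on exactly the same data. In a relational table the same idea amounts to two extra columns and a filter on them in every query.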