<!DOCTYPE article PUBLIC "-//NLM//DTD JATS (Z39.96) Journal Archiving and Interchange DTD v1.0 20120330//EN" "JATS-archivearticle1.dtd">
<article xmlns:xlink="http://www.w3.org/1999/xlink">
  <front>
    <journal-meta />
    <article-meta>
      <title-group>
        <article-title>Clustering Stellar Pairs to Detect Extended Stellar Structures</article-title>
      </title-group>
      <contrib-group>
        <contrib contrib-type="author">
          <string-name>Sapozhnikov</string-name>
          <xref ref-type="aff" rid="aff0">0</xref>
        </contrib>
        <aff id="aff0">
          <label>0</label>
          <institution>Institute of Astronomy RAS, Russia</institution>
          ,
          <addr-line>Moscow</addr-line>
        </aff>
      </contrib-group>
      <fpage>227</fpage>
      <lpage>234</lpage>
      <abstract>
        <p>Gaia data allow searching for extended stellar structures in phase space (coordinates plus velocities). We describe a method that applies the DBSCAN clustering algorithm, which groups closely packed data points, to a preliminary list of stellar pairs selected to have the parameters expected within stellar streams and comoving groups: loose structures in which stars are not gravitationally bound but do share motion and evolutionary properties. To test our approach, we construct a model population of background stars and run the pair-construction and clustering algorithms on it. The results show that switching to a list of pairs sharply reveals structures that are absent from the background model; thanks to the increased relative density of the extended stellar structures, they become more apparent targets for the DBSCAN algorithm in coordinate-velocity phase space.</p>
      </abstract>
      <kwd-group>
        <kwd>Star clusters</kwd>
        <kwd>Stellar associations</kwd>
        <kwd>Data analysis</kwd>
      </kwd-group>
    </article-meta>
  </front>
  <body>
    <sec id="sec-1">
      <title>Introduction</title>
      <p>
        The European Space Agency's astrometric space mission Gaia provides
positions, proper motions, and parallaxes of stars with unprecedented precision
and scale (&gt; 1.5 billion sources) [
        <xref ref-type="bibr" rid="ref2 ref3">2, 3</xref>
        ]. This significant quantitative improvement over
previous all-sky star surveys allows for a qualitative improvement and opens new
possibilities for research. In particular, these data are used to determine
the members of stellar clusters. The high quality of the data also makes it possible to discern
extended low-density structures against the stellar background. Such extended
structures, comoving groups and stellar streams, form when open star clusters
or associations dissolve under the influence of Galactic tides.
While the stars in such structures may no longer be gravitationally bound,
they share a common origin, which makes them a useful tool for understanding
both large-scale Galactic structure and stellar evolution. Methods for locating
such structures are still under development, and various research groups follow
many different approaches. No other stellar catalogue currently comes
close to Gaia in depth and completeness of astrometric data, which allows
researchers to use methods previously deemed unfeasible. One common approach
is to pick an already known stellar cluster or group and search the phase
space around it in Gaia data for extended structures kinematically related
to it; see, for example, [
        <xref ref-type="bibr" rid="ref5 ref7">5, 7</xref>
        ]. Convergent-point methods, which look for stars
moving toward the same point on the sky where the proper motions (apparent
motions on the sky) of most stars in the cluster converge, suit this task well.
Another approach is to apply clustering methods such as DBSCAN/HDBSCAN to
the stellar population in general [
        <xref ref-type="bibr" rid="ref4">4</xref>
        ]. Searching for clusters in phase space (coordinates
+ velocities) allows new structures to be found, since this method does not rely
on finding a starting point first and works with overdensities in phase space.
Another notable direction of research with Gaia data is the all-sky search for
ultra-wide binary stars [
        <xref ref-type="bibr" rid="ref1 ref8">1, 8</xref>
        ].
      </p>
      <p>
        We aim to combine the binary-star search and clustering approaches: we use
our algorithm designed to search for ultra-wide binaries, described in [
        <xref ref-type="bibr" rid="ref8">8</xref>
        ], to look
not for binary stars but for comoving stellar pairs (stars that are not
gravitationally bound as binary stars are, but do share motion properties), and
apply the DBSCAN clustering algorithm to this preliminary list of stellar pairs
(instead of applying it to the star catalogue directly). The choice of DBSCAN is based on
the work [
        <xref ref-type="bibr" rid="ref4">4</xref>
        ], which highlights DBSCAN and HDBSCAN as the best algorithms for
clustering such data; we choose DBSCAN over HDBSCAN because it allows the
expected distance to be set explicitly as a clustering parameter (the so-called epsilon
parameter in DBSCAN). Constructing a list of comoving pairs allows us to
constrain the parameters of the relative motion of stars before applying the
clustering algorithm. By decreasing the relative number of pairs with unfavourable
parameters, we aim to increase the contrast with which the stars (and pairs)
constituting extended stellar structures appear in phase space and thereby,
we hope, to improve the sensitivity of the clustering algorithm.
      </p>
      <p>This paper describes work still in progress by presenting the results of
applying this approach to a 30x30 deg region of the sky, for stars between 100 and
1000 pc from the Sun. The rest of the paper is divided into two sections: the first describes the
principles behind creating the preliminary list of pairs, and the second describes
creating a model distribution of stars and applying the clustering algorithm to
real and model data.</p>
    </sec>
    <sec id="sec-2">
      <title>Assembling Pairs Catalogue</title>
      <p>Assembling a preliminary pairs catalogue involves choosing several parameters of
stellar pairs and placing constraints on them. The search for pairs is then done by
filtering the list of all possible pairs in the catalogue to fit the constraints. These
constraints are based on the parameter values expected for pairs of stars that
belong to extended stellar structures.</p>
      <p>The most obvious first criterion is that the stars are spatially close to one another. We
limit the projected separation between stars to 1 parsec. This limit also allows us
to optimise the catalogue assembly significantly by subdividing the search
region into smaller regions on the celestial sphere, whose size is determined by the
maximum possible coordinate separation of a pair, and then searching for pairs
only within each sub-region and its neighbours. This reduces the
computational complexity from C · N<sup>2</sup> to C · N<sup>2</sup>/M, where N is the number
of stars and M is the number of regions (M = 1537 in the case considered, which
meant a transition from impractically long to quite fast computation time).
This optimisation is described in more detail in [
        <xref ref-type="bibr" rid="ref8">8</xref>
        ].
      </p>
      <p>[Fig. 1 (image not available; horizontal axis in parsec). Panel annotations: parallaxes = 2.00, 1.94 mas, errors: 0.50 mas, 0.40 mas; distance mean: −1.30; distance median: 14.66; probability: 1.47 to be within 5.0 pc.]</p>
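      <p>The sub-region search described above can be sketched as follows. This is a minimal illustration rather than the catalogue code: it assumes flat two-dimensional coordinates instead of the celestial sphere, uses the maximum separation as the cell size, and all function and variable names are hypothetical.</p>
      <preformat><![CDATA[
```python
from collections import defaultdict
from itertools import product
from math import floor, hypot


def find_close_pairs(points, max_sep):
    """Return index pairs (i, j), i < j, separated by less than max_sep.

    points  : list of (x, y) tuples in a flat, small-field approximation
    max_sep : largest separation of interest, also used as the cell size
    """
    # Bucket every point into a square cell whose side equals max_sep, so
    # any partner of a point lies in the same cell or one of its 8 neighbours.
    cells = defaultdict(list)
    for idx, (x, y) in enumerate(points):
        cells[(floor(x / max_sep), floor(y / max_sep))].append(idx)

    pairs = []
    for (cx, cy), members in cells.items():
        for dx, dy in product((-1, 0, 1), repeat=2):
            for i in members:
                for j in cells.get((cx + dx, cy + dy), ()):
                    # i < j keeps each pair exactly once.
                    if i < j and hypot(points[i][0] - points[j][0],
                                       points[i][1] - points[j][1]) < max_sep:
                        pairs.append((i, j))
    return sorted(pairs)


def brute_force_pairs(points, max_sep):
    """Reference O(N^2) search, used only to check the gridded version."""
    n = len(points)
    return [(i, j) for i in range(n) for j in range(i + 1, n)
            if hypot(points[i][0] - points[j][0],
                     points[i][1] - points[j][1]) < max_sep]


# Made-up demo points: two close pairs and one isolated point.
demo_points = [(0.10, 0.10), (0.15, 0.12), (0.95, 0.10), (1.02, 0.11), (3.0, 3.0)]
demo_pairs = find_close_pairs(demo_points, max_sep=0.1)
```
]]></preformat>
      <p>Each point is compared only with the occupants of its own cell and the eight adjacent cells, which is where the reduction by roughly a factor of M comes from when the stars are spread over M cells.</p>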
      <p>For the data on stellar positions, there is a significant difference between
the treatment of errors in position on the celestial sphere (right ascension and declination) and in
radial distance (parallax). Parallax errors are larger; moreover, because the distance to
a star is determined as the inverse of its parallax, and the absolute
parallax error is not directly related to the parallax value:
1. relative parallax errors become larger for distant stars;
2. symmetric errors in parallax lead to asymmetric errors in the derived
distance.</p>
      <p>For the Gaia sample used in this article (distances of 100 - 1000 parsec from
the Sun), the mean parallax error is 0.58 mas (milliarcseconds), which corresponds
to an uncertainty of −370/+1380 parsec for a star at an actual distance of 1000 pc.
The uncertainty of the position on the celestial sphere is negligible compared to the distance
uncertainty.</p>
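      <p>The asymmetric interval quoted above follows directly from inverting the parallax. A short worked check, assuming the plain relation d = 1000 / parallax (distance in pc, parallax in mas) and a hypothetical helper name:</p>
      <preformat><![CDATA[
```python
def distance_interval(parallax_mas, sigma_mas):
    """Return (distance, minus, plus) in parsec for parallax +/- sigma.

    Uses the plain inversion d = 1000 / parallax (parallax in mas), so it
    is only meaningful while parallax - sigma stays positive.
    """
    if parallax_mas - sigma_mas <= 0:
        raise ValueError("parallax - sigma must be positive for this estimate")
    d = 1000.0 / parallax_mas
    d_near = 1000.0 / (parallax_mas + sigma_mas)   # larger parallax, closer star
    d_far = 1000.0 / (parallax_mas - sigma_mas)    # smaller parallax, farther star
    return d, d - d_near, d_far - d


# A star at 1000 pc has a parallax of 1 mas; 0.58 mas is the mean error quoted.
d, minus, plus = distance_interval(parallax_mas=1.0, sigma_mas=0.58)
print(f"{d:.0f} pc  -{minus:.0f} / +{plus:.0f} pc")
```
]]></preformat>
      <p>The result, about −367/+1381 pc, matches the rounded −370/+1380 pc given in the text and shows how a symmetric parallax error becomes a strongly lopsided distance error.</p>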
      <p>The radial distance between stars thus cannot be accounted for simply by
constraining the difference of the apparent measured distances. We use two
approaches simultaneously. The first is to impose limits on a parameter labelled the parallax consistency, π<sub>cons</sub>:
(1)
Here π<sub>1</sub>, π<sub>2</sub> are the parallaxes of the components, and σ<sub>π1</sub>, σ<sub>π2</sub> are their errors. Negative
values indicate that the parallaxes are too different for the stars to be close even if we
allow overlap within 3 standard deviations for both components. Highly positive
values indicate that the errors are too large to tell with certainty that the stars are close
together. Values that are positive but close to zero indicate either that the
stars are close together with relative certainty, or that the sum of the errors
happens to be close to the parallax difference (even if the latter is large).</p>
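      <p>The behaviour described above is consistent with a parallax-consistency parameter of the form 3(σ1 + σ2) − |π1 − π2|. That form is an assumption made here for illustration, not a definition taken from the text, and the function name and example values below are hypothetical.</p>
      <preformat><![CDATA[
```python
def parallax_consistency(plx1, plx2, err1, err2, n_sigma=3.0):
    """ASSUMED form: n_sigma * (err1 + err2) - |plx1 - plx2|, all in mas."""
    return n_sigma * (err1 + err2) - abs(plx1 - plx2)


# Made-up pairs illustrating the cases discussed in the text.
too_different = parallax_consistency(5.0, 2.0, 0.3, 0.3)       # about -1.2
errors_too_large = parallax_consistency(2.0, 1.9, 1.0, 1.0)    # about 5.9
close_to_zero = parallax_consistency(2.00, 1.94, 0.05, 0.05)   # about 0.24
# Large difference that the large errors happen to cover almost exactly:
ambiguous = parallax_consistency(4.0, 1.0, 0.5, 0.55)          # about 0.15
```
]]></preformat>
      <p>Under this assumed form the last two cases give similar small positive values although only the first of them describes a genuinely close pair, which is why a second metric is needed.</p>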
      <p>To differentiate between these two scenarios, we use a second approach: we
model the probability density of the radial distance between the
stars from the probability densities of the distance to the Sun for both stars, based
on their parallaxes and parallax errors. We then integrate over the interval (−5 pc,
+5 pc) for each pair to determine the probability of the stars being within 5 pc of one
another along the line of sight (P5; see Fig. 1). We do not model this directly for
each actual pair; instead, P5 values for pairs are interpolated from a grid of
26000 pre-computed points in the 4-parameter (π<sub>1</sub>, Δπ, σ<sub>π1</sub>, σ<sub>π2</sub>) space, which covers all
the possible values in our sample densely enough.</p>
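      <p>A rough Monte Carlo sketch of such a probability is given below. It assumes that each true parallax is Gaussian around the measured value and that the distance is simply 1000 / parallax; the model used in the paper may differ, so the numbers it returns are illustrative only. The function name is hypothetical; the example parallaxes and errors are those of the Fig. 1 panel.</p>
      <preformat><![CDATA[
```python
import random


def prob_within(plx1, plx2, err1, err2, limit_pc=5.0, n_draws=200_000, seed=1):
    """Monte Carlo estimate of P(|d1 - d2| < limit_pc) along the line of sight.

    ASSUMPTION: each true parallax is Gaussian around the measured value and
    the distance is 1000 / parallax (mas -> pc); draws with a non-positive
    parallax are discarded. This is not necessarily the authors' model.
    """
    rng = random.Random(seed)
    hits = kept = 0
    for _ in range(n_draws):
        p1 = rng.gauss(plx1, err1)
        p2 = rng.gauss(plx2, err2)
        if p1 <= 0 or p2 <= 0:
            continue
        kept += 1
        if abs(1000.0 / p1 - 1000.0 / p2) < limit_pc:
            hits += 1
    return hits / kept if kept else 0.0


# Same parallaxes with the quoted errors and with errors ten times smaller.
p5_example = prob_within(2.00, 1.94, 0.50, 0.40)
p5_precise = prob_within(2.00, 1.94, 0.05, 0.04)
```
]]></preformat>
      <p>With large parallax errors the probability of a 5 pc match stays at the level of a few per cent even for nearly equal parallaxes, and it grows as the errors shrink. Pre-computing such values on a grid and interpolating, as described above, avoids repeating the simulation for every pair.</p>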
      <p>
        These two metrics are somewhat correlated (see Fig. 2): most of the pairs
with negative parallax consistency have P5 &lt; 0.2%, and high P5 is mostly
associated with positive but not very high π<sub>cons</sub>. We also noted an increased
density of pairs with 0 &lt; π<sub>cons</sub> &lt; 1 in the region with a proper motion difference of
&lt; 2 mas/yr and a projected relative motion of &lt; 3 km/s. The resulting constraints
on the pair parameters are:
– projected separation &lt; 1 pc
– projected relative motion &lt; 3 km/s
– proper motion difference &lt; 6 mas/yr
– parallax consistency within −0.1 &lt; π<sub>cons</sub> &lt; 1 and P5 &gt; 0.5%.
When deciding on the constraints on pair properties, we are guided by the
theoretical and model considerations from [
        <xref ref-type="bibr" rid="ref6">6</xref>
        ]. The selected constraints are roughly what we
expect from a pair of neighbouring stars in the same extended stellar
structure. Radial velocities are not considered, since they are available only for a small
fraction of all Gaia EDR3 sources. Applying these constraints leaves us with a list
of 104 000 pairs, distributed very non-uniformly on the celestial sphere.
      </p>
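      <p>The listed constraints amount to a simple filter over candidate pairs. A minimal sketch, with a hypothetical record type and field names and made-up example values:</p>
      <preformat><![CDATA[
```python
from dataclasses import dataclass


@dataclass
class Pair:
    """Hypothetical record of one candidate pair (field names are illustrative)."""
    sep_pc: float          # projected separation, pc
    rel_motion_kms: float  # projected relative motion, km/s
    pm_diff_masyr: float   # proper motion difference, mas/yr
    plx_cons: float        # parallax consistency
    p5_percent: float      # probability of being within 5 pc along the line of sight, %


def passes_constraints(pair):
    """Apply the four constraints listed in the text."""
    return (pair.sep_pc < 1.0
            and pair.rel_motion_kms < 3.0
            and pair.pm_diff_masyr < 6.0
            and -0.1 < pair.plx_cons < 1.0
            and pair.p5_percent > 0.5)


candidates = [
    Pair(0.4, 1.2, 2.5, 0.3, 1.5),   # satisfies every constraint
    Pair(0.4, 1.2, 2.5, 2.4, 1.5),   # parallax consistency too high
    Pair(0.4, 4.0, 2.5, 0.3, 1.5),   # relative motion too fast
]
selected = [p for p in candidates if passes_constraints(p)]
```
]]></preformat>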
    </sec>
    <sec id="sec-3">
      <title>Stellar Background Model and Clustering</title>
      <p>The aim of the stellar background model is to produce a “mock” catalogue of stars
that is known not to contain any clusters, streams, or other stellar
structures. This catalogue then undergoes the same process of constructing
preliminary pairs and clustering them as the real catalogue, to check that our algorithm
does not treat random overdensities in the field star distribution as an actual
extended stellar system.</p>
      <p>
        The model is as follows. Stars are first generated uniformly within the same
region of the sky as the real sample, but in a broader range of distances: 50 - 2000
pc instead of 100 - 1000 pc. To simulate the effect of parallax errors on the
measured parallaxes, each star is assigned a “real” parallax in mas (1000/distance, with the distance in pc)
and a parallax measurement error. The distribution of parallax errors was made to
mimic that of the real sample (a combination of two Gaussian distributions and
an exponential distribution was found to represent the real distribution
best). “Measured” parallaxes are obtained by perturbing the “real”
parallaxes by a random number drawn from a normal distribution with a standard
deviation equal to the parallax error. The dataset is then cropped to 100 - 1000
pc in “measured” distance. Stars are given random galactocentric velocity
components (U, V, W) with dispersions of 35, 30, and 20 km/s, respectively, to simulate
the actual motion of stars in the solar neighbourhood [
        <xref ref-type="bibr" rid="ref9">9</xref>
        ]. The density of stars in the
model sample is adjusted to match the average density of the real EDR3 data
(4.5 million stars in the volume in question). Applying the pair-finding algorithm
to this model with the same parameters and constraints results in a sample
of 10000 model pairs. A comparison of the stellar density distribution on the sky for
the real and model star samples is shown in Fig. 3, and the corresponding distribution for
pairs in Fig. 4.
      </p>
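      <p>The generation steps above can be sketched as follows. Several details are assumptions made for the sake of a runnable example: "uniformly" is taken to mean uniform in volume, the sky region bounds are arbitrary, a single exponential stands in for the fitted parallax-error distribution, and the conversion of (U, V, W) to proper motions is omitted. All names are hypothetical.</p>
      <preformat><![CDATA[
```python
import random
from math import asin, degrees, radians, sin


def make_mock_stars(n_stars, seed=0, ra_range=(0.0, 30.0), dec_range=(-15.0, 15.0),
                    d_range=(50.0, 2000.0), crop=(100.0, 1000.0)):
    """Generate a structureless mock catalogue and crop it on 'measured' distance."""
    rng = random.Random(seed)
    d_min3, d_max3 = d_range[0] ** 3, d_range[1] ** 3
    sin_lo, sin_hi = sin(radians(dec_range[0])), sin(radians(dec_range[1]))
    stars = []
    for _ in range(n_stars):
        # ASSUMPTION: uniform in volume, i.e. uniform in d^3, RA and sin(Dec).
        dist = (d_min3 + rng.random() * (d_max3 - d_min3)) ** (1.0 / 3.0)
        ra = rng.uniform(*ra_range)
        dec = degrees(asin(rng.uniform(sin_lo, sin_hi)))
        true_plx = 1000.0 / dist                      # "real" parallax, mas
        # ASSUMPTION: exponential stand-in for the fitted error distribution.
        plx_err = 0.02 + rng.expovariate(1.0 / 0.5)   # mas
        obs_plx = rng.gauss(true_plx, plx_err)        # "measured" parallax
        if obs_plx <= 0:
            continue                                  # no usable distance
        obs_dist = 1000.0 / obs_plx
        if not crop[0] <= obs_dist <= crop[1]:
            continue                                  # outside the cropped range
        u, v, w = (rng.gauss(0.0, s) for s in (35.0, 30.0, 20.0))  # km/s
        stars.append({"ra": ra, "dec": dec, "parallax": obs_plx,
                      "parallax_error": plx_err, "U": u, "V": v, "W": w})
    return stars


mock = make_mock_stars(20_000)
```
]]></preformat>
      <p>Generating stars over a wider distance range than the final crop lets stars scatter into and out of the 100 - 1000 pc shell through their parallax errors, as happens in the real sample.</p>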
      <p>We apply the DBSCAN clustering algorithm to both real and model data.
Clustering is done in a 4-coordinate space (two positions on the sky and two proper
motions). Radial distance is not considered, because parallax errors stretch any
stellar structure too much along the line of sight; the restrictions on π<sub>cons</sub> and P5
at the pair-composition stage are considered sufficient to take distances into
account. Varying the DBSCAN clustering parameters shows that, for the majority
of parameter values at which the real data are subdivided into a reasonable number of
clusters, the model data are all marked as background. Fig. 5 shows the clustering
result for one possible pair of parameters, which subdivides the data into 21 clusters and
“background” pairs.</p>
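      <p>For readers unfamiliar with DBSCAN, a minimal from-scratch version is sketched below. It is not the implementation used for the results above, the toy data are invented, and the four features are assumed to be already scaled to comparable units; for real catalogues a library implementation with a spatial index, such as scikit-learn's DBSCAN, is the practical choice.</p>
      <preformat><![CDATA[
```python
from math import dist


def dbscan(points, eps, min_samples):
    """Minimal DBSCAN: returns one label per point, -1 meaning background.

    A point is a core point when at least min_samples points (itself included)
    lie within eps of it; clusters grow by chaining core points.
    """
    n = len(points)
    # O(N^2) neighbour lists: fine for a demo, too slow for a real catalogue.
    neighbours = [[j for j in range(n) if dist(points[i], points[j]) <= eps]
                  for i in range(n)]
    labels = [None] * n          # None = not visited yet
    cluster = -1
    for i in range(n):
        if labels[i] is not None:
            continue
        if len(neighbours[i]) < min_samples:
            labels[i] = -1       # provisional background; may become a border point
            continue
        cluster += 1
        labels[i] = cluster
        queue = list(neighbours[i])
        while queue:
            j = queue.pop()
            if labels[j] == -1:
                labels[j] = cluster          # border point reached from a core
            if labels[j] is not None:
                continue
            labels[j] = cluster
            if len(neighbours[j]) >= min_samples:
                queue.extend(neighbours[j])  # j is a core point: keep expanding
    return labels


# Toy 4-D features (two sky coordinates, two proper motions); values are made up.
clump_a = [(0.0 + 0.01 * k, 0.0, 1.0, 1.0) for k in range(6)]
clump_b = [(5.0, 5.0 + 0.01 * k, -2.0, 0.5) for k in range(6)]
field = [(2.0, 9.0, 0.0, 3.0), (8.0, 1.0, 4.0, -4.0), (-3.0, 4.0, 2.0, 2.0)]
labels = dbscan(clump_a + clump_b + field, eps=0.045, min_samples=4)
```
]]></preformat>
      <p>In this toy example the two compact clumps receive labels 0 and 1 and the three scattered points are marked −1. The eps argument is the distance parameter referred to in the Introduction as the reason for preferring DBSCAN over HDBSCAN.</p>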
    </sec>
    <sec id="sec-4">
      <title>Summary and Conclusions</title>
      <p>We used Gaia EDR3 data to construct a catalogue of stellar pairs with the
properties expected for pairs within clusters and extended stellar structures.
With the selected parameters, the distribution of pairs displays high-density structures
that are easily seen against the “background” pairs. A comparison with
the same approach applied to a stellar background model of similar density shows that
these dense structures cannot be attributed to random overdensities in a uniform,
featureless distribution of stars. The high contrast of such areas relative to the
background eases the clustering task for unsupervised learning algorithms such as
DBSCAN, which we use in this article.</p>
      <p>Further work will include comparing these structures with known
extended stellar structures and clusters, tuning the pairing and clustering
parameters to better recognise stellar structures, using other means to determine
whether an evolutionary relationship exists between the structures found, and comparing
with models of stellar system evolution to explain the observed properties of the
stellar systems.</p>
      <p>Acknowledgements. This work is supervised by Dr. Dana Kovaleva,
Department of Physics of Stellar Systems, Institute of Astronomy of the Russian Academy
of Sciences. This work has made use of data from the European Space Agency
(ESA) mission Gaia, processed by the Gaia Data Processing and Analysis
Consortium (DPAC). Funding for the DPAC has been provided by national institutions,
in particular the institutions participating in the Gaia Multilateral Agreement.
The study was partially supported by the Russian Foundation for Basic Research
(project no. 20-52-12009).</p>
    </sec>
  </body>
  <back>
    <ref-list>
      <ref id="ref1">
        <mixed-citation>
          1.
          <string-name>
            <surname>El-Badry</surname>
            ,
            <given-names>K.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Rix</surname>
            ,
            <given-names>H.W.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Tian</surname>
            ,
            <given-names>H.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Duchêne</surname>
            ,
            <given-names>G.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Moe</surname>
            ,
            <given-names>M.</given-names>
          </string-name>
          :
          <article-title>Discovery of an equal-mass 'twin' binary population reaching 1000+ au separations</article-title>
          .
          <source>Mon. Not. R. Astron. Soc.</source>
          <volume>489</volume>
          (
          <issue>4</issue>
          )
          ,
          <fpage>5822</fpage>
          -
          <lpage>5857</lpage>
          (
          <year>Nov 2019</year>
          ). https://doi.org/10.1093/mnras/stz2480
        </mixed-citation>
      </ref>
      <ref id="ref2">
        <mixed-citation>
          2.
          <string-name>
            <given-names>Gaia</given-names>
            <surname>Collaboration</surname>
          </string-name>
          :
          <article-title>The Gaia mission</article-title>
          .
          <source>Astron. Astrophys.</source>
          <volume>595</volume>
          , A1 (
          <year>Nov 2016</year>
          ). https://doi.org/10.1051/0004-6361/201629272
        </mixed-citation>
      </ref>
      <ref id="ref3">
        <mixed-citation>
          3.
          <string-name>
            <given-names>Gaia</given-names>
            <surname>Collaboration</surname>
          </string-name>
          :
          <article-title>Gaia Early Data Release 3. Summary of the contents and survey properties</article-title>
          .
          <source>Astron. Astrophys.</source>
          <volume>649</volume>
          , A1 (
          <year>May 2021</year>
          ). https://doi.org/10.1051/0004-6361/202039657
        </mixed-citation>
      </ref>
      <ref id="ref4">
        <mixed-citation>
          4.
          <string-name>
            <surname>Hunt</surname>
            ,
            <given-names>E.L.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Reffert</surname>
            ,
            <given-names>S.</given-names>
          </string-name>
          :
          <article-title>Improving the open cluster census. I. Comparison of clustering algorithms applied to Gaia DR2 data</article-title>
          .
          <source>Astron. Astrophys.</source>
          <volume>646</volume>
          , A104 (
          <year>Feb 2021</year>
          ). https://doi.org/10.1051/0004-6361/202039341
        </mixed-citation>
      </ref>
      <ref id="ref5">
        <mixed-citation>
          5.
          <string-name>
            <surname>Jerabkova</surname>
            ,
            <given-names>T.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Boffin</surname>
            ,
            <given-names>H.M.J.</given-names>
          </string-name>
          ,
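          <string-name>
            <surname>Beccari</surname>
            ,
            <given-names>G.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>de Marchi</surname>
            ,
            <given-names>G.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>de Bruijne</surname>
            ,
            <given-names>J.H.J.</given-names>
          </string-name>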
          ,
          <string-name>
            <surname>Prusti</surname>
            ,
            <given-names>T.</given-names>
          </string-name>
          :
          <article-title>The 800 pc long tidal tails of the Hyades star cluster. Possible discovery of candidate epicyclic overdensities from an open star cluster</article-title>
          .
          <source>Astron. Astrophys.</source>
          <volume>647</volume>
          , A137 (
          <year>Mar 2021</year>
          ). https://doi.org/10.1051/0004-6361/202039949
        </mixed-citation>
      </ref>
      <ref id="ref6">
        <mixed-citation>
          6.
          <string-name>
            <surname>Kamdar</surname>
            ,
            <given-names>H.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Conroy</surname>
            ,
            <given-names>C.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Ting</surname>
            ,
            <given-names>Y.S.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Bonaca</surname>
            ,
            <given-names>A.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Smith</surname>
            ,
            <given-names>M.C.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Brown</surname>
            ,
            <given-names>A.G.A.</given-names>
          </string-name>
          :
          <article-title>Stars that Move Together Were Born Together</article-title>
          .
          <source>Astrophys. J. Lett.</source>
          <volume>884</volume>
          (
          <issue>2</issue>
          ), L42 (
          <year>Oct 2019</year>
          ). https://doi.org/10.3847/2041-8213/ab4997
        </mixed-citation>
      </ref>
      <ref id="ref7">
        <mixed-citation>
          7.
          <string-name>
            <surname>Röser</surname>
            ,
            <given-names>S.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Schilbach</surname>
            ,
            <given-names>E.</given-names>
          </string-name>
          :
          <article-title>Praesepe (NGC 2632) and its tidal tails</article-title>
          .
          <source>Astron. Astrophys.</source>
          <volume>627</volume>
          , A4 (
          <year>Jul 2019</year>
          ). https://doi.org/10.1051/0004-6361/201935502
        </mixed-citation>
      </ref>
      <ref id="ref8">
        <mixed-citation>
          8.
          <string-name>
            <surname>Sapozhnikov</surname>
            ,
            <given-names>S.A.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Kovaleva</surname>
            ,
            <given-names>D.A.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Malkov</surname>
            ,
            <given-names>O.Y.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Sytov</surname>
            ,
            <given-names>A.Y.</given-names>
          </string-name>
          :
          <article-title>Binary Star Population with Common Proper Motion in Gaia DR2</article-title>
          .
          <source>Astronomy Reports</source>
          <volume>64</volume>
          (
          <issue>9</issue>
          ),
          <fpage>756</fpage>
          -
          <lpage>768</lpage>
          (
          <year>Sep 2020</year>
          ). https://doi.org/10.1134/S1063772920100078
        </mixed-citation>
      </ref>
      <ref id="ref9">
        <mixed-citation>
          9.
          <string-name>
            <surname>Schönrich</surname>
            ,
            <given-names>R.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Binney</surname>
            ,
            <given-names>J.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Dehnen</surname>
            ,
            <given-names>W.</given-names>
          </string-name>
          :
          <article-title>Local kinematics and the local standard of rest</article-title>
          .
          <source>Mon. Not. R. Astron. Soc.</source>
          <volume>403</volume>
          (
          <issue>4</issue>
          ),
          <fpage>1829</fpage>
          -
          <lpage>1833</lpage>
          (
          <year>Apr 2010</year>
          ). https://doi.org/10.1111/j.1365-2966.2010.16253.x
        </mixed-citation>
      </ref>
    </ref-list>
  </back>
</article>