<!DOCTYPE article PUBLIC "-//NLM//DTD JATS (Z39.96) Journal Archiving and Interchange DTD v1.0 20120330//EN" "JATS-archivearticle1.dtd">
<article xmlns:xlink="http://www.w3.org/1999/xlink">
  <front>
    <journal-meta />
    <article-meta>
      <title-group>
        <article-title>Machine Learning for Automated Gating of Flow Cytometry Data</article-title>
      </title-group>
      <contrib-group>
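        <contrib contrib-type="author">
          <string-name>Muhammad Sufian</string-name>
          <email>m.sufian@campus.uniurb.it</email>
          <xref ref-type="aff" rid="aff0">0</xref>
        </contrib>
        <contrib contrib-type="author">
          <string-name>Stefano Papa</string-name>
          <email>stefano.papa@uniurb.it</email>
          <xref ref-type="aff" rid="aff0">0</xref>
        </contrib>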
        <contrib contrib-type="author">
          <string-name>Sara Montagna</string-name>
          <email>sara.montagna@uniurb.it</email>
          <xref ref-type="aff" rid="aff0">0</xref>
        </contrib>
        <contrib contrib-type="author">
          <string-name>Alessandro Bogliolo</string-name>
          <email>alessanro.bogliolo@uniurb.it</email>
          <xref ref-type="aff" rid="aff0">0</xref>
        </contrib>
        <contrib contrib-type="author">
          <string-name>Claudio Ortolani</string-name>
          <email>claudio.ortolani@uniurb.it</email>
          <xref ref-type="aff" rid="aff0">0</xref>
        </contrib>
        <contrib contrib-type="author">
          <string-name>Mario D'Atri</string-name>
          <email>mario.datri@uniurb.it</email>
          <xref ref-type="aff" rid="aff0">0</xref>
        </contrib>
        <aff id="aff0">
          <label>0</label>
          <institution>University of Urbino Carlo Bo</institution>
          ,
          <addr-line>Urbino</addr-line>
          ,
          <country country="IT">Italy</country>
        </aff>
      </contrib-group>
      <abstract>
        <p>Manual gating is the traditional procedure for identifying cellular clusters in the multi-dimensional datasets generated by flow cytometry, a tool for detecting and monitoring different diseases by acquiring single-cell features. However, identifying cellular subpopulations by manual gating is a time-consuming process that depends strongly on human expertise. Automated analysis supported by computational systems, such as machine learning approaches, can radically change the way flow cytometry data are processed. In this paper, we apply a suite of machine learning classifiers to samples of peripheral blood acquired with flow cytometry. The goal is to identify the CD4+ lymphocyte population. Four ML classifiers are examined (Support Vector Machine, Random Forest, Multilayer Perceptron and Logistic Regression) using stratified 10-fold cross-validation. All four models perform very well, with a balanced accuracy score &gt; 0.945. We conclude that all four algorithms classify the events of interest with promising results, paving the way for further investigations.</p>
      </abstract>
      <kwd-group>
        <kwd>Flow Cytometry</kwd>
        <kwd>Automated Gating</kwd>
        <kwd>Supervised Machine Learning</kwd>
      </kwd-group>
    </article-meta>
  </front>
  <body>
    <sec id="sec-1">
      <title>1. Introduction</title>
      <p>
        Flow cytometry (FL) is an experimental technique that measures cellular properties at
single-cell resolution by quantifying, for instance, antigens expressed on the cell surface and
various physical properties [
        <xref ref-type="bibr" rid="ref1">1</xref>
        ]. Manual gating is performed on the multi-dimensional datasets generated by FL to
identify cellular clusters with similar properties. As such, it is adopted
in detecting and monitoring different diseases, such as those of the immune system. Given
the progress in cytometry instrumentation, the number of features that can
be acquired is continuously increasing, making the identification of cellular subpopulations
by manual gating a time-consuming process that depends strongly on human expertise.
Automated analysis supported by computational systems, such as machine learning approaches, can
radically change the way flow cytometry data are processed [
        <xref ref-type="bibr" rid="ref2">2</xref>
        ].
      </p>
      <p>
        The general computational cytometry workflows can be classified into two categories based
on the methods employed: discovery analysis, i.e., the detection of unknown, unique cell
populations, versus focused analysis i.e., the detection of known well-defined ones. Automation
can potentially lessen variability in the data analysis process in both situations. Cell populations
that are neglected in successive manual gating procedures, such as cells gated out in earlier steps,
can be found using automated technologies in discovery mode. When using focused analysis,
the cell populations of interest are precisely specified, and the data analysis procedure adheres
to a set of techniques that is likely to be validated and authorised. Automated technologies can
lessen human effort by automatically classifying cases as healthy or diseased and only raising
questions about certain cases for people to consider [
        <xref ref-type="bibr" rid="ref3">3</xref>
        ].
      </p>
      <p>
        The computational approaches can be further categorised: automating the manual gating
process based on rules or cell densities (flowDensity [
        <xref ref-type="bibr" rid="ref4">4</xref>
        ], OpenCyto [
        <xref ref-type="bibr" rid="ref5">5</xref>
        ]); clustering of flow
cytometry data (cells, events) based on similar characteristics in high-dimensional space
(FlowSOM [
        <xref ref-type="bibr" rid="ref6">6</xref>
        ], PhenoGraph [
        <xref ref-type="bibr" rid="ref7">7</xref>
        ], SPADE [
        <xref ref-type="bibr" rid="ref8">8</xref>
        ]); and the supervised classification in which the data is
annotated to train the learning model so that it can classify unlabelled data, i.e., cell populations
or events (flowLearn [
        <xref ref-type="bibr" rid="ref9">9</xref>
        ], ACDC [
        <xref ref-type="bibr" rid="ref10">10</xref>
        ], DeepCyTOF [
        <xref ref-type="bibr" rid="ref11">11</xref>
        ]). Although the literature already reports
interesting results in this field, these methods are not yet part of clinical practice.
      </p>
      <p>In this paper, we apply a suite of machine learning classifiers to samples of
peripheral blood acquired with flow cytometry. The goal is to identify the CD4+ lymphocyte
population. We show the effectiveness of our approach for classifying cells with a series of tests,
cross-verifying the trained models on various data files, and comparing the cell classifications
with those obtained by manual gating. Four ML classifiers are examined (Support Vector
Machine, Random Forest, Multilayer Perceptron and Logistic Regression) using stratified
10-fold cross-validation. All four models perform very well, with a balanced accuracy score of
&gt; 0.945.</p>
    </sec>
    <sec id="sec-2">
      <title>2. Background</title>
      <p>
        Flow cytometry is a standard method for analysing and quantifying biological data. The
capabilities of cytometry have increased, giving rise to several data-analysis methodologies [
        <xref ref-type="bibr" rid="ref2">2</xref>
        ].
      </p>
      <p>
        The flowDensity [
        <xref ref-type="bibr" rid="ref4">4</xref>
        ] is an approach that performs a computational analysis of flow
cytometry data by automating the manual gating process based on the sequential bivariate
gating method. The properties of the density distribution are used to select the ideal cut-off
for each marker using 2D scatter plots. This method has limitations when the target
is to identify unknown populations, as it looks at only two dimensions at a time. As a result,
unknown populations can easily be missed. Another method, OpenCyto [
        <xref ref-type="bibr" rid="ref5">5</xref>
        ], replicates manual
gating, facilitates data analysis, and provides interpretable results by incorporating
domain-specific knowledge. It concentrated on finding uncommon, antigen-specific T-cell populations
and discovered a novel subgroup of CD8 T-cells with a vaccine-regimen-specific response that
could not be found by manual analysis.
      </p>
      <p>
        Several machine learning approaches have been devised to identify new or
predetermined cellular populations and perform automated analysis of flow cytometry data
[
        <xref ref-type="bibr" rid="ref12">12</xref>
        ]. These include supervised and unsupervised learning approaches. In the former, the
data is annotated to train the learning model so that it can classify unlabelled data (i.e.,
cells, samples). The latter performs multi-channel (multivariate) analysis, i.e., grouping cells
with similar characteristics through clustering analysis. Our contribution falls in the former case.
      </p>
      <p>
        The FlowSOM [
        <xref ref-type="bibr" rid="ref6">6</xref>
        ] approach uses a self-organizing map (an unsupervised technique for
clustering and dimensionality reduction) to visualize and cluster flow cytometry data.
It employs substantially more clusters than the number of cell types
expected. FlowSOM can be used as a starting point for analysis or as a tool to inspect the data after
manual gating. In a separate study, the position and identity of normal mononuclear-cell subsets
in viSNE displays were determined by analyzing individual peripheral blood samples that
included either a neoplastic or a reactive T-cell lymphocytosis, alongside a cohort of 10 healthy
samples [
        <xref ref-type="bibr" rid="ref7">7</xref>
        ]. A combined PhenoGraph and viSNE method was applied to peripheral blood
mononuclear cells stained with a single 8-color T/NK cell antibody combination. The numbers
of neoplastic T-cells discovered with PhenoGraph/viSNE coincided with those discovered
using manual gating. Another cell density-based approach, which identified a functionally
different cell population without relying on any particular underlying characteristics, is
Spanning-tree Progression Analysis of Density-normalized Events (SPADE) [
        <xref ref-type="bibr" rid="ref8">8</xref>
        ]. SPADE was
applied to two independent sources of cytometry data in four steps: down-sampling based on
density, clustering, connecting clusters with a minimal spanning tree, and up-sampling to restore
the cells as a final output. SPADE has drawbacks: the algorithm's halting
condition depends on the number of clusters, and if the number of clusters is too low, the SPADE
tree cannot accurately represent the shape of the point cloud. If this value is very high, the
SPADE tree becomes difficult to interpret.
      </p>
      <p>
        Similar to flowDensity, another piece of software called flowLearn uses density
characteristics but does not require the user to tune hyper-parameters manually to achieve the
best results. Instead, it operates in a semi-supervised manner, requiring a human expert to
establish thresholds by gating one or a few representative samples. These
thresholds are then applied automatically to all data using a process known as derivative-based density
alignment, so that gates on additional samples are predicted from a limited number of manually gated
ones. A drawback of this approach is that only a limited number of samples are gated,
and density-bound rules could over-fit the results. A supervised learning-based
approach, Automated Cell-type Discovery and Classification (ACDC) [
        <xref ref-type="bibr" rid="ref10">10</xref>
        ], automates cell
annotation by taking biological information as an input parameter. The ACDC method
consists of two parts. First, a user-specified table of markers and cell labels is converted
into a high-dimensional space. Second, it uses random walks to perform semi-supervised
classification, gathering data from every point and categorizing the events at the single-cell level.
ACDC has the drawback that each marker label is binary (present or absent), whereas
in practice intermediate marker levels are used to identify cell populations of interest [
        <xref ref-type="bibr" rid="ref13">13</xref>
        ].
DeepCyTOF adopts a different point of view for gating: it needs labelled cells from one sample
to perform supervised calibration between a source domain distribution (reference sample) and
many target domain distributions (target samples). A multi-autoencoder neural network is
the foundation of DeepCyTOF. In practice, differences across equipment were found to be
relatively frequent in CyTOF investigations. These differences might be a weakness of this
methodology, as they cause considerable batch effects in datasets with samples taken in different
runs. Consequently, observable differences might be found between the data distributions of
the training data (manually gated reference sample) and the remaining unlabelled test data (the
remaining samples).
      </p>
    </sec>
    <sec id="sec-3">
      <title>3. Materials and Methods</title>
      <p>This section is divided into two subsections: (i) details about the data set and pre-processing of
the data, and (ii) experiments with machine learning models.</p>
      <sec id="sec-3-1">
        <title>3.1. Data Set and Data Pre-Processing</title>
        <p>The data used in this study were derived from the routine diagnostic activity
of the Center of Cytometry of Urbino University. Data sets were randomly selected and
anonymized to make identification of the source impossible.</p>
        <p>For every peripheral blood sample, data were acquired on intrinsic parameters and antigen
expression displayed by white blood cells. The analyses were performed with a commercial
flow cytometer and focused on the following parameters: i) Forward Scatter (FSC), ii) Side
Scatter (SSC), iii) CD3 antigen expression, iv) CD4 antigen expression, v) CD8 antigen expression,
vi) CD16 and CD56 combined antigen expression, and vii) CD45 antigen expression. In all, 15
subjects were analyzed. For each parameter, data related to the pulse area were considered.</p>
        <p>The cytometric files produced by the analytical runs were then stripped of metadata and
exported in CSV format. Consequently, the dataset consists of fifteen (15) different data files,
where each row is a cell and each column contains the corresponding value of one of the 8
parameters mentioned above (Table 1), with no missing values. Data records are labelled
with binary values, i.e., gated and ungated (1 for gated and 0 for ungated records). The
extraction of gated and ungated records was made possible by the combined use of a commercial
program to select the clusters of interest (Cytopaint, http://leukobyte.com/cytopaint-classic/)
and an unpublished program for the
management of flow cytometry standard (FCS) files (Wizard) provided by one of us (MDA).
In particular, in each experiment, CD4+ T cells (gated for CD4 expression) are identified by
manual gating performed by experienced operators. The gating logic applied to each data
file to extract the gated records is as follows:</p>
        <list list-type="order">
          <list-item><p>A first gate is created on parameters CD45+ vs SSC (Gate-1) to select
lymphocytes.</p></list-item>
          <list-item><p>The results of Gate-1 are displayed for parameters CD3 and CD19, and a gate is
traced on the events CD3+ CD19- and CD3- CD19+ (Gate-2).</p></list-item>
          <list-item><p>The results of Gate-2 are displayed for parameters CD4 and CD8, and a gate is
traced on the events CD4+ CD8- (Gate-3, which constitutes the population of T Helper
Lymphocytes CD3+ CD4+ CD8-) and another gate on the events CD4- CD8+ (Gate-4).</p></list-item>
        </list>
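        <p>The labelling produced by this gating hierarchy can be sketched in a few lines of Python. This is only an illustration under strong assumptions: real gates are regions drawn by hand on bivariate plots, whereas the sketch uses fixed cut-offs, and the threshold values, the dictionary layout of an event and the function names are all hypothetical. Only the T-cell branch of Gate-2 is modelled.</p>
        <preformat>
```python
# Hypothetical thresholds: real gates are drawn by hand on bivariate plots,
# not fixed cut-offs. The values below are illustrative only.
THRESHOLDS = {"CD45": 500.0, "SSC": 300.0, "CD3": 400.0, "CD4": 400.0, "CD8": 400.0}


def is_positive(event, marker):
    """True when the event's signal for `marker` reaches the assumed cut-off."""
    return event[marker] >= THRESHOLDS[marker]


def label_event(event):
    """Return 1 for a CD3+ CD4+ CD8- lymphocyte (gated), else 0 (ungated)."""
    # Gate-1: lymphocytes, taken here as CD45 positive with low side scatter.
    gate1 = is_positive(event, "CD45") and not is_positive(event, "SSC")
    # Gate-2 (T-cell branch only): CD3 positive events inside Gate-1.
    gate2 = gate1 and is_positive(event, "CD3")
    # Gate-3: CD4 positive, CD8 negative events inside Gate-2.
    gate3 = gate2 and is_positive(event, "CD4") and not is_positive(event, "CD8")
    return int(gate3)


events = [
    {"CD45": 900.0, "SSC": 100.0, "CD3": 800.0, "CD4": 700.0, "CD8": 50.0},  # CD4+ CD8-
    {"CD45": 900.0, "SSC": 100.0, "CD3": 800.0, "CD4": 60.0, "CD8": 750.0},  # CD4- CD8+
    {"CD45": 900.0, "SSC": 800.0, "CD3": 20.0, "CD4": 30.0, "CD8": 10.0},    # high SSC
]
labels = [label_event(e) for e in events]
print(labels)  # [1, 0, 0]
```
        </preformat>
        <p>Each gate is evaluated only on events that passed the previous one, which mirrors the sequential nature of manual gating.</p>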
        <p>After the hierarchical gating process, the subset of records corresponding to CD4+ cells
is obtained; these records form class 1. The resulting dataset is unbalanced between the two classes. The
percentage of manually gated CD4+ events relative to the total sample
size of each data file is presented in Table 2. Because of the critical nature of medical data sets,
data balancing techniques are often not recommended. We therefore applied stratified k-fold
cross-validation for training and testing the ML classifiers, because it maintains the same class
ratio in each of the k folds as in the original dataset.</p>
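        <p>To make the stratification property concrete, the following pure-Python sketch assigns sample indices to folds class by class and checks that every fold keeps the original class ratio. It is not the implementation used in the study; the function name, the round-robin assignment and the toy 20/80 class split are assumptions made for illustration.</p>
        <preformat>
```python
import random
from collections import defaultdict


def stratified_folds(labels, k, seed=0):
    """Assign each sample index to one of k folds, class by class."""
    rng = random.Random(seed)
    by_class = defaultdict(list)
    for index, label in enumerate(labels):
        by_class[label].append(index)

    folds = [[] for _ in range(k)]
    for indices in by_class.values():
        rng.shuffle(indices)
        # Deal the shuffled indices of this class round-robin across the folds.
        for position, index in enumerate(indices):
            folds[position % k].append(index)
    return folds


# Toy imbalanced labels: 20% gated (1), 80% ungated (0).
labels = [1] * 20 + [0] * 80
folds = stratified_folds(labels, k=10)
ratios = [sum(labels[i] for i in fold) / len(fold) for fold in folds]
print(ratios)  # every fold holds 2 gated and 8 ungated samples, i.e. ratio 0.2
```
        </preformat>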
        <p>To conclude, our approach avoids information loss in the cell gating stage by directly using
the labelled flow cytometry data.</p>
      </sec>
      <sec id="sec-3-2">
        <title>3.2. Machine Learning Classifiers and Experimental Setup</title>
        <p>Our study compares the outcomes of allocating cell events to discrete cell populations (gated
and ungated cells) using automated gates with the results from manual gates produced by
expert analysis. In particular, the classification goal is the automatic identification of T Helper
Lymphocytes CD4+, which constitute the gated population, against the ungated ones.</p>
        <p>We adopted supervised ML models and trained the algorithms with gates supplied by experts.
A suite of ML classifiers is employed so that accuracy and over-fitting can be observed under
different classification mechanisms, i.e., decision tree-based, gradient-based, neural
network-based, and able to classify
non-linearly separable data. These classifiers are briefly described below (their
mathematical equations are omitted, as these are well-understood methods):</p>
        <list list-type="order">
          <list-item><p>Random Forest (RF): a meta-estimator that fits many decision tree
classifiers on different sub-samples of the dataset and averages their results to increase
predictive accuracy and reduce over-fitting.</p></list-item>
          <list-item><p>Logistic Regression (LR): a parametric technique that fits a
line (or a curve) to the data, with parameters estimated by gradient descent, to distinguish
between the output classes. In this study, we used LR with gradient descent,
which fits the dataset with a curve.</p></list-item>
          <list-item><p>Support Vector Classifier (SVC): similar to LR, it fits a curve to the dataset; however,
the curve is chosen to maintain a maximum margin on both sides [<xref ref-type="bibr" rid="ref15">15</xref>]. We employed a
radial basis function kernel to separate the data points because the dataset
does not appear to be linearly separable.</p></list-item>
          <list-item><p>Multi-Layer Perceptron (MLP): a basic type of artificial neural network (ANN). We used
an MLP with one hidden layer of 100 neurons; the other hyper-parameters are kept at
the default values provided by the scikit-learn Python library.</p></list-item>
        </list>
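        <p>A minimal scikit-learn sketch of such a classifier suite, evaluated with stratified 10-fold cross-validation, might look as follows. It is not the authors' code: the synthetic data from make_classification stands in for one labelled data file, feature scaling is added as an assumption, LogisticRegression is used with its library solver rather than plain gradient descent, and the random seeds are arbitrary.</p>
        <preformat>
```python
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import StratifiedKFold, cross_validate
from sklearn.neural_network import MLPClassifier
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler
from sklearn.svm import SVC

# Synthetic stand-in for one labelled data file (assumption: 8 features,
# roughly 20% gated events). Real data would be read from the exported CSV.
X, y = make_classification(
    n_samples=500, n_features=8, n_informative=5, n_redundant=0,
    weights=[0.8, 0.2], random_state=0,
)

# Feature scaling is an assumption added here; the paper does not mention it.
classifiers = {
    "RF": RandomForestClassifier(random_state=0),
    "LR": make_pipeline(StandardScaler(), LogisticRegression(max_iter=1000)),
    "SVC": make_pipeline(StandardScaler(), SVC(kernel="rbf")),
    "MLP": make_pipeline(
        StandardScaler(),
        MLPClassifier(hidden_layer_sizes=(100,), random_state=0),
    ),
}

cv = StratifiedKFold(n_splits=10, shuffle=True, random_state=0)
scores = {}
for name, clf in classifiers.items():
    result = cross_validate(clf, X, y, cv=cv, scoring=["balanced_accuracy", "f1"])
    # Average both metrics over the 10 validation folds.
    scores[name] = (
        result["test_balanced_accuracy"].mean(),
        result["test_f1"].mean(),
    )
    print(name, scores[name])
```
        </preformat>
        <p>Wrapping the scaler and the classifier in one pipeline keeps the scaling parameters fitted on the training folds only, which avoids leaking validation data into training.</p>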
        <p>For the training and validation phase, we designed two main experiments (each comprising
sub-experiments) to examine the classification of gated and ungated samples.</p>
        <list list-type="simple">
          <list-item><p>Exp. 1: The first experiment applies the ML classifiers listed above to each
single data file, extracting a training set and a test set from it (80-20 split).</p></list-item>
          <list-item><p>Exp. 2: The second experiment trains the classifiers on 10 randomly
selected subjects (accumulated training data) and tests them on the remaining 5 subjects.</p></list-item>
        </list>
        <p>Figure 2 illustrates the general ML pipeline adopted for the flow cytometry data. The flow
cytometry standard (FCS) data files are first pre-processed. The processed FCS data are then
subjected to hierarchical gating (the gating process is described in Section 3.1). After
the gating process, cell annotation is performed and CD4+ T cells are separated from the rest
of the cell populations. The supervised machine learning pipeline is then used to train and validate
the models. A stratified 10-fold cross-validation (CV) technique is employed to validate the
classification performance of the trained models. The manually gated CD4+ T cells are compared with
the predictions of the trained ML models to evaluate the performance of all models. The training data for
each fold were obtained in equal amounts to equalize the class occurrence frequencies. These
data were then used to train a model, and the validation set was used to validate it.</p>
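        <p>The subject-level split of the second experiment can be sketched as below. The file names and the seed are hypothetical; the only facts taken from the text are that there are 15 data files, 10 of which are randomly selected for training and 5 held out for testing.</p>
        <preformat>
```python
import random

# Hypothetical file names; the paper only states that there are 15 data files.
data_files = [f"file-{i}.csv" for i in range(1, 16)]

rng = random.Random(42)  # fixed seed so that the split is reproducible
train_files = sorted(rng.sample(data_files, 10))
test_files = sorted(set(data_files) - set(train_files))

# No subject contributes events to both the training and the test set.
assert not set(train_files).intersection(test_files)
print(len(train_files), len(test_files))  # 10 5
```
        </preformat>
        <p>Splitting by file rather than by event keeps all cells of a subject on one side of the split, so the test score reflects generalisation to unseen subjects.</p>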
        <p>The experiments were conducted in a Python Colaboratory environment. Experimental
results are described in Section 4.</p>
      </sec>
    </sec>
    <sec id="sec-4">
      <title>4. Results and Evaluations</title>
      <p>In this section, we present the results for the two types of experiments.</p>
      <p>Exp. 1 Results of the first experiment are reported in Table 3 for only 3 data files, since all
classifiers showed similar performance on the different data files, which likely means
that distributional variations between the datasets were not significant. The average scores for both
metrics are presented.</p>
      <p>It can be observed from Table 3 that RF outperformed the other classifiers on data file-1
for both metrics, Balanced Accuracy (BA = .985) and F1-score (F1 = .998). LR performed better
for data file-2, and all the classifiers
achieved similar results on data file-3 for BA and F1. In
general, all four ML classifiers performed very well, with a balanced accuracy score of
&gt; 0.945.</p>
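      <p>For reference, the two metrics can be computed from confusion-matrix counts as in the sketch below. The counts are made up for illustration and do not come from the study; the function names are hypothetical.</p>
      <preformat>
```python
def balanced_accuracy(tp, fp, tn, fn):
    """Mean of sensitivity (recall on class 1) and specificity (recall on class 0)."""
    sensitivity = tp / (tp + fn)
    specificity = tn / (tn + fp)
    return (sensitivity + specificity) / 2


def f1_score(tp, fp, fn):
    """Harmonic mean of precision and recall for the positive (gated) class."""
    return 2 * tp / (2 * tp + fp + fn)


# Made-up confusion counts for an imbalanced file: 200 gated, 800 ungated events.
tp, fn, tn, fp = 190, 10, 780, 20
print(round(balanced_accuracy(tp, fp, tn, fn), 4))  # 0.9625
print(round(f1_score(tp, fp, fn), 4))               # 0.9268
```
      </preformat>
      <p>Because balanced accuracy averages the per-class recalls, a classifier cannot score well simply by favouring the majority (ungated) class.</p>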
      <p>Exp. 2 We examined the same four classifiers in the second experiment. The training data
from 10 data files are merged into a new file, making a larger training set. The resulting
training set contains the same features, data distribution and corresponding labels as the
source data and is split into 10 folds. The ML classifiers are trained and validated with stratified
10-fold cross-validation. The resulting trained classifiers are then evaluated on
the testing sets of the other 5 files. The performance of the ML classifiers on each file for CD4+ cell
classification is presented in Table 4, which reports the average scores for both metrics. It
can be observed from Table 4 that SVC and RF maintained the same performance as in
the first experiment, while MLP and LR showed a slight decline. The lowest BA scores
for MLP were recorded on file-12 and file-15, at .925 and .897, respectively. The lowest BA scores
for LR were recorded on file-14 and file-15, at .952 and .968, respectively. Overall, the
performance of all classifiers is promising, with a BA of at least .897.</p>
    </sec>
    <sec id="sec-5">
      <title>5. Conclusion and Discussion</title>
      <p>The field of flow cytometry has seen significant progress in blood sample analysis, which
has led to the acquisition of a huge amount of data. Given these premises, automatic data
analysis, carried out using supervised learning techniques that automatically categorize samples
according to clinical protocols, can provide enormous benefits. Automated methods make such
analysis possible without human subjectivity and gating bias.</p>
      <p>We conducted two experiments to automate the manual gating procedure for classifying
CD4+ T cells from flow cytometry data. Four supervised ML algorithms were trained with
samples manually gated by experts in the pre-processing phase. The current study demonstrates
our method's capacity to distinguish the T Helper Lymphocytes CD4+ among all the types of
cells present in the dataset, with high balanced accuracy and F1-score. This
result suggests that, when training data are available as gated examples, supervised classification
offers an effective technique for automatic analysis of flow cytometry data, making it possible to extract
and compute the size of different cellular populations.</p>
      <p>Despite its apparent simplicity, this approach is of particular importance, as it constitutes a
replicable mechanism in a series of increasingly complex contexts, which can be exploited for the
realization of algorithms aimed at the automatic diagnosis of haemato-oncological pathologies
with characteristic phenotypes.</p>
      <p>As the accuracy of all the models is very high, future work will explore the importance
of the features and evaluate whether some straightforward relationship between features and output
is present.</p>
    </sec>
    <sec id="sec-6">
      <title>6. Acknowledgment</title>
      <p>This work is part of a collaboration project to automate the gating process of clinical flow
cytometry at the University of Urbino.</p>
    </sec>
  </body>
  <back>
    <ref-list>
      <ref id="ref1">
        <mixed-citation>
          [1]
          <string-name>
            <given-names>C. P.</given-names>
            <surname>Verschoor</surname>
          </string-name>
          ,
          <string-name>
            <given-names>A.</given-names>
            <surname>Lelic</surname>
          </string-name>
          ,
          <string-name>
            <given-names>J. L.</given-names>
            <surname>Bramson</surname>
          </string-name>
          ,
          <string-name>
            <given-names>D. M.</given-names>
            <surname>Bowdish</surname>
          </string-name>
          ,
          <article-title>An introduction to automated flow cytometry gating tools and their implementation</article-title>
          ,
          <source>Frontiers in Immunology</source>
          <volume>6</volume>
          (
          <year>2015</year>
          )
          <fpage>380</fpage>
          .
        </mixed-citation>
      </ref>
      <ref id="ref2">
        <mixed-citation>
          [2]
          <string-name>
            <given-names>S.</given-names>
            <surname>Montante</surname>
          </string-name>
          ,
          <string-name>
            <given-names>R. R.</given-names>
            <surname>Brinkman</surname>
          </string-name>
          ,
          <article-title>Flow cytometry data analysis: Recent tools and algorithms</article-title>
          ,
          <source>International Journal of Laboratory Hematology</source>
          <volume>41</volume>
          (
          <year>2019</year>
          )
          <fpage>56</fpage>
          -
          <lpage>62</lpage>
          .
        </mixed-citation>
      </ref>
      <ref id="ref3">
        <mixed-citation>
          [3]
          <string-name>
            <given-names>M.</given-names>
            <surname>Cheung</surname>
          </string-name>
          ,
          <string-name>
            <given-names>J. J.</given-names>
            <surname>Campbell</surname>
          </string-name>
          ,
          <string-name>
            <given-names>L.</given-names>
            <surname>Whitby</surname>
          </string-name>
          ,
          <string-name>
            <given-names>R. J.</given-names>
            <surname>Thomas</surname>
          </string-name>
          ,
          <string-name>
            <given-names>J.</given-names>
            <surname>Braybrook</surname>
          </string-name>
          ,
          <string-name>
            <given-names>J.</given-names>
            <surname>Petzing</surname>
          </string-name>
          ,
          <article-title>Current trends in flow cytometry automated data analysis software</article-title>
          ,
          <source>Cytometry Part A</source>
          <volume>99</volume>
          (
          <year>2021</year>
          )
          <fpage>1007</fpage>
          -
          <lpage>1021</lpage>
          .
        </mixed-citation>
      </ref>
      <ref id="ref4">
        <mixed-citation>
          [4]
          <string-name>
            <given-names>M.</given-names>
            <surname>Malek</surname>
          </string-name>
          ,
          <string-name>
            <given-names>M. J.</given-names>
            <surname>Taghiyar</surname>
          </string-name>
          ,
          <string-name>
            <given-names>L.</given-names>
            <surname>Chong</surname>
          </string-name>
          ,
          <string-name>
            <given-names>G.</given-names>
            <surname>Finak</surname>
          </string-name>
          ,
          <string-name>
            <given-names>R.</given-names>
            <surname>Gottardo</surname>
          </string-name>
          ,
          <string-name>
            <given-names>R. R.</given-names>
            <surname>Brinkman</surname>
          </string-name>
          ,
          <article-title>flowdensity: reproducing manual gating of flow cytometry data by automated density-based cell population identification</article-title>
          ,
          <source>Bioinformatics</source>
          <volume>31</volume>
          (
          <year>2015</year>
          )
          <fpage>606</fpage>
          -
          <lpage>607</lpage>
          .
        </mixed-citation>
      </ref>
      <ref id="ref5">
        <mixed-citation>
          [5]
          <string-name>
            <given-names>G.</given-names>
            <surname>Finak</surname>
          </string-name>
          ,
          <string-name>
            <given-names>J.</given-names>
            <surname>Frelinger</surname>
          </string-name>
          ,
          <string-name>
            <given-names>W.</given-names>
            <surname>Jiang</surname>
          </string-name>
          ,
          <string-name>
            <given-names>E. W.</given-names>
            <surname>Newell</surname>
          </string-name>
          ,
          <string-name>
            <given-names>J.</given-names>
            <surname>Ramey</surname>
          </string-name>
          ,
          <string-name>
            <given-names>M. M.</given-names>
            <surname>Davis</surname>
          </string-name>
          ,
          <string-name>
            <given-names>S. A.</given-names>
            <surname>Kalams</surname>
          </string-name>
          ,
          <string-name>
            <given-names>S. C.</given-names>
            <surname>De Rosa</surname>
          </string-name>
          ,
          <string-name>
            <given-names>R.</given-names>
            <surname>Gottardo</surname>
          </string-name>
          ,
          <article-title>Opencyto: an open source infrastructure for scalable, robust, reproducible, and automated, end-to-end flow cytometry data analysis</article-title>
          ,
          <source>PLoS Computational Biology</source>
          <volume>10</volume>
          (
          <year>2014</year>
          )
          <elocation-id>e1003806</elocation-id>
          .
        </mixed-citation>
      </ref>
      <ref id="ref6">
        <mixed-citation>
          [6]
          <string-name>
            <given-names>S.</given-names>
            <surname>Van Gassen</surname>
          </string-name>
          ,
          <string-name>
            <given-names>B.</given-names>
            <surname>Callebaut</surname>
          </string-name>
          ,
          <string-name>
            <given-names>M. J.</given-names>
            <surname>Van Helden</surname>
          </string-name>
          ,
          <string-name>
            <given-names>B. N.</given-names>
            <surname>Lambrecht</surname>
          </string-name>
          ,
          <string-name>
            <given-names>P.</given-names>
            <surname>Demeester</surname>
          </string-name>
          ,
          <string-name>
            <given-names>T.</given-names>
            <surname>Dhaene</surname>
          </string-name>
          ,
          <string-name>
            <given-names>Y.</given-names>
            <surname>Saeys</surname>
          </string-name>
          ,
          <article-title>FlowSOM: Using self-organizing maps for visualization and interpretation of cytometry data</article-title>
          ,
          <source>Cytometry Part A</source>
          <volume>87</volume>
          (
          <year>2015</year>
          )
          <fpage>636</fpage>
          -
          <lpage>645</lpage>
          .
        </mixed-citation>
      </ref>
      <ref id="ref7">
        <mixed-citation>
          [7]
          <string-name>
            <given-names>J. A.</given-names>
            <surname>DiGiuseppe</surname>
          </string-name>
          ,
          <string-name>
            <given-names>J. L.</given-names>
            <surname>Cardinali</surname>
          </string-name>
          ,
          <string-name>
            <given-names>W. N.</given-names>
            <surname>Rezuke</surname>
          </string-name>
          ,
          <string-name>
            <given-names>D.</given-names>
            <surname>Pe'er</surname>
          </string-name>
          ,
          <article-title>PhenoGraph and viSNE facilitate the identification of abnormal T-cell populations in routine clinical flow cytometric data</article-title>
          ,
          <source>Cytometry Part B: Clinical Cytometry</source>
          <volume>94</volume>
          (
          <year>2018</year>
          )
          <fpage>744</fpage>
          -
          <lpage>757</lpage>
          .
        </mixed-citation>
      </ref>
      <ref id="ref8">
        <mixed-citation>
          [8]
          <string-name>
            <given-names>P.</given-names>
            <surname>Qiu</surname>
          </string-name>
          ,
          <string-name>
            <given-names>E. F.</given-names>
            <surname>Simonds</surname>
          </string-name>
          ,
          <string-name>
            <given-names>S. C.</given-names>
            <surname>Bendall</surname>
          </string-name>
          ,
          <string-name>
            <given-names>K. D.</given-names>
            <surname>Gibbs</surname>
          </string-name>
          ,
          <string-name>
            <given-names>R. V.</given-names>
            <surname>Bruggner</surname>
          </string-name>
          ,
          <string-name>
            <given-names>M. D.</given-names>
            <surname>Linderman</surname>
          </string-name>
          ,
          <string-name>
            <given-names>K.</given-names>
            <surname>Sachs</surname>
          </string-name>
          ,
          <string-name>
            <given-names>G. P.</given-names>
            <surname>Nolan</surname>
          </string-name>
          ,
          <string-name>
            <given-names>S. K.</given-names>
            <surname>Plevritis</surname>
          </string-name>
          ,
          <article-title>Extracting a cellular hierarchy from high-dimensional cytometry data with SPADE</article-title>
          ,
          <source>Nature Biotechnology</source>
          <volume>29</volume>
          (
          <year>2011</year>
          )
          <fpage>886</fpage>
          -
          <lpage>891</lpage>
          .
        </mixed-citation>
      </ref>
      <ref id="ref9">
        <mixed-citation>
          [9]
          <string-name>
            <given-names>M.</given-names>
            <surname>Lux</surname>
          </string-name>
          ,
          <string-name>
            <given-names>R. R.</given-names>
            <surname>Brinkman</surname>
          </string-name>
          ,
          <string-name>
            <given-names>C.</given-names>
            <surname>Chauve</surname>
          </string-name>
          ,
          <string-name>
            <given-names>A.</given-names>
            <surname>Laing</surname>
          </string-name>
          ,
          <string-name>
            <given-names>A.</given-names>
            <surname>Lorenc</surname>
          </string-name>
          ,
          <string-name>
            <given-names>L.</given-names>
            <surname>Abeler-Dörner</surname>
          </string-name>
          ,
          <string-name>
            <given-names>B.</given-names>
            <surname>Hammer</surname>
          </string-name>
          ,
          <article-title>flowLearn: fast and precise identification and quality checking of cell populations in flow cytometry</article-title>
          ,
          <source>Bioinformatics</source>
          <volume>34</volume>
          (
          <year>2018</year>
          )
          <fpage>2245</fpage>
          -
          <lpage>2253</lpage>
          .
        </mixed-citation>
      </ref>
      <ref id="ref10">
        <mixed-citation>
          [10]
          <string-name>
            <given-names>H.-C.</given-names>
            <surname>Lee</surname>
          </string-name>
          ,
          <string-name>
            <given-names>R.</given-names>
            <surname>Kosoy</surname>
          </string-name>
          ,
          <string-name>
            <given-names>C. E.</given-names>
            <surname>Becker</surname>
          </string-name>
          ,
          <string-name>
            <given-names>J. T.</given-names>
            <surname>Dudley</surname>
          </string-name>
          ,
          <string-name>
            <given-names>B. A.</given-names>
            <surname>Kidd</surname>
          </string-name>
          ,
          <article-title>Automated cell type discovery and classification through knowledge transfer</article-title>
          ,
          <source>Bioinformatics</source>
          <volume>33</volume>
          (
          <year>2017</year>
          )
          <fpage>1689</fpage>
          -
          <lpage>1695</lpage>
          .
        </mixed-citation>
      </ref>
      <ref id="ref11">
        <mixed-citation>
          [11]
          <string-name>
            <given-names>H.</given-names>
            <surname>Li</surname>
          </string-name>
          ,
          <string-name>
            <given-names>U.</given-names>
            <surname>Shaham</surname>
          </string-name>
          ,
          <string-name>
            <given-names>K. P.</given-names>
            <surname>Stanton</surname>
          </string-name>
          ,
          <string-name>
            <given-names>Y.</given-names>
            <surname>Yao</surname>
          </string-name>
          ,
          <string-name>
            <given-names>R. R.</given-names>
            <surname>Montgomery</surname>
          </string-name>
          ,
          <string-name>
            <given-names>Y.</given-names>
            <surname>Kluger</surname>
          </string-name>
          ,
          <article-title>Gating mass cytometry data by deep learning</article-title>
          ,
          <source>Bioinformatics</source>
          <volume>33</volume>
          (
          <year>2017</year>
          )
          <fpage>3423</fpage>
          -
          <lpage>3430</lpage>
          . doi:10.1093/bioinformatics/btx448.
        </mixed-citation>
      </ref>
      <ref id="ref12">
        <mixed-citation>
          [12]
          <string-name>
            <given-names>Z.</given-names>
            <surname>Hu</surname>
          </string-name>
          ,
          <string-name>
            <given-names>S.</given-names>
            <surname>Bhattacharya</surname>
          </string-name>
          ,
          <string-name>
            <given-names>A. J.</given-names>
            <surname>Butte</surname>
          </string-name>
          ,
          <article-title>Application of machine learning for cytometry data</article-title>
          ,
          <source>Frontiers in Immunology</source>
          <volume>12</volume>
          (
          <year>2022</year>
          )
          <fpage>787574</fpage>
          . doi:10.3389/fimmu.2021.787574.
        </mixed-citation>
      </ref>
      <ref id="ref13">
        <mixed-citation>
          [13]
          <string-name>
            <given-names>J. H.</given-names>
            <surname>Levine</surname>
          </string-name>
          ,
          <string-name>
            <given-names>E. F.</given-names>
            <surname>Simonds</surname>
          </string-name>
          ,
          <string-name>
            <given-names>S. C.</given-names>
            <surname>Bendall</surname>
          </string-name>
          ,
          <string-name>
            <given-names>K. L.</given-names>
            <surname>Davis</surname>
          </string-name>
          ,
          <string-name>
            <given-names>E. D.</given-names>
            <surname>Amir</surname>
          </string-name>
          ,
          <string-name>
            <given-names>M. D.</given-names>
            <surname>Tadmor</surname>
          </string-name>
          ,
          <string-name>
            <given-names>O.</given-names>
            <surname>Litvin</surname>
          </string-name>
          ,
          <string-name>
            <given-names>H. G.</given-names>
            <surname>Fienberg</surname>
          </string-name>
          ,
          <string-name>
            <given-names>A.</given-names>
            <surname>Jager</surname>
          </string-name>
          ,
          <string-name>
            <given-names>E. R.</given-names>
            <surname>Zunder</surname>
          </string-name>
          , et al.,
          <article-title>Data-driven phenotypic dissection of AML reveals progenitor-like cells that correlate with prognosis</article-title>
          ,
          <source>Cell</source>
          <volume>162</volume>
          (
          <year>2015</year>
          )
          <fpage>184</fpage>
          -
          <lpage>197</lpage>
          .
        </mixed-citation>
      </ref>
      <ref id="ref14">
        <mixed-citation>
          [14]
          <string-name>
            <given-names>A. C.</given-names>
            <surname>Belkina</surname>
          </string-name>
          ,
          <string-name>
            <given-names>C. O.</given-names>
            <surname>Ciccolella</surname>
          </string-name>
          ,
          <string-name>
            <given-names>R.</given-names>
            <surname>Anno</surname>
          </string-name>
          ,
          <string-name>
            <given-names>R.</given-names>
            <surname>Halpert</surname>
          </string-name>
          ,
          <string-name>
            <given-names>J.</given-names>
            <surname>Spidlen</surname>
          </string-name>
          ,
          <string-name>
            <given-names>J. E.</given-names>
            <surname>Snyder-Cappione</surname>
          </string-name>
          ,
          <article-title>Automated optimized parameters for t-distributed stochastic neighbor embedding improve visualization and analysis of large datasets</article-title>
          ,
          <source>Nature Communications</source>
          <volume>10</volume>
          (
          <year>2019</year>
          )
          <fpage>1</fpage>
          -
          <lpage>12</lpage>
          .
        </mixed-citation>
      </ref>
      <ref id="ref15">
        <mixed-citation>
          [15]
          <string-name>
            <given-names>B. E.</given-names>
            <surname>Boser</surname>
          </string-name>
          ,
          <string-name>
            <given-names>I. M.</given-names>
            <surname>Guyon</surname>
          </string-name>
          ,
          <string-name>
            <given-names>V. N.</given-names>
            <surname>Vapnik</surname>
          </string-name>
          ,
          <article-title>A training algorithm for optimal margin classifiers</article-title>
          ,
          in:
          <source>Proceedings of the Fifth Annual Workshop on Computational Learning Theory</source>
          ,
          <year>1992</year>
          , pp.
          <fpage>144</fpage>
          -
          <lpage>152</lpage>
          .
        </mixed-citation>
      </ref>
    </ref-list>
  </back>
</article>