=Paper= {{Paper |id=Vol-2022/paper26 |storemode=property |title= Search for Gender Difference in Functional Connectivity of Resting State fMRI |pdfUrl=https://ceur-ws.org/Vol-2022/paper26.pdf |volume=Vol-2022 |authors=Dmitry Kovalev,Sergey Priimenko,Natalya Ponomareva |dblpUrl=https://dblp.org/rec/conf/rcdl/KovalevPP17 }} == Search for Gender Difference in Functional Connectivity of Resting State fMRI == https://ceur-ws.org/Vol-2022/paper26.pdf
      Search for Gender Difference in Functional Connectivity
                      of Resting State fMRI
      © Dmitry Kovalev1                       © Sergey Priimenko2                         © Natalya Ponomareva3
  1
      Federal Research Center “Computer Science and Control” of Russian Academy of Sciences,
                                           Moscow, Russia
                               2
                                 Lomonosov Moscow State University,
                                           Moscow, Russia
                                   3
                                     Research Center of Neurology,
                                           Moscow, Russia
               dkovalev@ipiran.ru                 mior12@mail.ru                   ponomare@yandex.ru
            Abstract. Over the past several years, huge sets of fMRI data have been obtained within the Human Connectome
      Project (HCP). Despite this, technologies for scalable analysis of large amounts of data are rarely used to analyze
      the whole data set. The authors conducted a virtual experiment on a large sample of data taken from the HCP to find
      gender differences in functional connectivity. Methods for finding functional connectivity are reviewed.
      Distributed use and scalability on large rfMRI datasets are then analyzed, with a discussion of existing
      libraries and suggestions on how to integrate them with a distributed system. As a result, a distributed software
      architecture based on the Apache Spark framework is developed. It includes an ontology, a conceptual schema and a
      workflow. The results of this experiment may be of interest to neurophysiologists for further analysis.
            Keywords: data intensive research, distributed infrastructure, problem solving in neurophysiology.

Proceedings of the XIX International Conference “Data Analytics and Management in Data Intensive
Domains” (DAMDID/RCDL’2017), Moscow, Russia, October 10–13, 2017

1 Introduction
    Today, many branches of science have to solve problems associated with the increasing scale of data
[1–3]. This has led to the development of specialized tools, which primarily focus on structured data but
are increasingly being adapted to more general forms [4, 5]. Yet these tools and software are not widely
used in data intensive research, and a methodology for applying them correctly has still to be developed.
Use cases from multidisciplinary fields can greatly influence the evolution of this methodology and
these tools.
    One of the most prominent examples of data intensive domains is neurophysiology, where the amount of
data has reached petabyte scale. Neurophysiology makes it possible to visualize the structure, functions
and biochemical characteristics of the brain. In particular, approaches to finding functional connections
between brain regions are being explored [6]. One way to do that is to measure the functional
connectivity between brain regions as the level of co-activation of spontaneous functional time series
of resting-state fMRI [7–9].
    During the past several years, major projects such as the Human Connectome Project (HCP) and the
1000 Functional Connectomes Project have started, with more than a thousand people participating. The
datasets are open to the scientific community. Such large-scale data warehouses could serve as a starting
point for applying technologies for analyzing large amounts of data to neuroimaging of the human brain,
yet there are some limitations. One of the reasons why the neurobiology community does not use tools for
working with large amounts of data is that standard file formats, such as NIFTI [10], are binary and
incur additional costs when delivered to distributed file systems. Another problem is that many
distributed systems do not efficiently perform iterative algorithms, such as principal component
analysis (PCA) and independent component analysis (ICA), which are actively used in neuroimaging.
    One significant area of research in neurophysiology is the study of gender differences in functional
connectivity [11]. For example, there is a study of army veterans who experience physical and psychiatric
complications, including craniocerebral trauma, post-traumatic stress and depression. The integration of
a large number of women into military operations has drawn attention to potential sex differences in the
frequency of and recovery from craniocerebral trauma, as well as from other concomitant disorders.
Understanding the role of gender-related effects can provide information on the needs for evaluating
treatment for women, who can demonstrate both similarities to and differences from men.
    This article aims at developing an approach to distributed analysis in the data intensive
neurophysiology domain. The article is structured as follows. Section 2 surveys existing distributed
methods and tools to process and analyze neurophysiological datasets. Section 3
presents a domain ontology that was created to better interact with domain experts. Section 4 describes
the distributed programming implementation on the existing computational infrastructure, as well as the
output results. Section 5 concludes the article.

Figure 1 Transformation of 4-D array into 2-D array

2 Data analysis methods

2.1 Data processing
    A resting-state fMRI dataset from the HCP project is used. The HCP consortium has developed an
information platform for storing raw and processed data, for systematic processing and analysis of data,
and for obtaining and exploring data. One of the main components of the project is ConnectomeDB, which
provides database services for the storage and dissemination of datasets that are open to the scientific
community. The data is already preprocessed. Preprocessing consists of removing spatial artifacts and
distortions, generating surfaces and aligning the data to a single standard space.
    Data processing is divided into two parts: data cleaning inside the brain (FMRIVolume) and on the
brain surface (FMRISurface) [3].
    The FMRIVolume stage removes spatial distortion, realigns volumes to compensate for subject movement
during the session, normalizes the 4D images to a standard value and creates the final brain mask.
    The main purpose of FMRISurface is to bring the time series into the standard CIFTI space. This is
achieved by mapping the voxels in the cortical gray matter onto the native cortical surface and by
transforming each subcortical region of each individual to a standard set of voxels.
    After processing, the resting-state fMRI time series are stored in a special format, NIFTI. The
resulting resting-state fMRI data exceeds 10 TB and covers more than 1000 people. During the experiment,
each subject was placed in a dark room and asked to relax but not to fall asleep. The experiment was
conducted in 4 sessions of 15 minutes each. In two sessions the fMRI device acquired images from the left
side of the brain to the right, and in the other two sessions from the right side to the left.

2.2 Data analysis methods
    The data of each subject is represented as a matrix $Y$ of size $t \times v$ (see Fig. 1), where
each row represents the set of brain voxels at a particular time point, and each column is the time
series of the corresponding voxel [12]. It is assumed that the data has already been pre-processed to
remove artifacts and transformed to a standard space (coordinate system) so that the voxels are
anatomically compatible across all subjects. It is also assumed that the time series of each voxel has
had its mean removed (and, possibly, has been normalized by its variance) [13].
    If the data set consists of one subject, PCA is applied in order to reduce the dimensionality of
the data:

    $$Y_{t \times v} \approx U_{t \times n} \, S_{n \times n} \, V_{n \times v}$$

where $n$ is the number of principal components (usually much smaller than $t$), $U$ is the set of
temporal eigenvectors, $V$ is the set of spatial eigenvectors, and $S$ holds the corresponding
eigenvalues on its main diagonal (the $n$ largest eigenvalues). Then ICA is applied to the matrix $V$,
estimating a new set of spatial components that are linear combinations of the rows of $V$ and are
maximally independent of each other. If the data set consists of several subjects, then initially all
the data is combined into one large set consisting of $s$ subjects, and then PCA and ICA are applied.
The resulting approximation has the same form as above, but $Y$ now has dimensions
$(s \cdot t) \times v$ (see Fig. 2).
    With large data sets, or with a large number of subjects, it becomes impractical to form the
complete data set and then apply PCA and ICA, because of memory and time limitations. Several algorithms
have been proposed to solve this problem.

Figure 2 PCA for concatenated data

Figure 3 Parallel execution of PCA
    In 2001, it was suggested to approximate the concatenation of all data sets by first reducing each
data set to $m$ main spatial vectors using PCA, then concatenating them and applying a final PCA to reduce
the final dataset to $n$ components, and then applying ICA [14]. Although using a small value of $m$
limits the memory requirements of these operations, the data size still scales linearly with the number
of subjects, which can eventually become impractically large. In addition, important information can be
lost if $m$ is not relatively large (which is usually not the case). Such information may be hard to
estimate at the level of an individual subject yet be important at the group level (see Fig. 3).
    To overcome these limitations, the MELODIC's Incremental Group-PCA (MIGP) algorithm was proposed
[15]. MIGP is an incremental approach whose goal is to provide a very close approximation to full
concatenation of the data sets followed by PCA, but without large memory requirements. High accuracy is
achieved because the individual subjects' data sets are not reduced to a small number of PCA components.
The incremental approach maintains an internal PCA space of $m$ weighted spatial eigenvectors, where $m$
is usually larger than the number of time points in each individual data set. “Weighted” means that the
eigenvalues are incorporated into the matrix of spatial eigenvectors. The final set of $m$ components,
representing the output of PCA on the temporally concatenated data, can then be reduced to the required
dimension $n$ simply by keeping the top $n$ components and, if necessary, discarding the weighting
coefficients (eigenvalues).
    Usually, 2–3 data sets are first concatenated. This data set is then fed into an $m$-dimensional PCA
and the following matrix is obtained: $W$ of size $m \times v$. Each eigenvector is multiplied by its
eigenvalue. The eigenvalues characterize the importance of the components here, so statistical
information is not lost. $W$ becomes the current estimate of the group set and can be regarded as a
matrix of pseudo time series consisting of $m$ time points and $v$ voxels. For the data set of each
subsequent subject, $W$ is incrementally updated by concatenating $W$ with the data set $Y_i$ and
applying PCA to get the updated $W$, keeping only the $m$ top components. Thus, the variance of each
batch of data is preserved (see Fig. 4).

Figure 4 MELODIC Incremental Group PCA
    MIGP does not increase the memory requirement as the number of subjects grows, large matrices are
never formed, and the computation time scales linearly with the number of subjects. It is easily
parallelized by applying the approach in parallel to subsets of subjects and then combining the results
using the same “concatenation and reduction” approach described above.

2.3 Libraries
    Nibabel [16] is a library that provides an API for reading and writing some common neuroimaging file
formats. These formats include ANALYZE (plain, SPM99, SPM2 and higher), GIFTI, NIfTI1, NIfTI2, MINC1,
MINC2, MGH and ECAT, as well as Philips PAR/REC. The image format classes provide full or selective
access to header (meta) information, and access to image data is provided through numpy arrays.
    A nibabel image object consists of three elements:
    1. The n-dimensional array containing the image data.
    2. A 4x4 affine transformation matrix, which maps the image coordinates to the standard world
       coordinate space.
    3. Image metadata, stored in the header.
    When an image is loaded, an object of type Nifti1Image is created. The file name can have either the
.nii or the .nii.gz extension.
    It is worth noting that when the load function is called directly, image data is not loaded into
memory, since an image can be backed either by a numpy array in memory or by a file on disk. To load
data from disk, one needs to call the get_data() function of an object of type Nifti1Image (deprecated
in later nibabel releases in favor of get_fdata()). This function returns an n-dimensional numpy array.
    In addition, an object of type Nifti1Image can be created from numpy arrays. To do this, one should
pass an n-dimensional data array and an affine transformation matrix to the Nifti1Image constructor.
    Nitime [17] is a library for the analysis of time series in the field of neuroimaging. Nitime can be
used to represent, process and analyze time series from experimental data. The main purpose of the
library is to serve as a platform for analyzing data collected in neurophysiological experiments. The
basic principle of the nitime implementation is the separation of time series representation from time
series analysis.
    An important feature of the nitime library is lazy initialization. Most attributes of both time
series and analysis objects are computed only when necessary. That is, the initialization of a time
series object or an analysis object does not cause any intensive calculations. In addition, once a
calculation has been performed, its result is stored in the object, so accessing the results of an
analysis triggers the computation only the first time. After that, the stored result is reused.
    One of the algorithms of the nitime library is the correlation analysis of brain regions. It
calculates the correlation between one time series that represents a given area of the brain and other
areas that are also represented by time series. To calculate the correlation between regions, the nitime
library provides a SeedCoherenceAnalyzer class that takes two time series inputs and returns a
correlation matrix that can be used for further analysis.
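    The seed-based correlation of brain regions can be illustrated without nitime. The following is a
minimal NumPy sketch, not nitime's API: the function name seed_correlation and the synthetic data are
assumptions made for illustration only. It computes the Pearson correlation between one seed time series
and the time series of several other regions.

```python
import numpy as np


def seed_correlation(seed, targets):
    """Pearson correlation between a seed series and each target series.

    seed    -- 1-D array of length t (time points)
    targets -- 2-D array of shape (r, t), one row per region
    Returns a 1-D array of r correlation coefficients.
    """
    seed = np.asarray(seed, dtype=float)
    targets = np.asarray(targets, dtype=float)
    # Remove the mean of every series so that dot products become covariances.
    seed_c = seed - seed.mean()
    targets_c = targets - targets.mean(axis=1, keepdims=True)
    # Normalize by the product of the vector norms to obtain correlations.
    denom = np.linalg.norm(seed_c) * np.linalg.norm(targets_c, axis=1)
    return targets_c @ seed_c / denom


# Synthetic example: region 0 follows the seed, region 1 mirrors it,
# region 2 is unrelated noise.
rng = np.random.default_rng(0)
seed_ts = rng.standard_normal(200)
regions = np.vstack([
    2.0 * seed_ts + 1.0,
    -seed_ts,
    rng.standard_normal(200),
])
corr = seed_correlation(seed_ts, regions)
```

    In this sketch the first two coefficients are +1 and -1 up to rounding, because those regions are
exact linear functions of the seed, while the coefficient of the noise region stays close to zero.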
    Nilearn [18] is a Python module for statistical
processing of neuroimaging data.
    It uses the scikit-learn module for multivariate statistics with applications in predictive
modeling, classification, decoding, and connectivity analysis. Nilearn can work with Nifti1Image objects
from the nibabel library.
    The nilearn library has rich functionality for working with nii-images. It allows visualizing,
decoding, exploring the functional connectivity, and performing various manipulations, such as smoothing,
masking and advanced statistical analysis.
    Nilearn provides the CanICA method, an ICA method for analyzing fMRI data at the group level.
Compared to other strategies, it brings a well-controlled group model, as well as a thresholding
algorithm that controls specificity and sensitivity with an explicit signal model.
    In order to get a time series and build a correlation matrix for it, nilearn provides the
NiftiMapsMasker object. To create such an object, one needs to specify an atlas of the brain regions.
Nilearn also provides the ability to create a correlation matrix for the independent components that are
computed by CanICA.

3 Ontology
    The study of neuroimaging with large amounts of data lies at the intersection of different areas of
science. In order to use the same terms and concepts, a simple ontology was developed that describes the
main entities used in this work, together with a conceptual schema that defines the types of data, the
constraints on these data types and the means of interaction between them. An ontology is a formal
specification of a shared conceptualization [19].

Figure 5 Main concepts of the domain ontology
    The ontological specification of the subject area of neuroimaging consists of the following
components (see Fig. 5):
•   A neuro-image is a 3-dimensional or 4-dimensional image (a series of 3-dimensional images)
    reflecting the distribution of metabolic activity in different regions of the brain at different
    time intervals [20].
•   An area of the brain is a set of voxels grouped by a certain feature. It is most often represented
    as a time series [20].
•   A voxel is an element of a three-dimensional image containing some value.
•   Independent models are models for investigating the functional connectivity of the entire brain.
    They are designed to search for general patterns of functional connectivity between brain regions.
•   Dependent models are models for analyzing the correlation of a given region of the brain.
•   Brain connectivity is the structure of anatomical connections, statistical dependencies or
    cause-effect interactions between individual units within the brain's nervous system [21].
•   Structural connectivity refers to a network of physical or structural links connecting sets of
    neurons or neural elements, together with their structural biophysical features [22].
•   Functional connectivity is a statistical type of connection between anatomically unconnected areas
    of the brain that have common functional properties [7].
•   Effective connectivity is the combination of structural and functional connectivity. It describes
    networks of directed effects of one neural element on another.
•   Resting-state fMRI is a neuro-image obtained in an experiment in which the subject was at rest and
    did not engage in active tasks.
•   Task fMRI is a neuro-image obtained in an experiment in which the subject performed active actions,
    e.g., listened to music.

4 Implementation

4.1 Laboratory cluster specifications
    The virtual experiment was executed on the laboratory cluster (see Fig. 6). It consists of 2 master
nodes and 6 slave nodes. Each master node has 32 GB of RAM, 24 threads and 2 TB of disk space in RAID1.
Slave nodes have 64 GB of RAM, 24 threads and 4 TB of disk space attached as JBOD. All the machines are
connected to a 10 Gbit/s switch.

Figure 6 Cluster Architecture
    On the cluster, the Hortonworks Data Platform (HDP) distribution package is installed. This
distribution
represents a set of tools from the Hadoop infrastructure managed through Apache Ambari.
    The Hadoop Distributed File System (HDFS) is installed as the file system. HDFS consists of a
NameNode server and DataNode servers. The NameNode server manages the namespace of the file system and
the clients' access to the data. The main NameNode server is installed on the m1 node and records all
transactions that change the file system metadata to a special file called EditLog. When the main
NameNode server is started, it reads the HDFS image and applies all the changes to it. This is done once
at startup. A similar operation is performed by the Secondary NameNode, which is installed on the m2
machine. DataNode servers are installed on machines s1–s4. They are responsible for storing the data
itself and keeping its integrity.
    For sharing, scalability and reliability of the Hadoop cluster, the resource manager YARN [5] is
used. YARN offers a hierarchical approach to the cluster infrastructure. The root of the YARN hierarchy
is the ResourceManager. This daemon manages the entire cluster and assigns applications to the
underlying computing resources. It allocates resources (computing resources, memory, and bandwidth) to
the underlying NodeManagers. The ResourceManager interacts with the ApplicationMaster when allocating
resources and with the NodeManagers when starting and monitoring the underlying applications. The
ResourceManager is located on m2, and the NodeManagers on nodes s1–s4.
    Another important module of the Hadoop cluster is ZooKeeper. ZooKeeper is a server that coordinates
distributed processing. It provides a distributed configuration service, a synchronization service, and
a naming registry for distributed systems. Distributed applications use ZooKeeper to store important
configuration information and to be notified of its updates. The ZooKeeper server runs on the m1 node.
    Since most of the calculations are iterative algorithms, Apache Spark was chosen as the
computational backbone. Apache Spark provides a fast and versatile platform for data processing. In
comparison with Hadoop MapReduce, Spark accelerates programs by minimizing disk input-output operations.
    Spark introduces the concept of the RDD (resilient distributed dataset): an immutable,
fault-tolerant distributed collection of objects that can be processed in parallel. An RDD can contain
objects of any type. An RDD is created by loading an external data set or by distributing a collection
from the main program (driver program). RDDs support two types of operations:
    • Transformations are operations (for example, mapping, filtering, merging, etc.) performed on an
      RDD. The result of a transformation is a new RDD.
    • Actions are operations (e.g., reduction, count, etc.) that return a value resulting from some
      calculation on an RDD.
    The cluster has Spark History Server installed on m1, Spark Thrift Server on m2, Livy Server on m2
and Spark Clients on all nodes.
    For more convenient programming on the cluster, we use Apache Zeppelin, a web-based notebook for
interactive data analytics. It supports many interpreters, including the Spark interpreter and the
Python interpreter.
    Scalability. Owing to the algorithm used, each slave machine handles several independent fMRI
images, so performance scales almost linearly with the number of slave nodes. It is bounded by the
network speed when the initial image data is transmitted into slave memory; however, the transmission
time is several seconds and is negligible compared to the processing time.

4.2 Workflow
    The workflow is depicted in Fig. 7. The program reads all files from the directory and checks the
validity of the format (all data are compressed zip folders). After that, the subject number is
extracted from the file name and the subject's gender is looked up in an additional metadata file. When
the gender is known, the file is unzipped to the corresponding folder. Inside the unzipped folder is a
4-D image in the .nii.gz format. Using the nibabel library, the image is loaded into memory as an array
of type numpy.ndarray. From this array, a new array is created in which the spatial coordinates precede
the value of each voxel. The new array is compressed by the gzip algorithm and stored in HDFS.
    Due to Apache Spark limitations, files larger than 2.5 GB in binary format cannot be loaded. In
uncompressed form the size is 4.3 GB, so the file needs to be compressed. After compression, the file
occupies just 700 MB.
    The Spark task is started with the following parameters:
    • num-executors=4: the number of executors;
    • executor-memory=25 GB: the amount of memory used by one executor process;
    • executor-cores=2: the number of cores used by each executor;
    • driver-memory=8 GB: the amount of memory used by the driver process, that is, where SparkContext
      is initialized.

Figure 7 Workflow
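    The array-encoding step of the workflow in Section 4.2 can be sketched as follows. This is a minimal
illustration under stated assumptions, not the original implementation: the row layout (three
coordinates followed by the voxel's time series), the use of the .npy container, and the helper names
encode_volume and decode_volume are all hypothetical. The sketch stores the spatial coordinates in
front of the voxel values, compresses the result with gzip, and then restores the plain 4-D array.

```python
import gzip
import io

import numpy as np


def encode_volume(volume):
    """Flatten a 4-D array (x, y, z, t) into rows [x, y, z, v_0 .. v_{t-1}]
    and return the gzip-compressed bytes of that 2-D array."""
    nx, ny, nz, nt = volume.shape
    # Integer coordinates of every voxel, in C order to match reshape below.
    coords = np.indices((nx, ny, nz)).reshape(3, -1).T
    series = volume.reshape(-1, nt)
    table = np.hstack([coords.astype(volume.dtype), series])
    buffer = io.BytesIO()
    np.save(buffer, table)  # the .npy container keeps dtype and shape
    return gzip.compress(buffer.getvalue())


def decode_volume(blob):
    """Inverse of encode_volume: drop the coordinate columns and
    restore the plain 4-D array."""
    table = np.load(io.BytesIO(gzip.decompress(blob)))
    coords = table[:, :3].astype(int)
    shape = tuple(coords.max(axis=0) + 1) + (table.shape[1] - 3,)
    volume = np.empty(shape, dtype=table.dtype)
    volume[coords[:, 0], coords[:, 1], coords[:, 2]] = table[:, 3:]
    return volume


# Tiny synthetic "image": 4 x 3 x 2 voxels, 5 time points.
original = np.arange(4 * 3 * 2 * 5, dtype=np.float32).reshape(4, 3, 2, 5)
blob = encode_volume(original)
restored = decode_volume(blob)
```

    Keeping the coordinates next to the values makes each row self-describing, at the cost of three
extra columns; gzip removes much of that redundancy because the coordinate columns are highly regular.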




    YARN creates on each node a container that receives                hypotheses. As a result, a binary matrix is obtained that
information from the driver. All calculations run in two               shows the rejection or acceptance of hypotheses for each
threads. When metadata is received, a file with a                      area of the brain.
compressed binary array is loaded into memory. The
                                                                       4.3 Results
program decompresses it and converts it to a normal
array without information about the indexes. Then, using                   In total, 50 male and 50 female subjects are used. All
the resulting array and the affine transformation matrix,              data are resting-state fMRI images.
a Nifti1Image and a CanICA object are created with the                     Fig. 8 depicts a binary matrix of gender differences
following parameters: n_components=20;                                 in the functional connectivity of healthy middle-aged
smoothing_fwhm=6; n_init=10; threshold=3;                              people. Red spots mark areas that correlate both in men
verbose=10.                                                            and women, and blue dots indicate a lack of correlation.
    The Nifti1Image object is passed to the CanICA                     For example, this experiment shows that the superior
object, and an image consisting of 20 components is                    frontal gyrus has a significant
output. This image is returned to the m1 driver. Thus,                 correlation with the insular cortex, but
each node receives a portion of the paths to the                       does not have a significant correlation with the frontal
compressed images, processes them, and returns the                     pole.
result to the driver. The task runs until all the files                    The independent components averaged over male
specified for analysis on the m1 driver are processed.                 subjects show greater functional connectivity than
When the nodes complete the tasks, the driver holds                    those of women. It can be seen that the main activity of the
a list of Nifti1Image objects that contain                             brain of men and women occurs near its cortex.
independent components. The data for all objects are
averaged and a time series is created using the
NiftiLabelsMasker object.
    A map of brain regions is passed to the
constructor of the NiftiLabelsMasker object. Using the
ConnectivityMeasure object, which is created with the
correlation parameter, the correlation matrix for the brain
regions is computed. The correlation matrices for men and
women are calculated separately. After this, the Fisher
transform (z-transform) is applied to each matrix.
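To make the two steps above concrete, the following sketch computes a correlation matrix from region time series and applies the Fisher transform z = atanh(r). It is a plain-Python stand-in for illustration, not the nilearn ConnectivityMeasure code used in the experiment; the function names and the toy time series are assumptions, and setting the diagonal to 0 is a convenience choice because atanh(1) is undefined.

```python
import math


def pearson(a, b):
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(a)
    mean_a = sum(a) / n
    mean_b = sum(b) / n
    cov = sum((x - mean_a) * (y - mean_b) for x, y in zip(a, b))
    var_a = sum((x - mean_a) ** 2 for x in a)
    var_b = sum((y - mean_b) ** 2 for y in b)
    return cov / math.sqrt(var_a * var_b)


def correlation_matrix(series):
    """Correlation matrix for a list of region time series."""
    k = len(series)
    return [[1.0 if i == j else pearson(series[i], series[j])
             for j in range(k)] for i in range(k)]


def fisher_z(matrix):
    """Apply z = atanh(r) to off-diagonal entries; the diagonal is set to 0."""
    k = len(matrix)
    return [[0.0 if i == j else math.atanh(matrix[i][j])
             for j in range(k)] for i in range(k)]


if __name__ == "__main__":
    # Toy time series for three hypothetical regions.
    series = [[1, 2, 3, 4, 5], [2, 4, 6, 8, 11], [5, 3, 4, 1, 2]]
    for row in fisher_z(correlation_matrix(series)):
        print(["%.3f" % value for value in row])
```

In the experiment this would be run once on the male data and once on the female data, giving the two z-matrices that are compared next.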
    Next, a new sample is calculated as the
difference between the male sample z_m and the
female sample z_w. This sample has a normal




                                                                        Figure 9 Averaged independent components for men
                                                                        (upper) and women (lower)
                                                                       5 Conclusion
                                                                           This paper explored distributed methods and tools
                                                                       for finding gender differences in functional
                                                                       connectivity of resting-state fMRI.
                                                                       Several methods for finding functional
                                                                       connectivity in resting-state functional magnetic
                                                                       resonance imaging are considered. To work with
                                                                       large amounts of data, machine learning methods were
                                                                       used to identify repetitive patterns and to intelligently
                                                                       reduce data. Their potential for parallel and distributed
                                                                       use and their scaling with large amounts of
                                                                       input data are investigated. For better communication with
                                                                       domain experts, a domain ontology was specified with the
Figure 8 Binary matrix of functional connectivity                      main entities that describe this area and the necessary
difference                                                             links between them.
                                                                           A review of existing tools for data preparation and
distribution with a mean of 0 and a                                    preprocessing on local and distributed systems was
variance of 2/(n-3). For this sample, a critical region                carried out. At the moment there are few libraries for
is calculated at a significance level of 0.05 and corrected            working with the NIfTI format on a distributed system,
by the Benjamini–Hochberg procedure for testing multiple               so the input and output procedures for data were
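The test on the difference of the z-transformed correlations, followed by the Benjamini–Hochberg correction, can be illustrated with the sketch below. It assumes that the off-diagonal matrix entries have been flattened into lists, that n is the same sample size for both groups, and that a two-sided test is used; the function names and these choices are assumptions, not details confirmed by the paper.

```python
import math
from statistics import NormalDist


def z_difference_pvalues(z_m, z_w, n):
    """Two-sided p-values for z_m - z_w, which is N(0, 2/(n-3)) under H0."""
    sd = math.sqrt(2.0 / (n - 3))
    std_normal = NormalDist()
    return [2.0 * (1.0 - std_normal.cdf(abs(a - b) / sd))
            for a, b in zip(z_m, z_w)]


def benjamini_hochberg(pvalues, alpha=0.05):
    """Return a 0/1 list with 1 where the null hypothesis is rejected."""
    m = len(pvalues)
    order = sorted(range(m), key=lambda i: pvalues[i])
    # Find the largest rank k (1-based) with p_(k) <= alpha * k / m.
    cutoff = 0
    for rank, idx in enumerate(order, start=1):
        if pvalues[idx] <= alpha * rank / m:
            cutoff = rank
    rejected = [0] * m
    # Reject every hypothesis ranked at or below the cutoff.
    for idx in order[:cutoff]:
        rejected[idx] = 1
    return rejected


if __name__ == "__main__":
    # Hypothetical z-values for four region pairs.
    z_men = [0.90, 0.20, 0.55, 0.10]
    z_women = [0.10, 0.25, 0.50, 0.12]
    p = z_difference_pvalues(z_men, z_women, n=50)
    print(benjamini_hochberg(p))
```

Reshaping the returned 0/1 list back into the region-by-region layout gives a binary matrix of the kind shown in Figure 8.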




implemented in this work. To preprocess the data, we               [9]   Biswal, B.B., Mennes, M., Zuo, X.-N., Gohel, S.,
used combinations of methods from the nibabel and nilearn                Kelly, C., Smith, S.M., Beckmann, C.F.,
libraries. To solve the problem, an overview of existing                 Adelstein, J.S., Buckner, R.L., Colcombe, S.,
distributed systems was made, among which the Apache                     others: Toward Discovery Science of Human Brain
Spark framework was the most effective. For the                          Function. Proc. of the National Academy of
experiment, a cluster of 6 machines was used, of which                   Sciences, 107, pp. 4734-4739 (2010)
two machines were the main nodes and four were workers.        [10]      Cox, R.W., Ashburner, J., Breman, H., Fissell, K.,
On the cluster, the minimum set of programs required for                 Haselgrove, C., Holmes, C.J., Lancaster, J.L.,
the experiment, such as YARN, HDFS, ZooKeeper,                           Rex, D.E., Smith, S.M., Woodward, J.B., others: A
Spark and the Zeppelin notebook, was installed and                       (sort of) New Image Data Format Standard: Nifti-
configured.                                                              1. Neuroimage, 22, e1440 (2004)
    A virtual experiment was performed in a distributed        [11]      McGlade, E., Rogowska, J., Yurgelun-Todd, D.:
system. The experiment took 4 hours to process 400                       Sex Differences in Orbitofrontal Connectivity in
GB of data. As a result of the experiment, matrices of                   Male and Female Veterans With TBI. Brain
connectivity between the brain regions of men and                        imaging and Behavior, 9, pp. 535-549 (2015)
women were obtained, as well as a binary matrix of
                                                               [12]      Smith, S.M., Hyvärinen, A., Varoquaux, G.,
gender differences in functional connectivity.
                                                                         Miller, K.L., Beckmann, C.F.: Group-PCA for
Acknowledgments                                                          Very Large fMRI Datasets. NeuroImage, 101,
                                                                         pp. 738-749 (2014)
   This research was partially supported by the Russian
                                                               [13]      Beckmann, C.F., Smith, S.M.: Probabilistic
Foundation for Basic Research (projects 15-29-06045,
                                                                         Independent Component Analysis for Functional
16-07-01028).
                                                                         Magnetic Resonance Imaging. IEEE Transactions
References                                                               on Medical Imaging, 23, pp. 137-152 (2004)
                                                               [14]      Calhoun, V.D., Adali, T., Pearlson, G.D., Pekar, J.:
[1]   Council, N.R.: Frontiers in Massive Data Analysis.                 A Method for Making Group Inferences from
      The National Academies Press, Washington, DC                       Functional MRI Data Using Independent
      (2013)                                                             Component Analysis. Human Brain Mapping, 14,
[2]   Hey, A.J., Tansley, S., Tolle, K.M., others eds: The               pp. 140-151 (2001)
      Fourth Paradigm: Data-Intensive Scientific               [15]      Rachakonda, S., Silva, R.F., Liu, J., Calhoun, V.D.:
      Discovery. Microsoft Research Redmond, WA                          Memory Efficient PCA Methods for Large Group
      (2009)                                                             ICA. Frontiers in Neuroscience, 10 (2016)
[3]   Van Essen, D.C., Smith, S.M., Barch, D.M.,               [16]      Gorgolewski, K., Burns, C.D., Madison, C.,
      Behrens, T.E., Yacoub, E., Ugurbil, K.,                            Clark, D., Halchenko, Y.O., Waskom, M.L.,
      Consortium, W.-M.H., others: The WU-Minn                           Ghosh, S.S.: Nipype: A Flexible, Lightweight and
      Human Connectome Project: An Overview.                             Extensible Neuroimaging Data Processing
      NeuroImage, 80, pp. 62-79 (2013)                                   Framework in Python. Frontiers in
[4]   Zaharia, M., Xin, R.S., Wendell, P., Das, T.,                      Neuroinformatics, 5 (2011)
      Armbrust, M., Dave, A., Meng, X., Rosen, J.,             [17]      Rokem, A., Trumpis, M., Perez, F.: Nitime: Time-
      Venkataraman, S., Franklin, M.J., others: Apache                   Series Analysis for Neuroimaging Data. In: Proc. of
      Spark: A Unified Engine for Big Data Processing.                   the 8th Python in Science Conf., pp. 68-75 (2009)
      Communications of the ACM, 59, pp. 56-65 (2016)
                                                               [18]      Abraham, A., Pedregosa, F., Eickenberg, M.,
[5]   Vavilapalli, V.K., Murthy, A.C., Douglas, C.,                      Gervais, P., Mueller, A., Kossaifi, J., Gramfort, A.,
      Agarwal, S., Konar, M., Evans, R., Graves, T.,                     Thirion, B., Varoquaux, G.: Machine Learning for
      Lowe, J., Shah, H., Seth, S., others: Apache Hadoop                Neuroimaging With Scikit-learn. Frontiers in
      yarn: Yet Another Resource Negotiator. In: Proc. of                Neuroinformatics, 8 (2014)
      the 4th annual Symposium on Cloud Computing.
      p. 5. ACM (2013)                                         [19]      Sowa, J.F., others: Knowledge Representation:
                                                                         Logical, Philosophical, and Computational
[6]   Huth, A.G., Heer, W.A. de, Griffiths, T.L.,                        Foundations. MIT Press (2000)
      Theunissen, F.E., Gallant, J.L.: Natural Speech
      Reveals the Semantic Maps that Tile Human                [20]      Poldrack, R.A.: Region of Interest Analysis for
      Cerebral Cortex. Nature, 532, pp. 453-458 (2016)                   fMRI. Social Cognitive and Affective
                                                                         Neuroscience, 2, pp. 67-70 (2007)
[7]   Friston, K.J.: Functional and Effective
      Connectivity: A Review. Brain connectivity, 1,           [21]      Van Den Heuvel, M.P., Pol, H.E.H.: Exploring the
      pp. 13-36 (2011)                                                   Brain Network: A Review on Resting-State fMRI
                                                                         Functional Connectivity. European Neuropsy-
[8]   Biswal, B.B., Kylen, J.V., Hyde, J.S.: Simultaneous                chopharmacology, 20, pp. 519-534 (2010)
      Assessment of Flow and BOLD Signals in Resting-
      state Functional Connectivity Maps. NMR in               [22]      Sporns, O.: Discovering the Human Connectome.
      Biomedicine, 10, pp. 165-170 (1997)                                MIT Press (2012)



