=Paper= {{Paper |id=Vol-1952/Integreating |storemode=property |title=Integreating Geological And Seismological Data In Point Process Models For Seismical Analysis |pdfUrl=https://ceur-ws.org/Vol-1952/Integreating.pdf |volume=Vol-1952 }} ==Integreating Geological And Seismological Data In Point Process Models For Seismical Analysis== https://ceur-ws.org/Vol-1952/Integreating.pdf

Integreating geological and seismological data in point
process models for seismical analysis

Marianna Siino Giada Adelfio
marianna.siino01@unipa.it giada.adelfio@unipa.it
Dipartimento di Scienze Economiche, Aziendali e Statistiche
Università degli studi di Palermo, Palermo, Italy

Abstract

Nowadays in the seismic and geological fields, large and complex data
sets are available. This information is a valuable source that can be
used for improving the seismic hazard assessment of a given region. In
particular, the integration of geologic variables into point process mod-
els to study seismic pattern is an open research field that has not been
fully explored. In this work, we present several open-access datasets
(the catalogue of the earthquakes, geological information such as faults,
plate boundary and the presence of volcanoes) that are properly treated
to describe the seismicity of events occurred in Greece between 2005
and 2014. We use these datasets to fit an advanced spatial point pro-
cess model for the description of interaction among the points in the
presence of larger-scale inhomogeneity.

1 Introduction
Earthquakes are the most unpredictable natural disaster and the level of damages mainly depends on the earth-
quake magnitude, depth, epicentral distance and on geology in the neighborhood area, which can generate local
amplification. An earthquake is a sudden movement of the earth lithosphere, and the two main causes of earth-
quakes are: volcanic activity and tectonic activity (in this case, events occur along the boundaries of major
tectonic plates and active faults).
Commonly, in an observed area, earthquakes can be considered as a realization of a marked space-time point
process, where the magnitude is the mark, and a point is identified by its geographical coordinates and time of
occurrence (Illian et al.; 2008). The main type of interaction structure in earthquake data is clustering. As a
matter of fact, the concentration of earthquakes in space is observed in the neighbourhood of possible sources,
such as volcanoes, faults and plate boundaries. Instead, clustering in time can be seen as a significant increase
of seismic activity immediately after large earthquakes. In the literature, the model formulation is usually based
on self-exiting point processes, such as the epidemic type aftershock sequence (ETAS) model (Ogata; 1988).
Traditionally, there are not used further information than the seismic catalogue data and so the analysis is
purely based on the distribution of events. When covariate data are also available, it is attractive and motivating
the definition and estimation of models to investigate whether the intensity depends on the covariates and to
quantify this dependence (Baddeley et al.; 2015). In particular, geologic information, such as the distance from

Copyright c by the paper’s authors. Copying permitted for private and academic purposes.
In: A. Editor, B. Coeditor (eds.): Proceedings of the XYZ Workshop, Location, Country, DD-MMM-YYYY, published at
http://ceur-ws.org
plate boundaries or the nearest fault, can be used as covariates to provide a better estimation of the spatial
intensity and to show a correlation between seismic events and geologic data.
In Section 2, we present the seismic catalogue and the geological data sets available in the Greek area explaining
how the information have been collected, processed and cleaned for the analysis. After some descriptive analysis,
the several datasets have been integrated to show a possible usage in point process models. A hybrid of Gibbs
point process models is estimated (Baddeley et al.; 2013) to describe the spatial distribution of earthquakes
(with a magnitude 4) from 2005 to 2014, accounting for multiscale dependence while also including the e↵ect of
the geological covariates (Siino et al.; 2016). In Section 3, the methodological approach is explained, and some
results are presented in Section 4. Conclusive remarks will follow.

2 The study area and data description
The Hellenic area is the most seismically active area of the European-Mediterranean region having experienced
many destructive earthquakes and it is characterised by both tectonic and volcanic seismogenic sources. The
area is located at the boundary of the Africa-Eurasia convergence, Figure 1a. The compressional motion between
the African and Eurasian plates causes the subduction of the lithosphere forming the Hellenic Arc. About 150
km to the north in the southern Aegean Sea, the Hellenic volcanic arc is located. Most of the volcanoes in the
area are not active in terms of eruptions, but nevertheless they are sources of microseismic activity. Moreover,
Greece and its surroundings present local active faults.

40
*
39

*
** *
38

*
** *
*
37

* *
* *
36

*
* *
35

Study area
* Earthquakes
Earthquakes Mg>5.5
** *
34

20 21 22 23 24 25 26 27 28

(a) (b)

Figure 1: (a) Tectonic plates in the Hellenic region and red stars indicate the main volcanoes. (b)The polygonal
area is the study window (W ), points are earthquakes occured between 2005 and 2014 with a magnitude greater
than 4 and the red ones are those with a magnitude greater than 5.5.

2.1 The earthquake catalogue
A seismic catalogue contains focal parameters of earthquakes (latitude, longitude and depth), event
magnitude and time of occurrence. Several earthquake catalogues are freely accessible and the
data set used in this work comes from the Hellenic Unified Seismological Network, (HUSN,
http://www.gein.noa.gr/en/seismicity/earthquake-catalogs) that is nowadays constituted by about 150 seismic
stations adequately distributed over the area. Moreover, the HUSN is a contributor of the European Integrated
Data Archive (EIDA, http://www.orfeus-eu.org/data/eida), a data center that archives and provides access to
seismic waveform data and metadata gathered by European research infrastructures.
The Greek catalogue contains events since 1964 and we considered in our analysis the seismic catalogue since
2005, when the network was upgraded and earthquake location was sensibly improved. The study area extends
from 33.5 to 40.5 Lat. N and from 20 to 28 Long. E (Figure 1a).
The quality of a catalogue is fundamental, since it influences the quality of the seismicity analysis. An
earthquake should be considered as a volume rather than a single point, however it is represented by the focal
parameters. The accuracy and precision of focal parameters mainly depends on the magnitude and on the
number and distribution of the seismic stations. Small magnitude earthquakes are generally recorded from few
stations (only the nearest ones) and consequently, focal parameters are not estimated reliably. So earthquake
location is a↵ected by errors, but we can assume that all the errors associated to all the single events average out
each other when they are treated together. A crucial challenge is the evaluation of the catalogue completeness,
that is defined as the lowest magnitude for which 100% of the earthquakes in a space-time volume are detected.
For our subset, it is estimated at 2.4 using the several procedures proposed in Mignan and Woessner (2012).
However, the magnitude threshold is setted to 4 since we aim to study the spatial intensity of the main events
occurred in the area excluding the micro and the minor events according to a qualitative classification of the
magnitude.
The events in the study window with a magnitude greater than or equal to 4 between 2005 and 2014 are 1105.
Each seismic event will be considered as a point ui in an observed spatial point pattern individuated by its
two spatial coordinates (latitude and longitude in WGS84 coordinate system). In R, the essential components of
a point pattern object = {u1 , . . . , un } are the coordinates of the points and the observation window W that
in our case is a polygonal window (see Figure 1b). The main assumption for the analysis of a point patterns
are: point locations are measured exactly, there are not multiple points, points are mapped without omission,
there are no errors in detecting the presence of points and points could have been observed at any location in
the region W (Baddeley et al.; 2015).
Since the goal of the analysis is to describe in a proper way the spatial interactions between points, neglecting
time, in the spatial point pattern we found some duplication of points. How to deal with coincident points
depends on the goals of the analysis, however when the data have replicated points, some statistical procedures
can be severely a↵ected. In our case, we do not discard this information and we randomly shift each coincident
point by a small random distance in a random direction. For other procedures to deal with this type of data see
Baddeley et al. (2015).

2.2 Geologica information

We additionally considered GIS-based open-access geological information to understand dependence between
events and the di↵erent sources of earthquakes. In Figure 2c, the orange line is the Aegean plate boundary coming
from a digitalised global set of present plate boundaries on the Earth (Bird; 2003). The dataset (downloaded
from https://github.com/fraxen/tectonicplates) has vector information that represent the fractures of the Earth
crust and additional meta-data about the boundaries.
The Global Volcanism Program database gives coordinates and describes the physical characteristics of
Holocene volcanoes and their eruptions (http://volcano.si.edu). Moving from the West to the East, the main
volcanoes of the Hellenic Volcanic Arc are Methana, Milos, Santorini, Yali and Nisyros (Siebert and Simkin;
2014) and they are represented by points corresponding to the main volcanic craters, see Figure 2c.
Furthermore, we used the Greek Database of Seismogenic Sources (GreDaSS) that concerns tectonic and
active-fault data in Greece and its surroundings (Caputo et al.; 2013). The database consists of several layers
(graphical and metadata) on seismogenic sources that are categorised into two types: individual and composite.
In our analysis, we considered the spatial shape file of the Composite Seismogenic Sources (CSS) since they are
much more appropriate for investigating large-scale processes, Figure 2a. A composite source represents a com-
plex fault system with an unspecified number of aligned individual seismogenic sources that cannot be separated
spatially. The database provides geometric (strike, dip, width, depth) and kinematic (rake) parameters and
descriptive information (e.g. comments, latest earthquakes) associated to each source, Figure 2b. In particular
for the analysis of the point process model, we consider the upper edges of the composite sources that for the
sake of simplicity are named faults, blue lines in Figure 2c.
We transform all the previous geological information into spatial variables, Z(u) defined at all locations u 2 W
(Baddeley et al.; 2015), Df (u) distance to the nearest fault, Dv (u) distance to the nearest volcano and Dpb (u)
distance to the plate boundary. These spatial variables are treated as covariates since the research questions are
whether the the intensity depends on the geological information, and whether, after accounting for the influence
of geological data, there is evidence of spatial clustering of the earthquakes. Giving an example, the spatial data
about the faults in Figure 2c are converted into a pixel images, and the pixel value represents the distance from
that pixel to the nearest fault, and Figure 2d represents Df (u) .
40
39
38
37
36
35
34

20 21 22 23 24 25 26 27 28

(a) (b)

41
40

3.5
40
39

3
● 39

2.5
38

●

2
37

● ●
●

1.5
●
36
36

1
35
35

0.5
study area
plate boundary
34

faults
● vulcanoes
34

0
33

20 21 22 23 24 25 26 27 28 20 21 22 23 24 25 26 27 28 29

Figure 2: (a) Composite seismogenetic sources (b) Main geometric (strike, dip, width, depth) and kinematic
(rake) parameters that charcaterised a composite source. (c) Faults (upper edges of the composite sources),
plate boundary and main volcanoes in the Hellenic area. (d) Immage plot of the spatial covariate distance to
the nearest fault (Df (u)).

3 Methodology
A spatial point pattern = {u1 , . . . , un } is an unordered set of points in the region W ⇢ R2 where n( ) = n is
the number of points, |W | < 1 (Baddeley et al.; 2015). A point process model assumes that is a realisation of
a finite point process X in W without multiple points. The first-order property of X is described by the intensity
function, ⇢(u) = lim|du|!0 E(Y (du))/|du| where du is an infinitesimal region that contains the point u 2 W ,
|du| is its area and E(Y (du)) denotes the expected number of events in du. When the intensity is constant
the process is called homogeneous. In our case, we aim to describe the spatial arrangement as a function of
environmental information using the spatial variable Z(u) defined is Section 2.2.
Several functional summary statistics are used to study the second-order characteristics of a point pattern
and so measuring dependence between events. Two widely used summary statistics are the Ripley’s K-function
and the G-function.
The G-function is the cumulative distribution function of the nearest-neighbour distance at a typical point of
X . It is a function of di = d(ui , \ui ) that indicates the shortest distance from ui to the pattern \ui consisting
of all points of except ui . It is defined as G(r) = P{d(u, X \u  r)/X has a point in u} for any distance r 0
and any location u. Under the homogeneous Poisson assumption this relation holds G(r) = 1 exp( ⇢⇡r2 ).
Visual inspection of the G-function with the theoretical summary statistic of the homogeneous Poisson process
is used to study correlation in a descriptive analysis of the data.

3.1 Gibbs and Hybrid of Gibbs point process models
The class of Gibbs processes X is determined through a probability density function f : X ! [0, 1), where
X = { ⇢ W : n( ) < 1} is a set of point configurations contained in W . In the literature several Gibbs
models have been proposed such as the area-interaction, Strauss, Geyer, hard core processes. However Gibbs
processes have some drawbacks when points have a strong clustering and show spatial dependence at multiple
scales (Illian et al.; 2008; Baddeley et al.; 2015). Baddeley et al. (2013) proposes hybrid models as a general way
to generate multi-scale processes combining Gibbs processes. Given m unnormalized densities f1 , f2 , . . . , fm , the
hybrid density is defined as f ( ) = f1 ( ) ⇥ . . . ⇥ fm ( ), where the components have to respect some properties
(Baddeley et al.; 2013).
For example the density of the stationary hybrid process obtained considering m Geyer components (with
interaction ranges r1 , . . . rm and saturation parameters s1 , . . . , sm ) is
n( ) m
Y Y min(sj ,t(ui , \ui ;rj ))
n( )
f( ) = j (1)
i=1 j=1
P
where t(ui , \ui ; rj ) = i {1ku ui k  rj }. This density indicates that the spatial interaction between points
changes with the distances rj and the parameters that capture this information are the interaction parameters
j . When the sj is set to infinity, the corresponding component fj reduces to the Strauss process. Instead if s
= 0, the component reduces to the Poisson point process. If s is a finite positive number, then the interaction
parameter j may take any positive value. The fj Geyer component indicates inhibitive interaction when j  1,
and clustered when j > 1. To consider inhomogeneity in (1), the parameter is replaced by a function (ui )
that expresses a spatial trend and it can be a function of the coordinates of the points and of spatial covariates,
Z(u).
Gibbs models are fitted to data by pseudo-likelihood that is function of the Papangelou conditional intensity
(Baddeley et al.; 2015). The models are compared and assessed in terms of AIC, and graphical diagnostic plots
based on the spatial raw residuals. Furthermore, the diagnostic plots based on the residual K- and G-functions
are used to decide which component has to be added at each step to the hybrid model. Indeed, these graphs
show for which spatial distances the current model has a lack of fit in describing the interaction between points.
For example, the residual G-function for a fitted model, evaluated at a given r, is the score residual used for
testing the current model against the alternative of a Geyer saturation model with saturation parameter 1 and
interaction radius r. We use the the spatstat package (Baddeley and Turner; 2005) of R (R Development Core
Team; 2005), for fitting, prediction, simulation and validation of Hybrid models.

4 Some results of the analysis
The point pattern of seismic events that is described in Section 2.1 shows a spatial inhomogeneity and multi-
scale interactions between points, (Figure 1b). Figure 3 shows the non-parametric empirical G-function and its
corresponding envelope: the observed pattern is not a Poisson process since the empirical G-function is outside
the shadow region.
The first attempt in model estimation is to fit an inhomogeneous Poisson model with the following parametric
log-linear intensity ⇢(u) = exp{ 0 + g(u; ) + h(Dv (u), Dpb (u, Df (u); ↵)}, where g(u; ) is a second order poly-
nomial in the spatial coordinates and h(Dv (u), Dpb (u), Df (u); ↵) is a function of the spatial covariates defined
in Section 2.2. However, the assumption of an inhomogeneous Poisson point process model is inappropriate
since there is an unexplained interaction between points. The residual G-function has a specific trend that is far
from zero for short distances indicating clustering behavior between events (Figure 4a), so, we considered hybrid
models. We fitted several combinations of hybrid models and we compared nested models in terms of residual
deviance for variable selection. The final selected model is an inhomogeneous hybrid model with four Geyer
components that have interaction parameters ( j ) greater than 1 (Table 1). We identify a multi-scale clustering
between points for interpoint distances approximately less than 16 km. The variables Df (u) and Dpb (u) are
both significant and negatively related to the log spatial intensity. We found that the variable that indicates the
distance to the nearest volcano is not significant in explaining the spatial intensity, in fact the volcanic Hellenic
0.8
0.6
G (r )

0.4
0.2
0.0
0.00 0.05 0.10 0.15

Figure 3: The envelope of the G-function (shaded region) to test the Complete Spatial Randomness (CSR)
assumption. The black curve is the estimated G-function instead the red one is the function under the Poisson
assumption.

Estimate j Zval
(Intercept) 8.7863 0.45
Geyer1 (r1 =5.6km) 0.0841 1.09 *** 12.06
Geyer2 (r2 =6.7km) 0.4627 1.59 *** 8.06
Geyer3 (r3 =10km) 0.1193 1.13 *** 12.22
Geyer4 (r4 =16km) 0.1968 1.22 *** 6.51
Dpb -0.2656 *** -6.25
I(Df < 1 )Df -1.0196 *** -4.15
I( 1  Df < 2 )Df -0.4963 * -2.38
I( 2  Df < 3 )Df -0.5362 *** -4.39
I( 3  Df < 4 )Df -1.1964 * -2.54
I(Df 4 )Df -1.0059 *** -4.97
x -1.1664 * -2.42
y 0.5004 0.63
x2 0.0273 *** 5.48
xy -0.0045 -0.55
y2 -0.0065 -0.75
Note: ⇤ p<0.1; ⇤⇤ p<0.05; ⇤⇤⇤ p<0.01

Table 1: Estimated parameters of the hybrid model with four Geyer components. The log-intensity has a second-
order polynomial term in x and y. The variable Df is inserted considering a segmented linear relationship, where
{ 1 , 2 , 3 , 4 } = {0.43, 0.60, 1.21, 1.32}
.

Inhomogeneous model
Poisson Hybrid
AIC -5247.94 -7121.84
Range of raw residuals [-5.77;6.14] [-1.62;2.29]

Table 2: AIC and range of the spatial raw residuals of the fitted inhomogeneous Poisson model and the final
selected hybrid model.
arc area is mostly characterised by microseismic activity. Moving from the inhomogeneous Poisson process to
the hybrid formulation, there is an improvement in terms of AIC, it decreases by 1874. Moreover, there is a
sensible reduction of the range of the spatial raw residuals (Table 2). Finally, the residual G-function oscillates
around zero indicating that the interaction structure between the earthquakes is well described by the hybrid
model (Figure 4b).

5 Remarks
The integration of earthquake data and external geological information is a new field of research. In this work,
we describe the available seismic and geological data in the Greek area giving details on download links, quality
aspects, and some advices on how to treat and analyse these datasets on the statistical software R. Furthermore,
we present an application in the context of point process models, obtaining parameters that relate the seismic
0.06
0.5

0.04
0.4

0.02
0.3
R G (r )

R G (r )

0.00
^

^
0.2

−0.02
0.1

−0.04
0.0

−0.06
0.00 0.05 0.10 0.15 0.20 0.25 0.00 0.05 0.10 0.15 0.20 0.25

r r

(a) (b)

Figure 4: (a) The residual G-function for the inhomogeneous Poisson process model. (b) The residual G-function
for the inhomogeneous hybrid model with four Geyer components.
information to the geological one (Table 1) and an estimation of the spatial intensity. These results are relevant
for the stakeholders of the analysis because indicate how the spatial intensity changes in the neighbourhood of
the several earthquake sources and provide a hazard map that characterised the seismicity of the area.
A way to improve the analysis could be to consider a spatio-temporal model formulation for the catalogue
data, such as the log-Gaussian Cox model (Møller and Waagepetersen; 2003), in which the covariate geological
information are included. Moreover, with a spatio-temporal formulation, it is possible to improve the assessment,
prevention and mitigation of impacts of this natural disaster.

Acknowledgments
This paper has been partially supported by the national grant of the Italian Ministry of Education University
and Research (MIUR) for the PRIN-2015 program (Progetti di ricerca di Rilevante Interesse Nazionale), “Prot.
20157PRZC4 - Research Project Title Complex space-time modeling and functional analysis for probabilistic
forecast of seismic events. PI: Giada Adelfio”

References
Baddeley, A., Rubak, E. and Turner, R. (2015). Spatial Point Patterns: Methodology and Applications with R,
London: Chapman and Hall/CRC Press.

Baddeley, A. and Turner, R. (2005). Spatstat: An r package for analyzing spatial point patterns, Journal of
Statistical Software 12(i06).

Baddeley, A., Turner, R., Mateu, J. and Bevan, A. (2013). Hybrids of gibbs point process models and their
implementation, Journal of Statistical Software 55(11): 1–43.

Bird, P. (2003). An updated digital model of plate boundaries, Geochemistry, Geophysics, Geosystems 4(3).

Caputo, R., Chatzipetros, A., Pavlides, S. and Sboras, S. (2013). The greek database of seismogenic sources
(gredass): state-of-the-art for northern greece, Annals of Geophysics 55(5).

Illian, J., Penttinen, A., Stoyan, H. and Stoyan, D. (2008). Statistical Analysis and Modelling of Spatial Point
Patterns, Vol. 70, John Wiley & Sons.

Mignan, A. and Woessner, J. (2012). Estimating the magnitude of completeness for earthquake catalogs, com-
munity online resource for statistical seismicity analysis, doi: 10.5078/corssa-00180805.

Møller, J. and Waagepetersen, R. P. (2003). Statistical Inference and Simulation for Spatial Point Processes,
Chapman and Hall/CRC, Boca Raton.
Ogata, Y. (1988). Statistical models for earthquake occurrences and residual analysis for point processes, Journal
of the American Statistical Association 83(401): 9–27.

R Development Core Team (2005). R: A language and environment for statistical computing, R Foundation for
Statistical Computing, Vienna, Austria. ISBN 3-900051-07-0.
URL: http://www.R-project.org
Siebert, L. and Simkin, T. (2014). Volcanoes of the world: an illustrated catalog of holocene volcanoes and their
eruptions, Smithsonian Institution, Global Volcanism Program Digital Information Series, GVP-3 .

Siino, M., Adelfio, G., Mateu, J., Chiodi, M. and D’Alessandro, A. (2016). Spatial pattern analysis using hybrid
models: an application to the hellenic seismicity, Stochastic Environmental Research and Risk Assessment
(DOI: 10.1007/s00477-016-1294-7).