=Paper=
{{Paper
|id=Vol-2600/short6
|storemode=property
|title=EM-algorithm Enpowers Material Science: Application of Inverse Estimation for Small Angle Scattering
|pdfUrl=https://ceur-ws.org/Vol-2600/short6.pdf
|volume=Vol-2600
|authors=Akinori Asahara,Hidekazu Morita,Kanta Ono,Masao Yano,Tetsuya Shoji,Kotaro Saito,Chiharu Mitsumata
|dblpUrl=https://dblp.org/rec/conf/aaaiss/AsaharaMOYSSM20
}}
==EM-algorithm Enpowers Material Science: Application of Inverse Estimation for Small Angle Scattering==
[Short paper] EM-algorithm Enpowers Material Science:
Application of Inverse Estimation for Small Angle Scattering
Akinori Asahara Hidekazu Morita Kanta Ono
Hitachi Ltd. High Energy Accelerator Research Organization
Tokyo, 100-8280, Japan Tsukuba, 305-0801, Japan
Masao Yano Tetsuya Shoji Kotaro Saito Chiharu Mitsumata
Toyota Motor Corporation Paul Scherrer Institute National Institute for Materials Science
Toyota, 471-8572, Japan Villigen, 5232, Switzerland Tsukuba, 305-0047, Japan
Copyright © 2020 held by the author(s). In A. Martin, K. Hinkelmann, H.-G. Fill, A. Gerber, D. Lenat, R. Stolle, F. van Harmelen (Eds.), Proceedings of the AAAI 2020 Spring Symposium on Combining Machine Learning and Knowledge Engineering in Practice (AAAI-MAKE 2020). Stanford University, Palo Alto, California, USA, March 23-25, 2020. Use permitted under Creative Commons License Attribution 4.0 International (CC BY 4.0).

Abstract

In this short paper, a machine-learning algorithm is applied to improve the analysis of SAS (Small Angle Scattering) experiments, which are commonly used in material science. In a SAS experiment, a particle beam incident on a material sample is scattered by the sample. The distribution of the scattered beam carries information about the grain-size distribution of the sample material; however, this distribution needs to be inversely estimated. Therefore, a stochastic model of the SAS experiment and an EM (Expectation-Maximization) algorithm to estimate the grain-size distribution in the material sample are proposed. While existing methods require much manual effort, the proposed EM algorithm works automatically. Six simulation-generated datasets and two actual observed datasets were processed with the proposed method for examination. The results show that the proposed EM-based grain-size distribution estimation method is useful for automatically analyzing SAS data.

Introduction

Materials Informatics (MI), an information technology intended to make material development faster, has been researched eagerly in recent years (National Institute of Standards and Technology 2019). MI will help material science researchers to discover new knowledge.

One such MI function is a data mining technique to find very small features of experimental data automatically. Traditionally, material science researchers carefully inspect experimental data to find small features because they might indicate new knowledge. The researchers, however, might take a long time to find such features or miss them. Therefore, automatic knowledge extraction from experimental data is attracting the attention of researchers.

This study focuses on small-angle scattering (SAS) experiments (Higgins and Benoît 1994) (Asahara et al. 2019), which are commonly conducted for observing microstructures of materials. There are various similar scattering experiments such as neutron scattering, x-ray scattering, ion-beam scattering, etc. They differ only in the particles to be scattered. A solution for the problem in SAS can therefore be expected to apply to these experiments as well. Thus, the problem is important enough to be worth solving.

One of the objectives of SAS experiments is to estimate microscale grain-size distributions in material samples. Neutrons detected on a plane during a SAS experiment form a pattern on the plane (called a SAS pattern). Material science researchers with specialized knowledge observe SAS patterns carefully to find grain-size information about the microstructure of the sample material.

Accordingly, a method to automatically estimate grain-size distributions from SAS pattern data is presented in this paper. Several existing estimation methods are based on function optimization to fit the grain-size distribution to the SAS pattern, which requires much effort by material science researchers to adjust parameters. In contrast, our automatic estimation method is free from such effort because of probabilistic modeling of SAS experimental processes (that is, knowledge of the experimental settings). A maximum likelihood approach based on the stochastic modeling can be taken to estimate the grain-size distribution without heuristic assumptions. In this paper, an expectation-maximization (EM) algorithm applicable to the estimation is shown and examined with simulation data and actual measurement data.

Figure 1: SAS Experiment (schematic: a neutron beam from the source passes through the material sample and is detected on the detector plane, with axes q_x and q_y, where the counts of detection events form the SAS pattern)

Problem settings

Small angle scattering

An experimental instrument setting of SAS is illustrated in Figure 1. In the experiment, a particle beam incident upon
the sample interacts with the microstructures therein. The directions of the particles thus change due to the interactions. The angle θ between the straight beam and the changed direction of the scattered beam depends on the interaction. Finally, detectors arranged on a plane detect the scattered beam. The counts of detection events form a pattern, called a SAS pattern, on the plane. A microstructure causing such direction changes is called a "scattering body."

The particle behavior during the scattering experiment is modeled with a differential equation called the Schrödinger equation. The solution of the Schrödinger equation is a complex function called a wave function, of which the squared absolute value corresponds to the probability of detection. Because the distance L between the sample and the plane is large enough, the position x = (x, y) on the plane satisfies |x| = L sin θ ≃ Lθ, that is, it is approximately proportional to the scattering angle. The probability density function (PDF) P(x) of detection corresponds to the probability P(θ) that a particle goes in the direction of θ, which is related to the microscopic structures called grains.

As the simplest setting, imagine a case in which the grains are balls. The intensity <math>\mathcal{I}(r, q)</math> of the SAS pattern scattered by balls of radius r is in proportion to the following I(r, q):

<math>\mathcal{I}(r, q) \propto I(r, q) = \frac{1}{r^3}\left(\frac{\sin qr}{q^3} - \frac{r \cos qr}{q^2}\right)^2. \qquad (1)</math>

The q in the formula indicates a quantity called "wave number," which is the frequency of the wave function multiplied by 2π. The frequency of the wave function is three dimensional because it is derived with the Fourier transformation of the wave function in three dimensional space. The scattering angle θ depends on the frequency, so the magnitude q of the component perpendicular to the incident beam ("q = (q_x, q_y)" in Fig. 1) appears in the formula. Therefore, a q indicates a location x on the detection plane, derived from the distance between the incident beam center and the location. That is, we can obtain the actual SAS intensity corresponding to I(r, q) by converting x to q.

This formula holds in the case of a uniform grain size r. However, actual grain sizes vary. The SAS pattern of multiple grain sizes is the weighted sum of I(r, q) over r, where the weight is the grain-size distribution of the material, because the solutions of the Schrödinger equation can be added together. Accordingly, the scattering pattern S(q) of the scattering bodies is derived as

<math>S(q) \propto \int f(r) I(r, q)\,dr, \qquad (2)</math>

where the grain-size distribution is denoted as f(r).

Expert-knowledge-based analysis

To estimate the grain-size distribution, S(q), which is the integration of f(r)I(r, q), should be decomposed into the summation of I(r, q); however, this is difficult. Thus, material science researchers have tried to guess f(r) with clues from small features latent in the plot of I(r, q) as shown in Fig. 2. The figure presents a log-log plot of a SAS pattern, and its domain is separated into three parts (a), (b) and (c).

Figure 2: SAS pattern analysis with graphs (log-log plot of scattering amplitude against wave number (nm-1), with the domain separated into parts (a), (b) and (c))

In (a), that is q → 0, the power series of the trigonometric functions in q give

<math>I(r, q) \simeq \frac{1}{r^3}\left(\frac{qr}{q^3} - \frac{r}{q^2}\left(1 - \frac{1}{2}(qr)^2\right)\right)^2 = \frac{r^3}{4}, \qquad (3)</math>

so S(q) is independent of q. Thus, it converges to a constant value. In (b), corresponding to q → ∞, I(r, q) is approximated as

<math>I(r, q) \simeq \frac{1}{r^3}\left(\frac{r \cos qr}{q^2}\right)^2. \qquad (4)</math>

Therefore, S(q) is derived as

<math>S(q) \simeq \frac{1}{q^4}\int \frac{f(r)}{r} \cos^2 qr\,dr. \qquad (5)</math>

This behaves as the Fourier transform of f(r)/r, decaying with the fourth power of q.

(c) is intermediate between (a) and (b). I(r, q) in this domain is the following.

<math>I(r, q) = \frac{1}{r^3 q^6}\left(\sin qr - qr \cos qr\right)^2. \qquad (6)</math>

I(r, q) is always non-negative, and I(r, q) = 0 when sin qr − qr cos qr = 0. Therefore, I(r, q) = 0 leads to sin qr / cos qr = tan qr = qr. Figure 3 plots each side of this equation. The horizontal axis x of the graph indicates qr. The blue curve represents y = tan x and the orange line represents y = x. Their intersections, indicated by the circles in the figure, correspond to points satisfying tan qr = qr, that is, I(r, q) = 0. Therefore, the zero points appear periodically. Additionally, local maximum points, which satisfy sin x = 0, exist between the zero points. Thus I(r, q) oscillates and its frequency depends on r. S(q), which is the sum of the I(r, q), involves oscillations of various phases, so the oscillations gradually cancel out as q becomes larger. Hence, only the oscillation in the small-q domain is readable.

The material science researchers accordingly look for the oscillation in the (c) domain because it gives implicit hints for understanding f(r). However, f(r) can be estimated only roughly in this way. If f(r) were estimated directly, the SAS experiment could give much more information about the sample. Consequently, a method to directly estimate f(r) is highly needed. Thus, a machine-learning-based method is proposed in this paper.
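The quantities discussed in this section can be explored numerically. The following is a minimal sketch, not the authors' code: it evaluates I(r, q) as written in Eq. (1), locates the first zero points of I(r, q) by solving tan x = x with a plain bisection, and approximates S(q) of Eq. (2) by a discrete weighted sum. All function names, the bisection solver, and the bell-shaped test distribution centred at 10 nm are illustrative assumptions; only the 0.2 nm grid over 0 to 20 nm follows the setting reported for Experiment 1.

```python
import math

def sphere_intensity(r, q):
    """I(r, q) of Eq. (1): (1/r^3) * (sin(qr)/q^3 - r*cos(qr)/q^2)^2."""
    amp = math.sin(q * r) / q**3 - r * math.cos(q * r) / q**2
    return amp * amp / r**3

def zero_condition(x):
    """sin x - x cos x; its positive roots satisfy tan x = x, i.e. I(r, q) = 0."""
    return math.sin(x) - x * math.cos(x)

def bisect_root(func, lo, hi, tol=1e-12):
    """Plain bisection; assumes func changes sign exactly once on [lo, hi]."""
    f_lo = func(lo)
    while hi - lo > tol:
        mid = 0.5 * (lo + hi)
        f_mid = func(mid)
        if f_lo * f_mid > 0:
            lo, f_lo = mid, f_mid   # root lies in the upper half
        else:
            hi = mid                # root lies in the lower half
    return 0.5 * (lo + hi)

def scattering_pattern(q, radii, weights, dr):
    """Discrete version of Eq. (2): S(q) ~ sum_r f(r) I(r, q) dr."""
    return sum(w * sphere_intensity(r, q) for r, w in zip(radii, weights)) * dr

# The n-th zero of sin x - x cos x lies between n*pi and (n + 1/2)*pi.
roots = [bisect_root(zero_condition, n * math.pi, (n + 0.5) * math.pi)
         for n in (1, 2, 3)]

# Hypothetical grain-size distribution: bell-shaped bump around 10 nm,
# sampled at the centres of 0.2 nm bins over 0-20 nm (100 values).
dr = 0.2
radii = [dr * (i + 0.5) for i in range(100)]
weights = [math.exp(-((r - 10.0) ** 2) / (2.0 * 2.0 ** 2)) for r in radii]
norm = sum(weights) * dr
weights = [w / norm for w in weights]

qs = [0.1 * (k + 1) for k in range(50)]   # 0.1 ... 5.0 nm^-1
pattern = [scattering_pattern(q, radii, weights, dr) for q in qs]
```

For a single radius r, the zero points of I(r, q) sit at q = x/r for each root x in `roots`, which is why the oscillation period of the pattern depends on r; `pattern` then shows how a spread of radii smooths these oscillations as q grows.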
Figure 3: sin qr − qr cos qr behavior (plot of y = tan x and y = x for x from 0 to 15, with their intersections and the "sin x = 0" points marked)

Related works

One practicable method is parametric function fitting. Parameters of the function can be adjusted to fit the obtained SAS pattern because their relationship is known (Joachim and Ingo 2018). However, for this approach, the form of f(r) is required. The true f(r) is generally unknown in actual situations. Material scientists therefore have to assume many kinds of function forms to find the best estimation. Until the best estimation is achieved, many trials will be required, leading to a long calculation time.

To avoid such difficulty, a function having a more general formula should be used. One technique using such a function is the Indirect Fourier Transform (IFT) (Otto 1977). For IFT, a summation of multiple stepwise functions <math>\theta_n(r)</math> is used as the general function. The stepwise function <math>\theta_n(r)</math> returns 1 when <math>r_n < r < r_{n+1}</math>, and 0 otherwise, where the domain of the function is separated into N small partitions <math>r_n < r < r_{n+1}</math> (1, ···, n, ···, N). Formula (2) then becomes

<math>S(q) \simeq \sum_n a_n \int \theta_n(r) I(r, q)\,dr. \qquad (7)</math>

Under this assumption, the integral is decomposed into definite integrations over <math>r_n < r < r_{n+1}</math>. Because the definite integrals can be carried out analytically, S(q) is described as a linear combination of the <math>a_n</math>. After minimizing the difference between the linear combination of the <math>a_n</math> and the SAS pattern, the grain-size distribution f(r) is obtained as the sum of <math>a_n \theta_n(r)</math>.

Figure 4: Indirect Fourier Transform (IFT) (f(r) plotted against r and approximated by the step functions <math>a_{n-1}\theta_{n-1}(r)</math>, <math>a_n\theta_n(r)</math> and <math>a_{n+1}\theta_{n+1}(r)</math>)

The resolution of the grain-size distribution is determined by <math>\theta_n</math> in IFT as shown above. Therefore, the range of each <math>\theta_n</math> should be small to improve the resolution of grain size. Many <math>a_n</math> thus have to be determined for high-resolution results, and the SAS pattern must then be highly accurate because the difference of a SAS pattern from S(q) cannot be averaged out enough in the estimation of many <math>a_n</math>. Accordingly, a higher-resolution setting makes the estimation error larger. A technique to avoid this problem is to add regularization terms to suppress overfitting. However, the regularization terms have to be adjusted manually. To automate regularization, complicated methods to determine the regularization terms have been proposed, but they are not common yet.

In this paper, machine-learning algorithms are applied to the problem. Specifically, the SAS experimental process is modeled as a stochastic process with latent variables. After that, a likelihood function derived from the stochastic process is maximized to fit the SAS pattern. As a result, the grain-size distribution is obtained as the optimal model parameter of the stochastic process. No assumption is required for the method if a non-parametric model (that is, a very general stochastic model such as a Gaussian mixture) is applied for the SAS experimental process. Generally, an EM algorithm is applied to non-parametric models. Similarly, a method using a non-parametric model and an EM algorithm is proposed.

Such techniques are used in astrophysics (William 1972) (Leon 1974), bioinformatics (Lustig et al. 2008) (Lustig, Donoho, and Pauly 2007) and compressed sensing (Donoho 2006). However, this kind of approach is not common in scattering experiments. Therefore, in this paper, algorithms suitable for SAS are proposed and examined using simulation and actual data.

Stochastic process of SAS

Approach

The process consists of dispersion and observation, which are modeled with two different probabilistic models, as shown in Fig. 5.

Figure 5: Probabilistic solution of scattering problems (flow of scattering inside the sample: incident particle, "determine a grain" by selecting r randomly according to f(r), "change direction" by selecting q randomly according to I(q, r), detected)

At the first dispersion step, the incident beam interacts with grains. In Fig. 5, "determine a grain" represents the process. It can be interpreted as a stochastic process in which particles of the incident beam choose a scattering body in the sample material. The probability density function is consequently assumed to be in proportion to f(r). That is, the dispersion step of N particles is modeled as an N-times iteration of random sampling from f(r).

The second observation step, in which the incident beam changes its direction and arrives at a point on the detector plane, is also modeled as a random sampling process, shown as "change direction" in Fig. 5. The scattered particles choose a scattering angle randomly and are detected as a SAS pattern. This angle choice is stochastic due to the principles of quantum physics. Thus the probability distribution function is in proportion to I(r, q) defined in (1).

The entire process of SAS is modeled as the combination of these two stochastic processes. In the entire process, the size of the scattering body interacting with each particle is unobservable. When both latent variables and model parameters are unknown, Bayesian statistics is applicable. The probability that q is chosen after determining r is described as a conditional probability P(q|r). Note that P(q|r) ∝ I(r, q) and the function to be estimated is P(r|q) because only q is determined by the SAS pattern. These can be easily connected with Bayes' theorem:

<math>P(r|q) = \frac{P(q|r)P(r)}{P(q)}. \qquad (8)</math>

This formula includes two new parts (P(r) and P(q)), though they do not cause problems. P(r) is a prior about the grain choice. It can be set uniformly when no information about grain size is given. Moreover, P(q) is a prior about the wavenumber. Being independent of grain size, P(q) is absorbed into the normalization constant of P(r|q). Consequently, P(r|q) equals P(q|r), which is in proportion to I(q, r), up to the normalization constant.

This modeling is straightforward from a machine-learning-based viewpoint. However, from the quantum-mechanics-based viewpoint, the incident particles are dealt with as a wave. Consequently, in the proposed approach, the model is simplified by changing from a wave-like view to a particle-like one.

One-particle model

The formula for the scattering process of one particle should be discussed precisely on the basis of the above. The first process is to decide the grain causing scattering. The grain size r is continuous in the domain 0 < r < R. However, as mentioned above, it is separated into L small partitions labeled by 0 ··· L−1. Assuming that the representative grain size in each partition is set as the center of the partition, denoted as <math>r_0, \cdots, r_{L-1}</math> (that is, <math>r_{l+1} = r_l + R/L</math>), we can write the grain size frequency as <math>f(r_0), \cdots, f(r_{L-1})</math>. As the stochastic process, a particle randomly chooses a grain size for scattering with probability <math>P(r_l) \propto f(r_l)</math>. Accordingly,

<math>P(r_l) = \frac{f(r_l)}{\sum_m f(r_m)}. \qquad (9)</math>

In the second process, the scattering angle is decided. Similarly, the wavenumber domain 0 < q < Q is also separated into K small partitions labeled by 0 ··· K−1, and the centers of the partitions are denoted as <math>q_k</math>. The probability that the scattered particle is detected at the <math>q_k</math> detector is therefore described as <math>P(q_k|r_l)</math>, which is in proportion to <math>I(r_l, q_k)</math>. Although some particles will go outside of the detection plane, they are regarded as outside of the population distribution to be modeled. Consequently,

<math>P(q_k|r_l) = \frac{I(r_l, q_k)}{\sum_m I(r_l, q_m)}. \qquad (10)</math>

For simplicity, <math>P(r_l) \equiv \pi_l</math>, <math>P(q_k|r_l) \equiv \eta_{l,k}</math> hereafter. The probability that a particle is scattered at <math>r_l</math> and detected in the kth partition is derived as <math>\pi_l \eta_{l,k}</math>. To estimate the likelihood of the grain-size distribution, we thus need <math>P(\pi_0, \cdots, \pi_L | q_k)</math>. The grain-size partition in which the particle is actually scattered is not directly observable. Therefore, <math>r_l</math> should be marginalized as follows:

<math>P(\pi_0, \cdots, \pi_L | q_k) = \frac{P(q_k | \pi_0, \cdots, \pi_L) P(\pi_0, \cdots, \pi_L)}{P(q_k)} \propto \sum_l P(q_k | r_l) P(r_l | \pi_0, \cdots, \pi_L) = \sum_l \pi_l \eta_{l,k}, \qquad (11)</math>

where priors <math>P(q_k)</math> and <math>P(\pi_0, \cdots, \pi_L)</math> are regarded as constant parameters. Figure 6 illustrates this calculation. Even after a particle is detected at <math>q_2</math>, its possible paths are non-unique. Therefore, the likelihood of the scattering process involves the sum over all paths.

Figure 6: Marginalization of grain size (paths from the neutron beam source through the grain sizes <math>r_0, \cdots, r_5</math>, chosen with probability <math>\pi_l</math>, to the wavenumbers <math>q_0, \cdots, q_4</math>, where <math>q_2</math> is reached with probability <math>\eta_{l,2}</math>)

Algorithm 1: Estimation of grain size

Input: SAS pattern intensity <math>n_k \geq 0</math>, wavenumber <math>q_k \geq 0</math> (k = 0, 1, ···, K), resolution of grain size <math>r_l \geq 0</math> (l = 0, 1, ···, L)

Output: <math>\{\pi_l\}</math>

<math>N \Leftarrow \sum_k n_k, \quad \{\eta_{l,k}\} \Leftarrow \left\{\frac{I(r_l, q_k)}{\sum_m I(r_l, q_m)}\right\}, \quad \{\pi_l\} \Leftarrow 1/L</math>

repeat

<math>\{\pi_l\} \Leftarrow \sum_k \frac{n_k}{N} \frac{\pi_l \eta_{l,k}}{\sum_j \pi_j \eta_{j,k}}</math>

until convergence

N-particles model

Although the likelihood of a single detection event is formulated as above, an actual SAS pattern includes many detection events. Because the SAS pattern is a set of counts of detection events, it is denoted as K integers: <math>\{n_0, \cdots, n_K\}</math>. With the total number N of the events, <math>N = \sum_k n_k</math>. The <math>\{\pi_0, \cdots, \pi_L\}</math> maximizing the total likelihood <math>P(n_0, \cdots, n_K | \pi_0, \cdots, \pi_L)</math> is required, indicating the grain-size distribution.

For simplicity of the calculation, the following logarithmic likelihood is to be maximized with respect to <math>\{\pi_l\}</math>.

<math>\ln P(\pi_0, \cdots, \pi_L | n_0, \cdots, n_K) = \ln N! + \sum_k n_k \ln \sum_l \pi_l \eta_{l,k} - \sum_k \ln n_k! \qquad (12)</math>
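As a rough illustration of the log-likelihood (12) and of the iterative update listed in Algorithm 1, the following pure-Python sketch builds η from I(r, q), generates a most-probable pattern from a known π, and runs a fixed number of update passes from the uniform start. It is not the authors' implementation (they report using numpy); the function names, the coarse grids, the two-component test distribution, and the choice of 200 passes are all assumptions made for the example.

```python
import math

def sphere_intensity(r, q):
    """I(r, q) of Eq. (1): (1/r^3) * (sin(qr)/q^3 - r*cos(qr)/q^2)^2."""
    amp = math.sin(q * r) / q**3 - r * math.cos(q * r) / q**2
    return amp * amp / r**3

def build_eta(radii, qs):
    """eta[l][k] = P(q_k | r_l): I(r_l, q_k) normalised over the q grid (Eq. (10))."""
    eta = []
    for r in radii:
        row = [sphere_intensity(r, q) for q in qs]
        total = sum(row)
        eta.append([v / total for v in row])
    return eta

def log_likelihood(pi, eta, counts):
    """Eq. (12) without the pi-independent terms ln N! and sum_k ln n_k!."""
    ll = 0.0
    for k, n_k in enumerate(counts):
        if n_k > 0:
            mix = sum(pi[l] * eta[l][k] for l in range(len(pi)))
            ll += n_k * math.log(mix)
    return ll

def em_step(pi, eta, counts):
    """One pass of the update in Algorithm 1."""
    n_total = sum(counts)
    expected = [0.0] * len(pi)          # m_l: expected events scattered at r_l
    for k, n_k in enumerate(counts):
        mix = sum(pi[l] * eta[l][k] for l in range(len(pi)))
        if n_k > 0 and mix > 0:
            for l in range(len(pi)):
                expected[l] += n_k * pi[l] * eta[l][k] / mix
    return [m / n_total for m in expected]

# Illustrative grids (assumed; much coarser than in the paper's experiments).
radii = [2.0 * (l + 1) for l in range(10)]      # r_l = 2, 4, ..., 20 nm
qs = [0.1 * (k + 1) for k in range(50)]         # q_k = 0.1, ..., 5.0 nm^-1
eta = build_eta(radii, qs)

# Hypothetical ground truth: 70% of events at r = 10 nm, 30% at r = 16 nm.
true_pi = [0.0] * len(radii)
true_pi[4], true_pi[7] = 0.7, 0.3

# Most probable pattern for 10,000 detection events, rounded to integer counts.
counts = [round(10000 * sum(true_pi[l] * eta[l][k] for l in range(len(radii))))
          for k in range(len(qs))]

pi_est = [1.0 / len(radii)] * len(radii)        # uniform start
history = [log_likelihood(pi_est, eta, counts)]
for _ in range(200):                            # fixed number of passes
    pi_est = em_step(pi_est, eta, counts)
    history.append(log_likelihood(pi_est, eta, counts))
```

Each pass keeps `pi_est` normalised and does not decrease the value returned by `log_likelihood`, which is the property that makes a fixed iteration budget a workable stopping rule. Comparing `pi_est` with `true_pi` gives a quick sanity check on a given pair of grids.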
However, because the <math>\pi_l</math> are probabilities of the random choice, they are restricted as <math>\sum_l \pi_l = 1</math>. Therefore, the maximization is carried out under the constraint with the Lagrange multiplier method.

<math>\frac{\partial}{\partial \pi_j}\left(\ln P(\pi_0, \cdots, \pi_L | n_0, \cdots, n_K) - \beta\left(\sum_l \pi_l - 1\right)\right) = \frac{\partial}{\partial \pi_j}\left(\sum_k n_k \ln \sum_l \pi_l \eta_{l,k} - \beta\left(\sum_l \pi_l - 1\right)\right) = 0, \qquad (13)</math>

where β is the Lagrange multiplier. This leads to the following L equations,

<math>\frac{\partial}{\partial \pi_j}\left(\sum_k n_k \ln \sum_l \pi_l \eta_{l,k} - \beta\left(\sum_l \pi_l - 1\right)\right) = \sum_k n_k \frac{\eta_{j,k}}{\sum_l \pi_l \eta_{l,k}} - \beta = 0. \qquad (14)</math>

After <math>\pi_j</math> is multiplied to both sides of the equations and the equations are summed over j,

<math>\sum_k n_k \frac{\sum_j \pi_j \eta_{j,k}}{\sum_l \pi_l \eta_{l,k}} - \beta \sum_j \pi_j = 0, \qquad \beta = \sum_k n_k = N. \qquad (15)</math>

Therefore the equation

<math>\sum_k n_k \frac{\eta_{j,k}}{\sum_l \pi_l \eta_{l,k}} = N \qquad (16)</math>

should be solved to obtain <math>\{\pi_l\}</math>.

To solve this problem, an iteration algorithm called an EM algorithm (Bishop 2006) is generally applied (Zhang 1993) (Demoment 1989) (Nagata, Sugita, and Okada 2012). Because (10) and (11) lead to

<math>\frac{\pi_l \eta_{l,k}}{\sum_j \pi_j \eta_{j,k}} = \frac{P(q_k|r_l)P(r_l)}{P(q_k)} = P(r_l|q_k), \qquad (17)</math>

this part represents the probability that a particle detected at <math>q_k</math> is scattered at <math>r_l</math>. Therefore, the expectation value <math>m_l</math> of the number of such particles is <math>m_l = \sum_k n_k P(r_l|q_k)</math> when <math>n_k</math> particles are detected at <math>q_k</math>. According to <math>P(r_l) = \pi_l</math>, additionally,

<math>\sum_k n_k \frac{\eta_{j,k}}{\sum_l \pi_l \eta_{l,k}} = \frac{m_j}{\pi_j} = N. \qquad (18)</math>

The equation can be separated into the equation to obtain the <math>\pi_l</math> and that to obtain the <math>m_l</math>:

<math>\pi_l = \frac{m_l}{N}, \qquad m_l = \sum_k n_k \frac{\pi_l \eta_{l,k}}{\sum_j \pi_j \eta_{j,k}}. \qquad (19)</math>

Consequently, the E-step to obtain the expectation value <math>m_l</math> and the M-step to obtain <math>\{\pi_l\}</math> with the maximal likelihood are iteratively carried out to derive the solution of equation (16). Algorithm 1 lists the procedures.

EXPERIMENTS

Experimental settings

Two different types of experiments were executed to evaluate whether the proposed algorithm automatically estimates a grain-size distribution consistent with the SAS pattern. In the first experiment (Experiment 1), simulation-generated data were processed because the results can be compared with ground truth. In the second experiment (Experiment 2), actual SAS pattern data with naive samples were processed to assess the actual feasibility of the proposed algorithm.

The two types of data were processed with the proposed algorithm, and with IFT for comparison. For the proposed algorithm, 10,000 iterations of the EM algorithm were carried out instead of checking convergence. This is because the processing time available in an experiment is limited, whereas iterating until convergence has no time bound; limiting the number of iterations is expected to bound the processing time.

The IFT executed in the experiments involves L1 and L2 regularization. The weight parameters of the regularization terms are tuned so that IFT returns reasonable estimation results. This tuning is carried out twice, that is, separately for Experiments 1 and 2, because the best setting depends on the total event number of the SAS pattern.

Experiment 1: simulation data

In Experiment 1, six types of grain-size distributions were defined. Each is one Gamma distribution or the sum of two Gamma distributions having the most frequent point around 10 nm. The grain-size distribution is discretized in steps of 0.2 nm, and its domain is set from 0 to 20 nm (i.e., 100 values), corresponding to f(r) in (2). S(q) was calculated by evaluating the integration of (2). Because S(q) indicates the probability of detection, the most probable SAS patterns can be generated by multiplying S(q) by the detection event number. The q of the SAS pattern is also discrete, and its domain is from 0.1 nm<sup>−1</sup> to 5 nm<sup>−1</sup>. For the experiment, the detection event number was set as 10,000, and the SAS patterns of the grain-size distributions were generated and named Patterns 1-6.

Figures 7 and 8 show the results. In both figures, (a) plots the SAS pattern in a log-log plot, (b) plots the grain-size distribution estimated by the proposed method, and (c) plots the grain-size distribution estimated by IFT for comparison. The blue lines in (b) and (c) plot the truth, i.e. the original grain-size distribution. In (b), all estimation results are highly similar to ground truth. In contrast, in (c), estimation results are generally inaccurate.

Figure 7: Results of Exp. 1 Pattern 1, 2, 3 (for each pattern: (a) Input, a log-log plot of intensity against q [nm-1]; (b) ML results and (c) IFT results, each plotting the TRUE and estimated distributions against grain size [nm])

The grain-size distribution of Pattern 1 has a small peak at the foot of a large peak. The two peaks should be separately estimated. The ML results are so accurate that the small peak appears clearly, whereas the small peak in the IFT results is difficult to recognize.

The grain-size distribution of Pattern 2 also has a small peak, but it is located on the opposite side to that in Pattern 1. The IFT results do not accurately estimate the small peak, whereas the ML results do.

The grain-size distribution of Pattern 3 has only one peak. The IFT results of this pattern are similar to those of Pattern 2. Both Patterns 2 and 3 have a large peak at a small grain size. The small grain size corresponds to a large wave number due to I(q, r). The features are very small as shown in Fig. 7 (a) because S(q) in the high-q area decays as q<sup>−4</sup>. Function-fitting-based techniques such as IFT cannot handle such small components, whereas stochastic techniques such as the proposed method take small probabilities into account.

One large peak at an intermediate grain size is shown in Pattern 4. Pattern 4 is so simple that the estimation is easy. Indeed, both the ML and IFT results are very accurate. However, the ML results are more accurate than the IFT results.

Two comparable peaks appear close together in Pattern 5. The IFT results do not detect these two peaks; one peak instead appears between them. In contrast, the ML results detect both peaks accurately.

Three peaks are shown in Pattern 6. Similar to Pattern 5, the IFT results did not extract the three peaks, whereas the ML results did.

The SAS patterns of (a) input look quite similar to humans. Therefore, material scientists have to make an effort to find their differences, which reflect radical changes in the grain-size distribution. According to the results, the proposed method is helpful and reliable. This shows that the SAS experiment can become more useful for observing microstructures of materials.

Figure 9 plots the processing time of the pattern estimation. For this experiment, a computer with an Intel(R) Core(TM) i3-4150 CPU (3.50 GHz), 11 GB RAM, and CentOS was used. The implementation is based on Python 3.6.5, and the numpy library (Oliphant 2006) is used to improve the efficiency of the process.

The proposed method takes around 1.2 seconds, which is much shorter than the experimental time of SAS (for neutron scattering, around 20 minutes). In comparison, IFT takes around 6.0 seconds, 5 times as long as the proposed method. IFT is not much slower; however, this difference can become important if material science researchers have to conduct many iterations during trial-and-error experiments. This shows the proposed method is quite useful for SAS data analysis.

According to the results, the proposed method enables the grain-size distribution to be estimated accurately. IFT makes large errors when the grain size is small, whereas the proposed method works well for such cases. In actual situations, we cannot know in advance whether the grain size of a sample is small (i.e., IFT is not applicable) or not. Therefore, IFT requires much effort by material scientists but the proposed method does not. This shows that the proposed method is suitable for automatically processing SAS patterns.

Experiment 2: actual measurements

In Experiment 2, SAS patterns of neutrons with a polystyrene ball (radius 18 nm) sample and a silica ball (radius 25 nm) sample were examined. Figure 10 shows the results ((a), (b) and (c) are the same as in Experiment 1). The SAS patterns are noisier than those of Experiment 1. The most frequent radius in (b) and (c) is around the true radius of the sample. This shows that both the proposed method
Figure 8: Results of Exp. 1 Patterns 4, 5, 6 (for each pattern: (a) Input, a log-log plot of intensity against q [nm-1]; (b) ML result and (c) IFT result, each plotting the TRUE and estimated distributions against grain size [nm])
and IFT can be used. The difference between the ML results and the IFT results is that small peaks appear at integer multiples of the true radius. This is considered to be because clusters of multiple balls are detected.

The results show the proposed method is feasible for actual SAS pattern analysis. Moreover, small behaviors inside the material might be observable. This implies that the proposed method will extract information leading to new knowledge.

Figure 9: Comparison of processing time (bar chart of processing time [sec] for Patterns 1 to 6; ML takes about 1.2 to 1.3 s per pattern and IFT about 4.1 to 5.4 s per pattern)

Conclusion and Future Works

An expectation-maximization (EM)-based grain-size distribution estimation method was proposed for automatically analyzing small angle scattering (SAS) patterns. Experimental results showed that the proposed method can accurately estimate the original grain-size distribution from SAS patterns. Moreover, the proposed method does not require parameter tuning to obtain good results, whereas the existing method (Indirect Fourier Transform) does.

The stochastic model that is the base of the proposed method does not assume priors. However, with priors, the estimation might be made more accurate and the number of detection events required to estimate the grain size might be reduced. In addition, non-ball scattering bodies should be taken into account. Such extensions are possible future works.

References

Asahara, A.; Morita, H.; Mitsumata, C.; Ono, K.; Yano, M.; and Shoji, T. 2019. Early-stopping of scattering pattern observation with bayesian modeling. In Proceedings of the AAAI Conference on Artificial Intelligence, volume 33, 9410–9415.

Bishop, C. M. 2006. Pattern Recognition and Machine Learning. New York: Springer.

Demoment, G. 1989. Image reconstruction and restoration: overview of common estimation structures and problems. IEEE Transactions on Acoustics, Speech, and Signal Processing 37(12):2024–2036.

Donoho, D. L. 2006. Compressed sensing. IEEE Transactions on information theory 52(4):1289–1306.

Higgins, J. S., and Benoît, H. 1994. Polymers and neutron scattering. Clarendon press Oxford.
Figure 10: Results of Exp. 2 (for the Polystyrene and Silica samples: (a) Input, a log-log plot of intensity against q [nm-1]; (b) ML results and (c) IFT results, each plotting the estimated distribution against grain size [nm])
Joachim, K., and Ingo, B. 2018. SASFit. https://www.psi.ch/en/sinq/sansi/sasfit.

Leon, B. L. 1974. An iterative technique for the rectification of observed distributions. The astronomical journal 79(6):745–754.

Lustig, M.; Donoho, D. L.; Santos, J. M.; and Pauly, J. M. 2008. Compressed sensing mri. IEEE Signal Processing Magazine 25(2):72–82.

Lustig, M.; Donoho, D.; and Pauly, J. M. 2007. Sparse mri: The application of compressed sensing for rapid mr imaging. Magnetic Resonance in Medicine 58(6):1182–1195.

Nagata, K.; Sugita, S.; and Okada, M. 2012. Bayesian spectral deconvolution with the exchange monte carlo method. Neural Networks 28:82–89.

National Institute of Standards and Technology. 2019. mgi. https://www.nist.gov/mgi (viewed at Oct. 2019).

Oliphant, T. E. 2006. A guide to NumPy, volume 1. Trelgol Publishing USA.

Otto, G. 1977. A new method for the evaluation of small-angle scattering data. Journal of Applied Crystallography (10):415–421.

William, Hadley, R. 1972. Bayesian-based iterative method of image restoration. Journal of the Optical Society of America 62(1):55–59.

Zhang, J. 1993. The mean field theory in em procedures for blind markov random field image restoration. IEEE Transactions on Image Processing 2(1):27–40.