=Paper= {{Paper |id=Vol-2600/short6 |storemode=property |title=EM-algorithm Enpowers Material Science: Application of Inverse Estimation for Small Angle Scattering |pdfUrl=https://ceur-ws.org/Vol-2600/short6.pdf |volume=Vol-2600 |authors=Akinori Asahara,Hidekazu Morita,Kanta Ono,Masao Yano,Tetsuya Shoji,Kotaro Saito,Chiharu Mitsumata |dblpUrl=https://dblp.org/rec/conf/aaaiss/AsaharaMOYSSM20 }} ==EM-algorithm Enpowers Material Science: Application of Inverse Estimation for Small Angle Scattering== https://ceur-ws.org/Vol-2600/short6.pdf
[Short paper] EM-algorithm Enpowers Material Science: Application of Inverse Estimation for Small Angle Scattering

Akinori Asahara, Hidekazu Morita (Hitachi Ltd., Tokyo, 100-8280, Japan)
Kanta Ono (High Energy Accelerator Research Organization, Tsukuba, 305-0801, Japan)
Masao Yano, Tetsuya Shoji (Toyota Motor Corporation, Toyota, 471-8572, Japan)
Kotaro Saito (Paul Scherrer Institute, Villigen, 5232, Switzerland)
Chiharu Mitsumata (National Institute for Materials Science, Tsukuba, 305-0047, Japan)
Copyright © 2020 held by the author(s). In A. Martin, K. Hinkelmann, H.-G. Fill, A. Gerber, D. Lenat, R. Stolle, F. van Harmelen (Eds.), Proceedings of the AAAI 2020 Spring Symposium on Combining Machine Learning and Knowledge Engineering in Practice (AAAI-MAKE 2020). Stanford University, Palo Alto, California, USA, March 23-25, 2020. Use permitted under Creative Commons License Attribution 4.0 International (CC BY 4.0).

Abstract

In this short paper, a machine-learning algorithm is applied to improve the analysis of SAS (Small Angle Scattering) experiments, which are commonly used in material science. In a SAS experiment, a particle beam incident on a material sample is scattered by the sample. The distribution of the scattered beam carries information about the grain-size distribution of the sample material; however, this distribution needs to be inversely estimated. Therefore, a stochastic model of the SAS experiment and an EM (Expectation-Maximization) algorithm to estimate the grain-size distribution in the material sample are proposed. While existing methods require much manual effort, the proposed EM algorithm works automatically. Six simulation-generated datasets and two actually observed datasets were processed with the proposed method for examination. The results show that the proposed EM-based grain-size distribution estimation method is useful for automatically analyzing SAS data.

Introduction

Materials Informatics (MI) is an information technology intended to make material development faster, and it has been researched eagerly in recent years (National Institute of Standards and Technology 2019). MI will help material science researchers to discover new knowledge.

One such MI function is a data mining technique to find very small features of experimental data automatically. Traditionally, material science researchers carefully inspect experimental data to find small features because they might indicate new knowledge. The researchers, however, might take a long time to find such features or might miss them. Therefore, automatic knowledge extraction from experimental data is attracting the attention of researchers.

This study focuses on small-angle scattering (SAS) experiments (Higgins and Benoît 1994) (Asahara et al. 2019), which are commonly conducted for observing microstructures of materials. There are various similar scattering experiments such as neutron scattering, x-ray scattering, ion-beam scattering, etc. They differ only in the particles to be scattered. A solution for the problem in SAS can therefore be expected to apply to these experiments as well. Thus, the problem is important enough to be worth solving.

One of the objectives of SAS experiments is to estimate microscale grain-size distributions in material samples. Neutrons detected on a plane during a SAS experiment make a pattern on the plane (called a SAS pattern). Material science researchers with special knowledge observe SAS patterns carefully to find grain-size information about the microstructure of the sample material.

Accordingly, a method to automatically estimate grain-size distributions from SAS pattern data is presented in this paper. Several existing estimation methods are based on function optimization to fit the grain-size distribution to the SAS pattern, which requires much effort by material science researchers to adjust parameters. In contrast, our automatic estimation method is free from such effort because of probabilistic modeling of SAS experimental processes (that is, knowledge of the experimental settings). A maximum likelihood approach based on the stochastic modeling can be taken to estimate the grain-size distribution without heuristic assumptions. In this paper, an expectation-maximization (EM) algorithm applicable to the estimation is shown and examined with simulation data and actual measurement data.

[Figure 1: SAS Experiment. A neutron beam from the source passes through the material sample and is detected on the detector plane; the detection counts over $(q_x, q_y)$ form the SAS pattern.]

Problem settings

Small angle scattering

An experimental instrument setting of SAS is illustrated in Figure 1. In the experiment, a particle beam incident upon
the sample interacts with the microstructures therein. The directions of the particles thus change due to the interactions. The angle $\theta$ between a straight beam and the changed direction of the scattered beam depends on the interaction. Finally, detectors arranged on a plane detect the scattered beam. The counts of detection events form a pattern, called a SAS pattern, on the plane. A microstructure causing such direction changes is called a "scattering body."

[Figure 2: SAS pattern analysis with graphs. Log-log plot of scattering amplitude against wave number (nm^-1), with the domains (a), (b) and (c) marked.]

The particle behavior during the scattering experiment is modeled with a differential equation called the Schrödinger equation. The solution of the Schrödinger equation is a complex function called a wave function, of which the squared absolute value corresponds to the probability of detection. Because the distance $L$ between the sample and the plane is large enough, the coordinates $\mathbf{x} = (x, y)$ on the plane are approximately proportional to the scattering angle: $|\mathbf{x}| = L \sin\theta \simeq L\theta$. The probability density function (PDF) $P(\mathbf{x})$ of detection corresponds to the probability $P(\theta)$ that a particle goes in the direction of $\theta$, which is related to the microscopic structures called grains.

As the simplest setting, imagine a case in which the grains are balls. The intensity $I(r, q)$ of the SAS pattern scattered by balls of radius $r$ is in proportion to the following $\bar{I}(r, q)$:

\[ I(r, q) \propto \bar{I}(r, q) = \frac{1}{r^3} \left( \frac{\sin qr}{q^3} - \frac{r \cos qr}{q^2} \right)^2 . \tag{1} \]

The $q$ in the formula indicates a quantity called the "wave number," which is the frequency of the wave function multiplied by $2\pi$. The frequency of the wave function is three dimensional because it is derived with the Fourier transformation of the wave function in three-dimensional space. The scattering angle $\theta$ depends on the frequency, so the size $q = |\mathbf{q}|$ of the component perpendicular to the incident beam ("$\mathbf{q} = (q_x, q_y)$" in Fig. 1) appears in the formula. Therefore a $\mathbf{q}$ indicates a location $\mathbf{x}$ on the detection plane, derived from the distance between the incident beam center and the location. That is, the actual SAS intensity corresponding to $I(r, q)$ is obtained by converting $\mathbf{x}$ to $\mathbf{q}$.

This formula holds in the case of a uniform grain size $r$. However, actual grain sizes vary. The SAS pattern of multiple grain sizes is the weighted sum of $I(r, q)$ over $r$, and the weight is the grain-size distribution of the material, because the solutions of the Schrödinger equation can be added together. Accordingly, the scattering pattern $S(q)$ of the scattering bodies is derived as

\[ S(q) \propto \int f(r)\, I(r, q)\, dr, \tag{2} \]

where the grain-size distribution is denoted as $f(r)$.

Expert-knowledge-based analysis

To estimate the grain-size distribution, $S(q)$, which is the integration of $f(r) I(r, q)$, should be decomposed into the summation of $I(r, q)$; however, this is difficult. Thus, material science researchers have tried to guess $f(r)$ with clues from small features latent in the plot of $I(r, q)$ as shown in Fig. 2. The figure presents a log-log plot of a SAS pattern, and its domain is separated into three parts (a), (b) and (c). In (a), that is $q \to 0$, expanding the trigonometric functions in a power series in $q$ gives

\[ I(r, q) \simeq \frac{1}{r^3} \left( \frac{qr}{q^3} - \frac{r}{q^2} \left( 1 - \frac{1}{2} (qr)^2 \right) \right)^2 = \frac{r^3}{4}, \tag{3} \]

so $S(q)$ is independent of $q$. Thus, it converges to a constant value. In (b), corresponding to $q \to \infty$, $I(r, q)$ is approximated as

\[ I(r, q) \simeq \frac{1}{r^3} \left( \frac{r \cos qr}{q^2} \right)^2 . \tag{4} \]

Therefore, $S(q)$ is derived as

\[ S(q) \simeq \frac{1}{q^4} \int r^2 f(r) \cos^2 qr \, dr. \tag{5} \]

This behaves as the Fourier transform of $r^2 f(r)$, decaying with the fourth power of $q$.

(c) is intermediate between (a) and (b). $I(r, q)$ in this domain is the following:

\[ I(r, q) = \frac{1}{q^6} \left( \sin qr - qr \cos qr \right)^2 . \tag{6} \]

$I(r, q)$ is always non-negative, and $I(r, q) = 0$ when $\sin qr - qr \cos qr = 0$. Therefore $I(r, q) = 0$ leads to $\sin qr / \cos qr = \tan qr = qr$. Figure 3 plots each side of this equation. The horizontal axis $x$ of the graph indicates $qr$. The blue curve represents $y = \tan x$ and the orange line represents $y = x$. Their intersections, indicated by the circles in the figure, correspond to points satisfying $\tan qr = qr$, that is, $I(r, q) = 0$. Therefore, the zero points appear periodically. Additionally, local maximum points, which satisfy $\sin x = 0$, exist between the zero points. Thus $I(r, q)$ oscillates and its frequency depends on $r$. $S(q)$, which is the sum of the $I(r, q)$, involves oscillations of various phases, so the oscillations gradually cancel as $q$ becomes larger. Hence, only the oscillation in the small-$q$ domain is readable.

Material science researchers accordingly look for the oscillation in the (c) domain because it gives implicit hints for understanding $f(r)$. Therefore, $f(r)$ can be estimated only roughly. If $f(r)$ were estimated directly, the SAS experiment could give much more information about the sample. Consequently, a method to directly estimate $f(r)$ is highly needed. Thus, a machine-learning-based method is proposed in this paper.
[Figure 3: sin qr − qr cos qr behavior. Curves y = tan x and y = x plotted against x = qr, with their intersections circled and the "sin x = 0" points marked.]

[Figure 4: Indirect Fourier Transform (IFT). $f(r)$ approximated over $r$ by stepwise terms $a_{n-1}\theta_{n-1}(r)$, $a_n\theta_n(r)$, $a_{n+1}\theta_{n+1}(r)$.]

Related works

One practicable method is parametric function fitting. Parameters of the function can be adjusted to fit the obtained SAS pattern because their relationship is known (Joachim and Ingo 2018). However, this approach requires the form of $f(r)$, and the true $f(r)$ is generally unknown in actual situations. Material scientists therefore have to assume many kinds of function forms to find the best estimation. Until the best estimation is achieved, many trials will be required, leading to a long calculation time.

To avoid such difficulty, a function having a more general formula should be used. One technique using such a function is the Indirect Fourier Transform (IFT) (Otto 1977). For IFT, a summation of multiple stepwise functions $\theta_n(r)$ is used as the general function. The stepwise function $\theta_n(r)$ returns 1 when $r_n < r < r_{n+1}$, and 0 otherwise, where the domain of the function is separated into $N$ small partitions $r_n < r < r_{n+1}$ ($n = 1, \dots, N$). Formula (2) is then approximated as

\[ S(q) \simeq \sum_n \int a_n \theta_n(r)\, I(r, q)\, dr. \tag{7} \]

Under this assumption, the integral is decomposed into definite integrals over $r_n < r < r_{n+1}$. Because the definite integrals can be carried out analytically, $S(q)$ is described as a linear combination of the $a_n$. After minimizing the difference between the linear combination of the $a_n$ and the SAS pattern, the grain-size distribution $f(r)$ is obtained as the sum of $a_n \theta_n(r)$.

The resolution of the grain-size distribution is determined by $\theta_n$ in IFT as shown above. Therefore, the range of each $\theta_n$ should be small to improve the resolution of grain size. Many $a_n$ thus have to be determined for high-resolution results, but the SAS pattern must then be highly accurate, because the difference between a SAS pattern and $S(q)$ cannot be averaged out sufficiently when many $a_n$ are estimated. Accordingly, a higher-resolution setting makes the estimation error larger. A technique to avoid this problem is to add regularization terms to suppress overfitting. However, the regularization terms have to be adjusted manually. To automate regularization, complicated methods to determine the regularization terms have been proposed, but they are not common yet.

In this paper, a machine-learning approach is taken to this problem. Specifically, the SAS-experimental process is modeled as a stochastic process with latent variables. After that, a likelihood function derived from the stochastic process is maximized to fit the SAS pattern. As a result, the grain-size distribution is obtained as the optimal model parameter of the stochastic process. No assumption is required for the method if a non-parametric model (that is, a very general stochastic model such as a Gaussian mixture) is applied for the SAS-experimental process. Generally an EM algorithm is applied to non-parametric models. Similarly, a method using a non-parametric model and an EM algorithm is proposed.

Such techniques are used in astrophysics (William 1972) (Leon 1974), bioinformatics (Lustig et al. 2008) (Lustig, Donoho, and Pauly 2007) and compressed sensing (Donoho 2006). However, this kind of approach is not common in scattering experiments. Therefore, in this paper, algorithms suitable for SAS are proposed and examined using simulation and actual data.

Stochastic process of SAS

Approach

The process consists of dispersion and observation, which are modeled with two different probabilistic models, as shown in Fig. 5.

[Figure 5: Probabilistic solution of scattering problems. Incident particle → "determine a grain" (select r randomly from $f(r)$) → "change direction" (select q randomly from $I(q, r)$) → detected; both steps are scattering inside the sample.]

At the first step, dispersion, the incident beam interacts with grains. In Fig. 5, "determine a grain" represents this process. It can be interpreted as a stochastic process in which particles of the incident beam choose a scattering body in the sample material. The probability density function is consequently assumed to be in proportion to $f(r)$. That is, the dispersion step of $N$ particles is modeled as an $N$-times iteration of random sampling from $f(r)$.

The second step, observation, in which the incident beam changes its direction and arrives at a point on the detector plane, is also modeled as a random sampling process, shown as "change direction" in Fig. 5. The scattered particles choose a scattering angle randomly and are detected as a SAS pattern. This angle choice is stochastic due to the principles of quantum physics. Thus the probability distribution function is in proportion to $I(r, q)$ defined in (3).
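The two sampling steps described above can be sketched as a small simulation. The Python code below is a minimal illustration, not the authors' simulator: the function name `simulate_sas_counts`, the discretization of r and q into bins, the toy intensity matrix, and the use of NumPy's random generator are assumptions made here. Each particle first picks a grain-size bin with probability proportional to f(r), then a wavenumber bin with probability proportional to the intensity for that grain size.

```python
import numpy as np

def simulate_sas_counts(f_values, intensity, n_particles, seed=0):
    """Simulate detection counts per wavenumber bin.

    f_values[l]     : grain-size weight f(r_l), non-negative (hypothetical discretization)
    intensity[l, k] : non-negative weight of wavenumber bin q_k for grain size r_l
    """
    rng = np.random.default_rng(seed)
    f_values = np.asarray(f_values, dtype=float)
    intensity = np.asarray(intensity, dtype=float)

    p_r = f_values / f_values.sum()                                  # dispersion step
    p_q_given_r = intensity / intensity.sum(axis=1, keepdims=True)   # observation step

    # Step 1: every particle chooses a grain-size bin.
    grains = rng.choice(len(p_r), size=n_particles, p=p_r)

    # Step 2: particles that chose bin l are distributed over the wavenumber bins.
    counts = np.zeros(intensity.shape[1], dtype=np.int64)
    for l, m in zip(*np.unique(grains, return_counts=True)):
        counts += rng.multinomial(m, p_q_given_r[l])
    return counts

# Hypothetical toy setting: two grain sizes, three wavenumber bins.
toy_intensity = np.array([[4.0, 2.0, 1.0],
                          [1.0, 2.0, 4.0]])
toy_counts = simulate_sas_counts([0.3, 0.7], toy_intensity, n_particles=10_000)
```

The grain-size bin chosen by each particle is discarded after the second step, which mirrors the fact that only the detection position is observed in the experiment.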
The entire process of SAS is modeled as the combination of these two stochastic processes. In the entire process, the size of the scattering body interacting with each particle is unobservable. When both latent variables and model parameters are unknown, Bayesian statistics is applicable. The probability that $q$ is chosen after determining $r$ is described as the conditional probability $P(q \mid r)$. Note that $P(q \mid r) \propto I(r, q)$ and that the function to be estimated is $P(r \mid q)$, because only $q$ is determined by the SAS pattern. These can be easily connected with Bayes' theorem:

\[ P(r \mid q) = \frac{P(q \mid r)\, P(r)}{P(q)}. \tag{8} \]

This formula includes two new parts ($P(r)$ and $P(q)$), though they do not cause problems. $P(r)$ is a prior about grain choosing. It can be set uniformly when no information about grain size is given. Moreover, $P(q)$ is a prior about the wavenumber. Being independent of grain size, $P(q)$ will be canceled with a normalization constant of $P(r \mid q)$. Consequently, $P(r \mid q)$ equals $P(q \mid r)$, which is in proportion to $I(r, q)$, except for the normalization constant.

This modeling is straightforward from a machine-learning-based viewpoint. From the quantum-mechanics-based viewpoint, however, the incident particles are dealt with as a wave. Consequently, in the proposed approach, the model is simplified by changing from the wave-like aspect to a particle-like one.

One-particle model

The formula for the scattering process of one particle should be discussed precisely, following the outline above. The first process is to decide the grain causing scattering. The grain size $r$ is continuous in the domain $0 < r < R$. However, as mentioned above, it is separated into $L$ small partitions labeled by $0, \dots, L-1$. Assuming that the representative grain size in each partition is set as the center of the partition, denoted as $r_0, \dots, r_L$ (that is, $r_{n+1} = r_n + R/L$), we can write the grain-size frequency as $f(r_0), \dots, f(r_{L-1})$. As the stochastic process, a particle randomly chooses a grain size for scattering with probability $P(r_i) \propto f(r_i)$. Accordingly,

\[ P(r_l) = \frac{f(r_l)}{\sum_m f(r_m)}. \tag{9} \]

In the second process, the scattering angle is decided. Similarly, the wavenumber domain $0 < q < Q$ is also separated into $K$ small partitions labeled by $0, \dots, K-1$, and the centers of the partitions are denoted as $q_k$. The probability that the scattered particle is detected at the $q_k$ detector is therefore described as $P(q_k \mid r_l)$, which is in proportion to $I(r_l, q_k)$. Although some particles will go outside of the detection plane, they are regarded as outside of the population distribution to be modeled. Consequently,

\[ P(q_k \mid r_l) = \frac{I(r_l, q_k)}{\sum_m I(r_l, q_m)}. \tag{10} \]

For simplicity, $P(r_l) \equiv \pi_l$ and $P(q_k \mid r_l) \equiv \eta_{l,k}$ hereafter. The probability that a particle is scattered at $r_l$ and detected in the $k$th partition is derived as $\pi_l \eta_{l,k}$. To estimate the likelihood of the grain-size distribution, we thus need $P(\{\pi_0, \dots, \pi_L\} \mid q_k)$.

The grain-size partition in which the particle is actually scattered is not directly observable. Therefore, $r_l$ should be marginalized as follows:

\[ P(\pi_0, \dots, \pi_L \mid q_k) = \frac{P(q_k \mid \pi_0, \dots, \pi_L)\, P(\pi_0, \dots, \pi_L)}{P(q_k)} \propto \sum_l P(q_k \mid r_l)\, P(r_l \mid \pi_0, \dots, \pi_L) = \sum_l \pi_l \eta_{l,k}, \tag{11} \]

where the priors $P(q_k)$ and $P(\pi_0, \dots, \pi_L)$ are regarded as constant parameters. Figure 6 illustrates this calculation. Even after a particle is detected at $q_2$, its possible paths are non-unique. Therefore the likelihood of the scattering process involves the sum over all the paths.

[Figure 6: Marginalization of grain size. A particle from the neutron beam source reaches a grain-size partition $r_0, \dots, r_5$ with probability $\pi_l$ and is then detected at a wavenumber partition $q_0, \dots, q_4$ (here $q_2$) with probability $\eta_{l,2}$.]

N-particles model

Although the likelihood of a single detection event is formulated as above, an actual SAS pattern includes many detection events. Because the SAS pattern is a set of counts of detection events, it is denoted as $K$ integers: $\{n_0, \dots, n_K\}$. The total number of events is $N = \sum_k n_k$. The $\{\pi_0, \dots, \pi_L\}$ maximizing the total likelihood $P(n_0, \dots, n_K \mid \pi_0, \dots, \pi_L)$ is required, and it indicates the grain-size distribution.

For simplicity of the calculation, the following logarithmic likelihood is to be maximized with respect to $\{\pi_l\}$:

\[ \ln P(\pi_0, \dots, \pi_L \mid n_0, \dots, n_K) = \ln N! + \sum_k n_k \ln \sum_l \pi_l \eta_{l,k} - \sum_k \ln n_k! \tag{12} \]

Algorithm 1: Estimation of grain size
  Input: SAS pattern intensity $n_k \ge 0$, wavenumber $q_k \ge 0$ ($k = 0, 1, \dots, K$); resolution of grain size $r_l \ge 0$ ($l = 0, 1, \dots, L$)
  Output: $\{\pi_l\}$
  $N \Leftarrow \sum_k n_k$, $\{\eta_{l,k}\} \Leftarrow \left\{ \frac{I(r_l, q_k)}{\sum_m I(r_l, q_m)} \right\}$, $\{\pi_l\} \Leftarrow 1/L$
  repeat
    $\{\pi_l\} \Leftarrow \sum_k \frac{n_k}{N} \frac{\pi_l \eta_{l,k}}{\sum_j \pi_j \eta_{j,k}}$
  until convergence
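Algorithm 1 and the log-likelihood (12) can be sketched in a few lines of Python. This is a minimal sketch under stated assumptions, not the authors' code: the function names `eta_from_intensity`, `log_likelihood` and `estimate_grain_size`, the NumPy array layout (rows indexed by l, columns by k), and the toy matrix are hypothetical. A fixed iteration count replaces the convergence check, and the constant terms ln N! and Σ ln n_k! of (12) are omitted because they do not depend on π.

```python
import numpy as np

def eta_from_intensity(intensity):
    """Eq. (10): normalize each row I(r_l, .) over the wavenumber bins."""
    intensity = np.asarray(intensity, dtype=float)
    return intensity / intensity.sum(axis=1, keepdims=True)

def log_likelihood(pi, counts, eta):
    """pi-dependent part of Eq. (12): sum_k n_k * ln(sum_l pi_l * eta[l, k])."""
    counts = np.asarray(counts, dtype=float)
    mix = np.asarray(pi, dtype=float) @ np.asarray(eta, dtype=float)
    observed = counts > 0                  # bins with n_k = 0 contribute nothing
    return float(np.sum(counts[observed] * np.log(mix[observed])))

def estimate_grain_size(counts, eta, n_iter=10_000):
    """Algorithm 1 with a fixed number of iterations instead of a convergence test."""
    counts = np.asarray(counts, dtype=float)
    eta = np.asarray(eta, dtype=float)
    total = counts.sum()                                   # N = sum_k n_k
    pi = np.full(eta.shape[0], 1.0 / eta.shape[0])         # uniform start
    for _ in range(n_iter):
        mix = pi @ eta                                     # sum_j pi_j * eta[j, k]
        ratio = np.divide(counts, mix, out=np.zeros_like(counts), where=mix > 0)
        pi = pi * (eta @ ratio) / total                    # update of Algorithm 1
    return pi

# Hypothetical toy problem: 3 grain-size bins, 4 wavenumber bins, noise-free counts.
eta_toy = np.array([[0.7, 0.1, 0.1, 0.1],
                    [0.1, 0.7, 0.1, 0.1],
                    [0.1, 0.1, 0.4, 0.4]])
counts_toy = 1000.0 * (np.array([0.2, 0.5, 0.3]) @ eta_toy)
pi_hat = estimate_grain_size(counts_toy, eta_toy, n_iter=2000)
```

Each update keeps the weights non-negative and summing to one, so no explicit projection onto the constraint is needed in this sketch.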
However, because the $\pi_l$ are probabilities of the random
choice, they are restricted as $\sum_l \pi_l = 1$. Therefore, the
maximization is carried out under this constraint with the
Lagrange multiplier method.

$$\frac{\partial}{\partial \pi_j} \ln P(\pi_0, \cdots, \pi_L \mid n_0, \cdots, n_K) = \frac{\partial}{\partial \pi_j} \sum_k n_k \ln \sum_l \pi_l \eta_{l,k} - \beta = 0, \tag{13}$$

where $\beta$ is the Lagrange multiplier. This leads to the following $L$ equations,

$$\frac{\partial}{\partial \pi_j} \sum_k n_k \ln \sum_l \pi_l \eta_{l,k} - \beta = \sum_k n_k \frac{\eta_{j,k}}{\sum_l \pi_l \eta_{l,k}} - \beta = 0. \tag{14}$$

After both sides of each equation are multiplied by $\pi_j$ and the
equations are summed over $j$,

$$\sum_k n_k \frac{\sum_j \pi_j \eta_{j,k}}{\sum_l \pi_l \eta_{l,k}} - \beta \sum_j \pi_j = 0, \qquad \beta = \sum_k n_k = N. \tag{15}$$

Therefore the equation

$$\sum_k n_k \frac{\eta_{j,k}}{\sum_l \pi_l \eta_{l,k}} = N \tag{16}$$

should be solved to obtain $\{\pi_l\}$.
   To solve this problem, an iterative algorithm called the
EM algorithm (Bishop 2006) is generally applied (Zhang 1993;
Demoment 1989; Nagata, Sugita, and Okada 2012).
Because (10) and (11) lead to

$$\frac{\pi_j \eta_{j,k}}{\sum_l \pi_l \eta_{l,k}} = \frac{P(q_k \mid r_j) P(r_j)}{P(q_k)} = P(r_j \mid q_k), \tag{17}$$

this term represents the probability that a particle detected at
$q_k$ was scattered at $r_j$. Therefore, the expectation value $m_j$ of
the number of such particles is $m_j = \sum_k n_k P(r_j \mid q_k)$ when
$n_k$ particles are detected at $q_k$. Because $P(r_j) = \pi_j$,
additionally,

$$\sum_k n_k \frac{\eta_{j,k}}{\sum_l \pi_l \eta_{l,k}} = \frac{m_j}{\pi_j} = N. \tag{18}$$

The equation can be separated into an equation for the $\pi_l$
and one for the $m_l$:

$$\pi_l = \frac{m_l}{N}, \qquad m_l = \sum_k n_k \frac{\pi_l \eta_{l,k}}{\sum_j \pi_j \eta_{j,k}} \tag{19}$$

Consequently, the E-step to obtain the expectation value $m_l$ and
the M-step to obtain $\{\pi_l\}$ with the maximal likelihood are
iteratively carried out to derive the solution of equation (16).
Algorithm 1 lists the procedures.

                           Experiments

Experimental settings

Two different types of experiments were executed to evaluate
whether the proposed algorithm automatically estimates a
grain-size distribution consistent with the SAS pattern. In the
first experiment (Experiment 1), simulation-generated data
were processed because the results can be compared with
ground truth. In the second experiment (Experiment 2), actual
SAS pattern data with naive samples were processed to
assess the actual feasibility of the proposed algorithm.
   The two types of data were processed with the proposed
algorithm and, for comparison, with IFT. For the proposed
algorithm, 10,000 iterations of the EM algorithm were carried
out instead of checking convergence. This is because the
processing time available during an experiment is limited,
whereas the time needed to reach convergence is not bounded
in advance. Fixing the number of iterations bounds the
processing time.
   The IFT executed in the experiments involves L1 and
L2 regularization. The weight parameters of the regularization
terms were tuned so that IFT returns reasonable estimation
results. This tuning was carried out twice, that is, once each for
Experiments 1 and 2, because the best setting depends on the total
event number of the SAS pattern.

Experiment 1: simulation data

In Experiment 1, six types of grain-size distributions were
defined. Each pattern is one Gamma distribution or the sum
of two Gamma distributions having the most frequent point
around 10 nm. The grain-size distribution is discretized in steps
of 0.2 nm, and its domain is set from 0 to 20 nm (i.e., 100
values), corresponding to $f(r)$ in (2). $S(q)$ was calculated
by evaluating the integral in (2). Because $S(q)$ indicates
the probability of detection, the most probable SAS patterns
can be generated by multiplying $S(q)$ by the detection event
number. The $q$ of the SAS pattern is also discrete,
and its domain is from 0.1 nm$^{-1}$ to 5 nm$^{-1}$. For the
experiment, the detection event number was set to 10,000, and the
SAS patterns of the grain-size distributions were generated
and named Patterns 1-6.
   Figures 7 and 8 show the results. In both figures, (a) plots
the SAS pattern on a log-log scale, (b) plots the grain-size
distribution estimated by the proposed method, and (c) plots
the grain-size distribution estimated by IFT for comparison.
The blue lines in (b) and (c) plot the truth, i.e., the original
grain-size distribution. In (b), all estimation results are
highly similar to ground truth. In contrast, in (c), estimation
results are generally inaccurate.
   The grain-size distribution of Pattern 1 has a small peak at
the foot of a large peak. The two peaks should be separately
estimated. The ML results are so accurate that the small peak
appears clearly, whereas the small peak in the IFT results is
difficult to recognize.
   The grain-size distribution of Pattern 2 also has a small
peak, but it is located on the opposite side to that in Pattern
1. The IFT results do not accurately estimate the small peak,
whereas the ML results do.
   The grain-size distribution of Pattern 3 has only one peak.
The IFT results of this pattern are similar to those of Pattern 2.
Both Patterns 2 and 3 have a large peak at a small
grain size. The small grain size corresponds to a large wave
number due to $I(q, r)$. The features are very small, as shown
in Fig. 7 (a), because $S(q)$ in the high-$q$ area decays as $q^{-4}$.
Function-fitting-based techniques such as IFT cannot handle
such small components, whereas stochastic techniques such
as the proposed method take small probabilities into account.

[Figure 7: Results of Exp. 1 Pattern 1, 2, 3. Panels for each pattern: (a) Input (intensity versus q [nm-1], log-log), (b) ML results and (c) IFT results (TRUE and estimated distributions versus grain size [nm]).]

   One large peak in the intermediate grain-size range is shown in
Pattern 4. Pattern 4 is so simple that the estimation is easy.
Indeed, both the ML and IFT results are very accurate. However,
the ML results are more accurate than the IFT results.
   Two comparable peaks appear close together in Pattern 5.
Because the IFT results do not detect these two peaks, one peak
instead appears between them. In contrast, the ML results detect
both peaks accurately.
   Three peaks are shown in Pattern 6. Similar to Pattern 5,
the IFT results did not extract the three peaks, whereas the
ML results did.
   The SAS patterns of (a) input look quite similar to humans.
Therefore, material scientists have to make an effort
to identify their differences, which reflect radical changes in
the grain-size distribution. According to the results, the
proposed method is helpful and reliable. This shows that the
SAS experiment can become more useful for observing
microstructures of materials.
   Figure 9 plots the processing time of the pattern estimation.
For this experiment, a computer with an Intel(R) Core(TM)
i3-4150 CPU at 3.50 GHz, 11 GB of RAM, and CentOS was used. The
implementation is based on Python 3.6.5, and the numpy library
(Oliphant 2006) is used to improve the efficiency of the process.
   The proposed method takes around 1.2 seconds, which
is much shorter than the experimental time of SAS (for
neutron scattering, around 20 minutes). In comparison, IFT
takes around 6.0 seconds, 5 times as long as the proposed
method. IFT is not much slower; however, this difference
can become important if material science researchers have to
conduct many iterations during trial-and-error experiments.
This shows the proposed method is quite useful for SAS data
analysis.
   According to the results, the proposed method enables the
grain-size distribution to be estimated accurately. IFT makes
large errors when the grain size is small, whereas the
proposed method works well for such cases. In actual situations,
we cannot know whether the grain size of a sample
is low (i.e., IFT applicable) or not. Therefore, IFT requires
much effort by material scientists, but the proposed method
does not. This shows that the proposed method is suitable
for automatically processing SAS patterns.

Experiment 2: actual measurements

In Experiment 2, SAS patterns of neutrons with a
polystyrene ball (radius 18 nm) sample and a silica ball
(radius 25 nm) sample were examined. Figure 10 shows the
results ((a), (b) and (c) are the same as in Experiment 1).
The SAS patterns are noisier than those of Experiment 1.
   The most frequent radius of (b) and (c) is around the
true radius of the sample. This shows that both the proposed method
[Figure 8: Results of Exp. 1 Patterns 4, 5, 6. Panels for each pattern: (a) Input (intensity versus q [nm-1], log-log), (b) ML result and (c) IFT result (TRUE and estimated distributions versus grain size [nm]).]

and IFT can be used. The difference between the ML results
and the IFT results is that small peaks appear at integer
multiples of the true radius. This is considered to be because
clusters of multiple balls are detected.
   The results show the proposed method is feasible for actual
SAS pattern analysis. Moreover, small behaviors inside the
material might be observable. This implies that the
proposed method will extract information leading to new
knowledge.

[Figure 9: Comparison of processing time. Bar chart of processing time [sec] for Patterns 1 to 6. Bar labels from left to right: ML 1.22, 1.22, 1.21, 1.21, 1.3, 1.22; IFT 5.07, 5.23, 4.1, 5.36, 4.13, 4.76.]

                    Conclusion and Future Works

An expectation-maximization (EM)-based grain-size distribution
estimation method was proposed for automatically
analyzing small angle scattering (SAS) patterns. Experimental
results showed that the proposed method can accurately
estimate the original grain-size distribution from
SAS patterns. Moreover, the proposed method does not
require parameter tuning to obtain good results, whereas the
existing method (Indirect Fourier Transform) does.
   The stochastic model that is the base of the proposed
method does not assume priors. However, with priors, the
estimation might be made more accurate, and fewer detection
events might be required to estimate the grain size. In
addition, non-ball scattering bodies should be taken into
account. Such extensions are possible future works.

                            References

Asahara, A.; Morita, H.; Mitsumata, C.; Ono, K.; Yano, M.; and Shoji, T. 2019. Early-stopping of scattering pattern observation with Bayesian modeling. In Proceedings of the AAAI Conference on Artificial Intelligence, volume 33, 9410–9415.
Bishop, C. M. 2006. Pattern Recognition and Machine Learning. New York: Springer.
Demoment, G. 1989. Image reconstruction and restoration: overview of common estimation structures and problems. IEEE Transactions on Acoustics, Speech, and Signal Processing 37(12):2024–2036.
Donoho, D. L. 2006. Compressed sensing. IEEE Transactions on Information Theory 52(4):1289–1306.
Higgins, J. S., and Benoît, H. 1994. Polymers and neutron scattering. Oxford: Clarendon Press.
[Figure 10: Results of Exp. 2. Rows: Polystyrene, Silica. Panels: (a) Input (intensity versus q [nm-1]), (b) ML results and (c) IFT results (estimated distribution versus grain size [nm]).]

Joachim, K., and Ingo, B. 2018. SASFit. https://www.psi.ch/en/sinq/sansi/sasfit.
Leon, B. L. 1974. An iterative technique for the rectification of observed distributions. The Astronomical Journal 79(6):745–754.
Lustig, M.; Donoho, D. L.; Santos, J. M.; and Pauly, J. M. 2008. Compressed sensing MRI. IEEE Signal Processing Magazine 25(2):72–82.
Lustig, M.; Donoho, D.; and Pauly, J. M. 2007. Sparse MRI: The application of compressed sensing for rapid MR imaging. Magnetic Resonance in Medicine 58(6):1182–1195.
Nagata, K.; Sugita, S.; and Okada, M. 2012. Bayesian spectral deconvolution with the exchange Monte Carlo method. Neural Networks 28:82–89.
National Institute of Standards and Technology. 2019. mgi. https://www.nist.gov/mgi (viewed at Oct. 2019).
Oliphant, T. E. 2006. A guide to NumPy, volume 1. Trelgol Publishing USA.
Otto, G. 1977. A new method for the evaluation of small-angle scattering data. Journal of Applied Crystallography 10:415–421.
William, Hadley, R. 1972. Bayesian-based iterative method of image restoration. Journal of the Optical Society of America 62(1):55–59.
Zhang, J. 1993. The mean field theory in EM procedures for blind Markov random field image restoration. IEEE Transactions on Image Processing 2(1):27–40.