Proceedings of the 11th International Conference on Applied Informatics
      Eger, Hungary, January 29–31, 2020, published at http://ceur-ws.org


FFT Based Airborne LIDAR Classification
    with Open3D and Numpy/Scipy

                                    Peter Szutor

                                  University of Debrecen
                                  Faculty of Informatics
                                        Hungary
                                   szppaks@gmail.com


                                        Abstract
          The classification of the airborne LIDAR (also referred to ALS) point-
      clouds is a frequently used operation. The recently used algorithms use ma-
      chine learning algorithms (like SVM), but teaching this algorithms often may
      be involved. In many instances we need to write a special application for
      working with our data. For the fast developing we need a very useful pro-
      gramming language (python+numpy is the best), and a C based point cloud
      library, for the fast execution. Open3d is a modern library for 3D data pro-
      cessing, including point clouds,meshes, RGBD pictures, it has good highly
      optimized algorithms, data types, IO functions and supports visualisation;
      and it has integration with numpy arrays, so we can easily use scipy and
      scikit modules. We present a new method for point cloud classification, with
      two clustering methods and Fast Fourier Transform. The idea is based on
      that the elevation in the point cloud clusters has different frequency domain
      depending on thesurface; the buildings shows wider spectrum then the veg-
      etation, and the ground has a very small spectrum. To make contiguous
      parts from the entire point cloud, we use DBSCAN clustering (Open3D),
      and BIRCH clustering(scikit-learn).
      Keywords: Open3d, LIDAR, point cloud classification


1. Algorithm
1.1. Introduction
The modern airborne laser scanners can save the waveform data for the points,
and captures more information from the scanned object ([1]). This data determine
Copyright © 2020 for this paper by its authors. Use permitted under Creative Commons License
Attribution 4.0 International (CC BY 4.0).


                                            350
what is the object (tree or building). In this case , we can use the point based
classifications. These classifications are simple and efficient; but if we have a point
cloud without reflectance values, we cannot use these ones.
    There are numerous algorithms, which can classify the point clouds with only
the 𝑥, 𝑦, 𝑧 coordinates ([2, 3]). These algorithms are based on the decision of the
height above a calculated ground level, for example The BlueMarble Geo classifica-
tion (that examines the difference between the non ground points) and the Lastools
(that make contours from points of the same height).
    The third class of the algorithms is the machine learning based algorithm, such
as the well-known software CloudCompare’s plugin, the CANUPO ([4]), that uses
SVM and LDA to classify the points. If you want to use these algorithms, you have
to train the classifier on a training set.

1.2. The FFT based algorithm
To overcome the issues mentioned above I developed an algorithm based on FFT
(Fast Fourier Transform). My classifier has six steps:
   • DBSCAN clustering
   • BIRCH clustering
   • Compute the ground levels on a grid
   • Defining the ground clusters
   • Compute FFT on the non-ground clusters
   • Classifying by the FFT values
Scikit ([6]), the popular python library offers 10 different general clustering method.
We can use the methods where do not need to specify the number of clusters, so
the K-means, Spectral, Ward algorithms are out of the box. During testing these
methods the Agglomerative and Meanshift were found inappropriate, because they
cannot handle great number of points. The fastest method was the DBSCAN ([5,
7]), and the BIRCH ([9]) worked well too. I have tried the Hierarchical DBSCAN
([8]), the advanced version of the DBSCAN, but It have made too small clusters.
    Why the FFT? The FFT is a very fast algorithm, and in this case it’s practical
to use fast methods because of the huge number of points (for example I can use
an algorithm based on standard deviation of the normals on the mesh created from
the cluster points, but it’s too compute-intensive)

1.2.1. DBSCAN clustering
The DBSCAN algorithm considers clusters as areas of high density separated by
areas of low density. Due to this rather generic view, clusters found by DBSCAN
can be of any shapes, as opposed to k-means which assumes that clusters are
convex-shaped. It’s surprisingly fast and good, but clusters do not form continuous

                                         351
surfaces. This algorithm has two input parameters, eps (distance value, two sample
minimum distance) and minimum samples (how many samples make new cluster).
On my point cloud sample I used eps=1.9 and minimum samples=30 - this settings
made the best result. (The 3D view generated by the Open3D visualiser ([5]))


                       Figure 1: DBSCAN generated clusters

    Assigning random colors to the generated clusters we can see there are clusters
with points belonging to buildings and vegetation; therefore these clusters have to
be divided into smaller ones if the clusters point density differ significantly from
the whole point cloud or it has too many points. So I need use another clustering
method based on nearest neighbour point distances. The used DBSCAN is an
Open3D built-in point cloud method, simple to use and fast. I made a test with
HDBSCAN (Hierarchical DBSCAN, az advanced version of the DBSCAN), but
this method formed two many small clusters.

1.2.2. BIRCH clustering
The Birch builds a tree called the Clustering Feature Tree (CFT) for the given data.
The Birch algorithm has two parameters, the threshold and the branching factor.
The branching factor limits the number of subclusters in a node and the threshold
limits the distance between the entering sample and the existing subclusters. This
method is not a suitable procedure for divide the whole point cloud, but it can cut
the big clusters after the DBSCAN. On my sample I used the following parameter
values: branching factor 50, nclusters None, threshold 3.9, compute labels True.
   This procedure is the part of the scipy.clustering.
   The disadvantage of these methods are that make too small clusters, and leave
non classified points. This problem leads to a failure in the FFT computing.


                                        352
                         Figure 2: Birch generated clusters


1.2.3. Compute the ground levels on a grid
Decisioning the ground level is simple: I segment the whole point cloud in an axis
oriented regular grid. The smallest Z value in a segment will the ground level, and
the x,y,z coordinate will be the minimum point.

1.2.4. Defining the ground clusters
I set the cluster to ground cluster when the difference between the closest minimum
point elevation (Z value) and the average elevation of the cluster’s points is less
then 0.4 meters, and the Z range of the cluster is less than 2 meters.

1.2.5. Compute FFT on the non-ground clusters
Computing FFT is a fast method, but it has a little handicap: this runs on val-
ues are forming a regular grid. But the clusters made from the LIDAR point
cloud have points located randomly. Thus I have to generate a 100*100 regular
grid from the points using an interpolation method; this is a disadvantage of my
classifying algorithm, because this step requires lots of computation. I use the
scipy LinearNDInterpolator module to make the grid. This interpolation is based
on the Quickhull algorithm, and generates a Delaunay triangulation, and makes a
baricentric projection on the triangles to calculate the grid values.
    After the FFT transform in the frequency domain we can be see the frequency
components; the artifical objects show higher frequencies because the sharp edges,
the vegetation shows consistent spectrum. Figure 3. shows a cluster that contains
a part of a forest. The upper graph represents the points, the middle one shows the


                                        353
interpolated grid, and the lower graph is the FFT image. The next sample on the


               Figure 3: Trees: Points, linear interpolation and FFT

right shows a building; we can see that the clustering algorithm made a fault and
joined the tree to the building, but the FFT frequencies have a wider domain than
the forest. On the left there is a small cluster with few points; the interpolation
made a poorly evaluable result.


                 (a) Pitched roof               (b) Too small cluster

                         Figure 4: Computed FFT images


                                       354
1.2.6. Classifying by the FFT values
During developing the earlier version of this algorithm I tried to compare the com-
puted FFT image with sample images (forest, flat roof, pitched roof). The standard
deviation of the two FFT that showed me the class of the cluster; but it has made
wrong results, because the scale and rotation of the sample and the examined
clusters had differences.


                         Figure 5: Sample forest FFT image


                      Figure 6: Sample pitched roof FFT image

    I found a working method: examining the width of the frequency domain. But
it has a problem, because if there is no enough points in the cluster, the FFT
generates lower frequency range. Because this I need to divide the range with the
number of the points, so I use five indicators: FFT variance/number of points; FFT
spectrum maximum and minimum; FFT standard deviation / number of points;
average of the FFT values; average elevation of the cluster’s points. On the non
classified cluster I calculate an elevation, and make it “ground” if this are bellow a
threshold value. Some clusters leave non classified (for example, the walls).

                                         355
2. Results and evaluating
I ran the algorithm on two sample point clouds. These clouds have low resolution,
and contain metropolitain and town areas. There are trees mixed with family
houses, it is hard to recognize and separate, even for the human evaluator. The
figure on the left shows a smaller area, the right one represents a bigger one. The
green color is the vegetation, the blue color denotes the ground, and the red patches
represent buildings; the luminosity indicates the confidence of the classifying: dark
red is the “I’m absolutely sure”, the light red is the “hmm, maybe it’s a building”.
I used the same parameters on both samples. On the small sample, the classifier


         (a) Classified small area                  (b) Classified large area

                          Figure 7: Computed FFT images

made building from the trees, but on the bigger area the family houses changed to
forest. But it has good result in the recognizing of the flat roofed buildings, and
recognized the ground well.


3. Conclusion
The popular clustering methods based on calculate the distances between the points
and searching the nearest neighbour. All of them use spatial indexing to accelerate
this searching, still it is a compute-intensive operation, and the execution time is
increasing logarithmically by the number of points. So, if we have large ares we


                                        356
need to segment it, and run the clustering on the tiles. The DBSCAN clustering is
fast and simple to use, easy to set the parameters, but it has limited application.
    The Birch clustering makes mistakes and is slow, and has many parameters. It
is not a feasible algorithm for the point cloud operations.
    Finally I think these clustering methods -based only on the point distances-
are not too applicable for working with ALS point clouds. An effective clustering
method should consider the elevation of the point to make a cluster. Basically,
there is no really useful clustering method in this popular environment for the ALS
LIDAR files. Examining the FFT image can give satisfying result but based on a
suitable clustering method.


References
[1] Marc Bartels,      Hong Wei (2009) Point-based classification http:
    //www.cvg.reading.ac.uk/projects/LIDAR/publications/phd_reading/bartels_
    wei_LIDAR_ML_PRRS_06.pdf
[2] BlueMarble Geo Global mapper nonground classification https://www.
    bluemarblegeo.com/knowledgebase/global-mapper-19/Lidar_Module/nonground_
    classification.htm
[3] C. Hug, P. Krzystek, W. Fuchs (2004) Advanced lidar data processing
    with LasTools https://www.researchgate.net/publication/228347125_Advanced_
    lidar_data_processing_with_LasTools
[4] Nicolas Brodu (2019) CANUPO classifier http://nicolas.brodu.net/common/
    recherche/publications/canupo.pdf
[5] Open3D documentation www.open3d.org
[6] Scipy clustering https://scikit-learn.org/stable/modules/clustering.html#
    dbscan
[7] Martin Ester, Hans-Peter Kriegel, Jiirg Sander, Xiaowei Xu (1996) A
    Density-Based Algorithm for Discovering Clusters https://www.aaai.org/Papers/
    KDD/1996/KDD96-037.pdf
[8] Ricardo J. G. B. CampelloDavoud MoulaviJoerg Sander (2013) Density-
    Based Clustering Based on Hierarchical Density Estimates https://link.springer.
    com/chapter/10.1007/978-3-642-37456-2_14
[9] Sklearn Clustering Brich documentation https://scikit-learn.org/stable/
    modules/generated/sklearn.cluster.Birch.html#sklearn.cluster.Birch


                                       357