The Specifics of Natural Language and Ways of
       Processing It in the Computational Linguistics

                                                         Tomasz Panczyszyn
                                                  Faculty of Arts and Humanities
                                                       King’s College London
                                                     London, WC 2R 2LS, UK
                                               Email: Tomasz.Panczyszyn@gmail.com

    Abstract: Computational linguistics is now undoubtedly a well-developing and promising field of study. As an intersection of linguistics and computer science, it addresses many problems of how to process natural language so as to make it usable and easily transformable by machines. The standard question raised where natural language processing and computer science meet is whether we can teach a computer to speak a natural language. This paper considers how the standard problems of computational linguistics that we encounter on a daily basis connect to the field of linguistics. A novel approach to natural language processing, using neural networks and an object-oriented approach, is presented.

                       I. INTRODUCTION

    When we try to define the interconnections between natural language and computer science, we have to take into account the fields of study that both disciplines cover.

    Linguistics, the study of human language, takes the almost seven thousand languages of the world as its subjects of examination. It studies all aspects of language, that is to say syntax, pragmatics and semantics. A language, with all these aspects included, is learned from minimal input in early childhood and lets us communicate with each other more or less easily. The great variety of languages also raises questions about translation from one language into another. The problems in this field are most frequently caused by the fact that the semantics of a language [the same applies to semiotics, syntax and pragmatics] is acquired by its users quite intuitively.

    As computers and machines do not function that way, we meet problems such as how to automatically process possessive constructions, e.g. the photos of my friends, which a user of a natural language would rather understand as pictures showing the speaker's friends than (as it could appear in a computer translation) as photos that are the property of the speaker's friends.

A. Related works

    Natural language processing is very strongly tied to the subject of artificial intelligence. The idea of creating an artificial consciousness fuelled a stream of science fiction literature in the nineteenth century, and rapid technological development aims to realize the dreams of the authors of such books. As early as the 1940s, an artificial neural network model was designed, one that has numerous applications today. In particular, neural networks [1] are used in problems of classification ([2], [3]) and categorization of components ([4], [5]). In [6], [7], the authors presented inference and classification systems based on social media.

    Another group of important methods are heuristic algorithms, which were created in order to find the maximum and minimum values of optimized functions. Heuristics have proved to be a good option not only for the problem of searching for the extremes of functions, but also in graphics processing [8], [9]. An example of such applications is the search for important points (called key points) in 2D images [10], [7]. Moreover, in [11], [12], [13], the authors have shown that these algorithms can also be used in the construction of unique mazes. Another major use of heuristics is the problem of queuing ([14], [15], [16]), e.g. in online stores where overload may occur.

    Natural language processing is an important subject that can not only advance the field of artificial intelligence [17], but also help in our everyday lives, e.g. the lives of blind people. One family of natural language processing methods is parsing; one such method is shown in [18]. In [19], the authors presented semantic parsing using paraphrasing, while [20] showed the idea of treating semantic parsing as a machine translation process. In recent years, the idea of creating computer intelligence using chatbots has been gaining more and more interest. In such applications, an important element is the knowledge base. In [21], the authors presented a system for developing a modular knowledge base. The authors of [22] presented a comparison between human-human and human-computer conversations.

    In this paper, I would like to present a novel approach to finding the author of a longer text with the use of methods of artificial intelligence. The proposed model has been tested and described with regard to its advantages and disadvantages.

  Copyright © 2016 held by the author.

                        Fig. 1: Graph of a Gaussian function for different values of the parameters σ and µ.


        II. MULTIDIMENSIONAL LOOK AT MODERN LINGUISTICS

    While studying the opportunities for solving the problems that computers encounter when processing languages, we also have to take into consideration the relations between the words in sentences. Even if a language is treated intuitively, we ought to remember that grammatical correctness is also a standard defining the quality of an utterance. As in computer languages, in natural ones there are combinations that are either possible or not. Let's take an example.

      We have a complex phrase

      1)       a)   John puts on a hat every time he goes out
                    [John = he: possible];
               b)   He puts on a hat every time John goes out
                    [he = John: impossible].
      2)       a)   Every time John goes out, he puts on a hat
                    [he = John: possible];
               b)   Every time he goes out, John puts on a hat
                    [he = John: possible].

    The sentences above show very well that every native speaker of a language has, from quite minimal input in childhood, an intuition that tells them whether a phrase [a relation of words appearing in a sentence] could be correct and whether a grammatically and formally correct sentence could ever be constructed in this way.

    What is interesting about these phrases is that there is no need to have taken syntax classes to know which reading is ungrammatical. Thanks to the knowledge we have nowadays, we are able to say that on the basis of the words of a natural language and the relations between them we can construct an infinite number of sentences that will be grammatically correct.

    Even if we hear a sentence for the first time in our lives, we can judge whether it is correct or not. The phrase: The six-headed CS84 Tbs grilled the blind octopus using a MAPA mug is surely correct when it comes to its grammar, but we have not necessarily met it ever before. This suggests that language functions intuitively.

        III. MODELS OF THEORETICAL DESCRIPTION OF LINGUISTICS

    We can check whether the order of and relation between the subject and the predicate are as they should be. From among the many problems of natural language processing and computer science, we would like to focus on just one. To see what opportunities modern computer science offers for processing language, we will try in this article to answer a question posed as a classification problem: given two presidential speeches from the US election, can I guess with high probability whose speech each one is?

    Naturally, when it comes to a classification problem there is no simple answer to a question like the one above. We can, however, assume for the purposes of our study that both presidential candidates speak about recurrent themes, probably also using recurrent words. Even if there is no exact answer, we will make a few assumptions and create a model that tries to accomplish the task.

    The assumptions will be based on what we know for now about the presidential speeches. We will take into consideration the presidential speeches of Barack Obama and of Mitt Romney.
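    The assumption of recurrent words can be illustrated by simply counting word frequencies. The following is a minimal Python sketch, not the procedure used in the experiments: the function name word_frequencies and the tokenisation rule are hypothetical, and the example string is a short excerpt of the Romney speech analysed in this paper.

```python
import re
from collections import Counter


def word_frequencies(text):
    """Count how often each lower-cased word occurs in the text."""
    words = re.findall(r"[a-z']+", text.lower())
    return Counter(words)


# Excerpt of the Romney speech quoted in this paper.
excerpt = (
    "Freedom. Freedom of religion. Freedom to speak their mind. "
    "Freedom to build a life. And yes, freedom to build a business."
)

freq = word_frequencies(excerpt)
print(freq["freedom"])  # number of occurrences of the recurrent word
```

    A word that dominates such a count for one speaker and not for the other is a candidate feature for telling the two authors apart.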

                         Fig. 2: Graph of an activation function for different values of the β parameters.


    To get an idea of what this looks like in practice, let's have a look at one of Obama's speeches on the future:

    “And we salute the people of Paris for insisting this crucial conference go on; an act of defiance that proves nothing will deter us from building the future we want for our children (. . . ) I want to show her passionate, idealistic young generation that we care about their future. (. . . ) This summer, I saw the effects of climate change firsthand in our northernmost state, Alaska, where the sea is already swallowing villages and eroding shorelines; where permafrost thaws and the tundra burns, where glaciers are melting at a pace unprecedented in modern times. And it was a preview of one possible future.”

    [Barack Obama, First Session of COP 21, Nov 30, 2015].

    As we can see, the word future appears three times in this excerpt of Obama's speech. Let's now have a look at Mitt Romney's speech.

    “(. . . ) They came not just in pursuit of the riches of this world but for the richness of this life. Freedom. Freedom of religion. Freedom to speak their mind. Freedom to build a life. And yes, freedom to build a business. With their own hands.”

    [Mitt Romney, Republican Convention, Aug 08, 2012]

    The assumption we can make for now is that American people have heard Romney speak very frequently about freedom in various contexts, such as freedom of religion, freedom to speak your mind, freedom to build a business, freedom to build a life.

    Correspondingly, what is specific to Obama's speeches is the notion of the future. He speaks about the future of America, people's future, the better future, creating the future of dreams, the future of the American economy.

        IV. PREPROCESSING OF THE ELEMENTS OF DISCOURSE

    To prepare long utterances for analysis by AI methods, we have to represent them in the simplest possible form, as numerical values. To show how this works we use Bayes' theorem, which defines directly how conditional probability works and can be seen as a way of understanding how the probability that a theory is true is influenced by newly observed evidence. Let's illustrate it with the formula

\[
\arg\max_{c} \Pr(c \mid w) = \arg\max_{c} \frac{\Pr(w \mid c)\,\Pr(c)}{\Pr(w)} = \arg\max_{c} \Pr(w \mid c)\,\Pr(c), \tag{1}
\]

where Pr(w|c) stands for the probability of a particular word w appearing in a particular class c of objects, and Pr(c|w) for the probability of the class c given the word w.

    Bayes' theorem ([23], [24]) helps us define both the prior and the posterior probability of concrete objects appearing in a concrete class of objects. For the most frequently appearing words we then calculate the values of y_i using the formula

\[
y_i = \arg\max \Pr(c_i) \prod_{j=1}^{m} \Pr(x_j \mid c_i), \tag{2}
\]

where c_i stands for the class of objects and x_j for the analyzed expression. In the next step we process all the values using the concept of blur. In the proposed method of processing we applied the Gaussian blur defined as

Fig. 3: The model of the proposed identification system.


\[
G(x) = \frac{1}{\sigma\sqrt{2\pi}} \exp\left( \frac{-(x-\mu)^2}{2\sigma^2} \right), \tag{3}
\]

where σ is the standard deviation, which controls the width and height of the curve, and µ is the mean, i.e. the shift along the x-axis.

    The Gaussian blur lets us approximate the values so as to assess the probability of an expression appearing under the required conditions. The calculated values allow us to create a vector representing the given utterance, which can be written as

\[
[G(y_0), G(y_1), \ldots, G(y_{n-1}), id], \tag{4}
\]

where G(y_i) stands for the values calculated in accordance with equation (3), and id is the author's numerical identifier.

                   V. NEURAL NETWORK

    The first model of an artificial neural network (ANN) dates back to the 1940s (see also [25], [26], [27]). An artificial neural network is a mathematical model inspired by the action of neurons in the human brain.

    An ANN is composed of three types of layers: input, hidden and output. The input layer is responsible for accepting the teaching vector, and the output layer for returning the result of the network. Hidden layers are located between the input and the output; they are responsible for creating a deeper network in order to obtain better results.

    Each layer is constructed of neurons, and each neuron of one layer is connected to each neuron in the next layer. The neuron is the smallest object of a neural network. The data enters the neuron through connections, each of which has a certain weight. In the neuron, the input values are rescaled by the activation function. The value after scaling is sent to other neurons along outbound connections to the next layer. As the activation function we selected a bipolar sigmoid function [28], [29], [30], defined as

\[
f(x) = \frac{2}{1 + \exp(-\beta x)}, \tag{5}
\]

where β is a parameter in (0, 1]. The graph of the activation function is shown in Fig. 2.

    In order to increase the precision of ANNs, many learning algorithms for this type of network have been created. One of them is the backpropagation algorithm, which minimizes the error function by modifying the weights from the output layer back to the first hidden layer [31], [32], [33]. The weights are modified using the following formula

\[
w_i = w_i + \Delta w_i, \tag{6}
\]

where w_i denotes the weight on the i-th connection, and ∆_i is calculated as

\[
\Delta_i =
\begin{cases}
\kappa_i (1 - \kappa_i)(\omega_i - \kappa_i) & \text{for the output layer} \\
\kappa_i (1 - \kappa_i) \sum_{j} w_{ji} \Delta_j & \text{for a hidden layer}
\end{cases}, \tag{7}
\]

where κ is the output value, ω is the expected value, and the sum runs over the neurons j of the next layer.
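    As an illustration, the activation function (5) and the update rule (6) and (7) can be sketched in Python as below. This is a minimal sketch rather than the implementation used in the experiments: the function names, the learning rate eta and the way ∆_i is turned into a weight change are assumptions that are not stated in the text. The activation is coded exactly as printed in (5); the conventional bipolar sigmoid additionally subtracts 1.

```python
import math


def activation(x, beta=1.0):
    """Activation of Eq. (5) as printed: 2 / (1 + exp(-beta * x))."""
    return 2.0 / (1.0 + math.exp(-beta * x))


def output_delta(kappa, omega):
    """Eq. (7), output layer: kappa is the neuron output, omega the expected value."""
    return kappa * (1.0 - kappa) * (omega - kappa)


def hidden_delta(kappa, next_weights, next_deltas):
    """Eq. (7), hidden layer: propagate the deltas of the next layer back."""
    back = sum(w * d for w, d in zip(next_weights, next_deltas))
    return kappa * (1.0 - kappa) * back


def updated_weight(w, delta, neuron_input, eta=0.1):
    """Eq. (6): w <- w + dw, ASSUMING dw = eta * delta * input (eta is hypothetical)."""
    return w + eta * delta * neuron_input


if __name__ == "__main__":
    d_out = output_delta(kappa=0.5, omega=1.0)
    d_hid = hidden_delta(kappa=0.5, next_weights=[1.0], next_deltas=[d_out])
    print(activation(0.0), d_out, d_hid, updated_weight(0.2, d_out, 1.0))
```

    The deltas are computed layer by layer starting from the output, which is why the hidden-layer case needs the deltas of the layer that follows it.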
                                                                           Fig. 5: Knowledge representation using Sammon mapping.


        VI. PROPOSED MODEL OF AUTHOR'S IDENTIFICATION

    Verifying a person on the basis of a longer text requires not only the extraction of the characteristics of his speech, but also correct classification. For this purpose, the proposed model consists of several stages. At the beginning, the statement is entered into a computer, where it is processed according to Sec. IV. The next step has two paths. In the first one, the sample is stored in the database. When the database contains a sufficient number of samples, the neural network is trained using the knowledge contained in the database. The second path is to classify the sample with the neural network.
            Fig. 4: Error learning neural network.
                                                                             A model of such a system is shown in Fig. 3.
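    The preprocessing stage of Sec. IV that feeds this model can be sketched in Python as follows. It is a minimal sketch under stated assumptions, not the code used in the experiments: the function names, the add-one smoothing of the word probabilities and the default values µ = 0 and σ = 1 are hypothetical choices made only for illustration.

```python
import math
from collections import Counter


def class_score(words, prior, word_counts, vocab_size):
    """Eq. (2) for one class: Pr(c) * prod_j Pr(x_j | c).

    Pr(x_j | c) is estimated with add-one smoothing (an assumption).
    """
    total = sum(word_counts.values())
    score = prior
    for w in words:
        score *= (word_counts[w] + 1) / (total + vocab_size)
    return score


def gaussian(x, mu=0.0, sigma=1.0):
    """Gaussian function of Eq. (3)."""
    return math.exp(-((x - mu) ** 2) / (2 * sigma ** 2)) / (sigma * math.sqrt(2 * math.pi))


def feature_vector(scores, author_id, mu=0.0, sigma=1.0):
    """Vector of Eq. (4): [G(y_0), ..., G(y_{n-1}), id]."""
    return [gaussian(y, mu, sigma) for y in scores] + [author_id]


if __name__ == "__main__":
    counts = Counter({"future": 3, "freedom": 1})  # toy counts for one class
    y0 = class_score(["future"], prior=0.5, word_counts=counts, vocab_size=2)
    print(feature_vector([y0], author_id=1))
```

    The resulting vector is what would be stored in the database or passed to the input layer of the network, with the identifier used as the expected answer during training.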

                        VII. EXPERIMENTS

    The proposed solution has been tested using 200 samples, 100 samples per person. Each sample contained a fragment of statements about the future of up to 60 words. To minimize the learning time of the neural network, each sample contained 15 components. The neural network was composed of the following layers:

       •   input layer composed of 15 neurons;

       •   4 hidden layers composed of 4 neurons;

       •   output layer consisting of one neuron.

To train the network, the samples were divided into two groups (training and verifying, 80% : 20%). The classification problem is shown in Fig. 5 using Sammon mapping. Based on this interpretation of the spread of knowledge, it can be seen that there is no easy way to separate the samples into two groups, so the classification problem is extremely difficult. The network was trained until it reached an error of 0.24; the learning error graph is shown in Fig. 4. In order to verify the operation of the classifier, each sample was given to the input of the network. As a result, the system indicated the author correctly for 103 samples, with a reported efficiency of about 72%.

                        VIII. CONCLUSION

    The subject of the research was to demonstrate the possibilities of processing natural language by the methods of computational linguistics. We have shown that, given two different longer texts, we are able to identify their authors exclusively on the basis of the words used, with a low error (learning error of 0.24). After the data had been entered into the computer, the contents of the database were classified by the neural network. We processed the speeches of two independent authors, Barack Obama and Mitt Romney, and, helped by the neural network, we could evaluate whose text a sample is just on the basis of the samples given, after comparing the data with that contained in the database. The outcomes of such an experiment are of significant help in the examination of idiolects and distinctive, personal styles of constructing a discourse. Consequently, this contributes to the development of artificial intelligence, as it is the computer that identifies the authorship of a text.

                             REFERENCES

 [1] C. Napoli and E. Tramontana, “An object-oriented neural network toolbox based on design patterns,” in Information and Software Technologies. Springer, 2015, pp. 388–399.
 [2] C.-T. Chen, K.-S. Chen, and J.-S. Lee, “The use of fully polarimetric information for the fuzzy neural classification of SAR images,” Geoscience and Remote Sensing, IEEE Transactions on, vol. 41, no. 9, pp. 2089–2100, 2003.
 [3] A. Kandaswamy, C. S. Kumar, R. P. Ramanathan, S. Jayaraman, and N. Malmurugan, “Neural classification of lung sounds using wavelet coefficients,” Computers in Biology and Medicine, vol. 34, no. 6, pp. 523–537, 2004.
 [4] D. Valentin, H. Abdi, and A. J. O'Toole, “Categorization and identification of human face images by neural networks: A review of the linear autoassociative and principal component approaches,” Journal of Biological Systems, vol. 2, no. 03, pp. 413–429, 1994.
 [5] R. Johnson and T. Zhang, “Effective use of word order for text categorization with convolutional neural networks,” arXiv preprint arXiv:1412.1058, 2014.
 [6] A. Fornaia, C. Napoli, G. Pappalardo, and E. Tramontana, “An aop-rbpnn approach to infer user interests and mine contents on social media,” Intelligenza Artificiale, vol. 9, no. 2, pp. 209–219, 2015.
 [7] C. Napoli, G. Pappalardo, E. Tramontana, R. K. Nowicki, J. T. Starczewski, and M. Woźniak, “Toward work groups classification based on probabilistic neural network approach,” in Artificial Intelligence and Soft Computing. Springer, 2015, pp. 79–89.
 [8] X.-S. Yang, “Flower pollination algorithm for global optimization,” in Unconventional Computation and Natural Computation. Springer, 2012, pp. 240–249.
 [9] M. Woźniak, “Fitness function for evolutionary computation applied in dynamic object simulation and positioning,” in Computational Intelligence in Vehicles and Transportation Systems (CIVTS), 2014 IEEE Symposium on. IEEE, 2014, pp. 108–114.
[10] D. Połap, M. Woźniak, C. Napoli, E. Tramontana, and R. Damaševičius, “Is the colony of ants able to recognize graphic objects?” in Information and Software Technologies. Springer, 2015, pp. 376–387.
[11] D. Połap, M. Woźniak, C. Napoli, and E. Tramontana, “Is swarm intelligence able to create mazes?” International Journal of Electronics and Telecommunications, vol. 61, no. 4, pp. 305–310, 2015.
[12] D. Połap, M. Woźniak, C. Napoli, and E. Tramontana, “Real-time cloud-based game management system via cuckoo search algorithm,” International Journal of Electronics and Telecommunications, vol. 61, no. 4, pp. 333–338, 2015.
[13] D. Połap, “Designing mazes for 2d games by artificial ant colony algorithm,” Symposium for Young Scientists in Technology, Engineering and Mathematics (SYSTEM 2015), pp. 63–70, 2016.
[14] M. Woźniak, W. M. Kempa, M. Gabryel, R. K. Nowicki, and Z. Shao, “On applying evolutionary computation methods to optimization of vacation cycle costs in finite-buffer queue,” in Artificial Intelligence and Soft Computing. Springer, 2014, pp. 480–491.
[15] M. Woźniak, W. M. Kempa, M. Gabryel, and R. K. Nowicki, “A finite-buffer queue with a single vacation policy: An analytical study with evolutionary positioning,” International Journal of Applied Mathematics and Computer Science, vol. 24, no. 4, pp. 887–900, 2014.
[16] P. V. Laxmi and K. Jyothsna, “Optimization of service rate in a discrete-time impatient customer queue using particle swarm optimization,” in Distributed Computing and Internet Technology. Springer, 2016, pp. 38–42.
[17] X. Gao and N. Zhu, “Natural language processing,” Information Technology Journal, vol. 12, no. 17, pp. 4256–4261, 2013.
[18] K. Xu, S. Zhang, Y. Feng, and D. Zhao, “Answering natural language questions via phrasal semantic parsing,” in Natural Language Processing and Chinese Computing. Springer, 2014, pp. 333–344.
[19] J. Berant and P. Liang, “Semantic parsing via paraphrasing,” in ACL (1), 2014, pp. 1415–1425.
[20] J. Andreas, A. Vlachos, and S. Clark, “Semantic parsing as machine translation,” in ACL (2), 2013, pp. 47–52.
[21] G. Pilato, A. Augello, and S. Gaglio, “A modular system oriented to the design of versatile knowledge bases for chatbots,” ISRN Artificial Intelligence, vol. 2012, 2012.
[22] J. Hill, W. R. Ford, and I. G. Farreras, “Real conversations with artificial intelligence: A comparison between human–human online conversations and human–chatbot conversations,” Computers in Human Behavior, vol. 49, pp. 245–250, 2015.
[23] K.-R. Koch, Bayes Theorem. Springer, 1990.
[24] D. L. Faigman and A. Baglioni Jr, “Bayes' theorem in the trial process: Instructing jurors on the value of statistical evidence,” Law and Human Behavior, vol. 12, no. 1, p. 1, 1988.
[25] K. Hornik, M. Stinchcombe, and H. White, “Multilayer feedforward networks are universal approximators,” Neural Networks, vol. 2, no. 5, pp. 359–366, 1989.
[26] B. Kosko, “Neural networks and fuzzy systems: a dynamical systems approach to machine intelligence/book and disk,” vol. 1. Prentice Hall, 1992.


Fig. 6: The model of the proposed artificial neural network with 4 hidden layers.


[27]   D. F. Specht, “Probabilistic neural networks,” Neural networks, vol. 3,
       no. 1, pp. 109–118, 1990.
[28]   P. d. B. Harrington, “Sigmoid transfer functions in backpropagation
       neural networks,” Analytical Chemistry, vol. 65, no. 15, pp. 2167–2168,
       1993.
[29]   H. Yonaba, F. Anctil, and V. Fortin, “Comparing sigmoid transfer
       functions for neural network multistep ahead streamflow forecasting,”
       Journal of Hydrologic Engineering, vol. 15, no. 4, pp. 275–283, 2010.
[30]   M. Panicker and C. Babu, “Efficient fpga implementation of sigmoid
       and bipolar sigmoid activation functions for multilayer perceptrons,”
       IOSR Journal of Engineering (IOSRJEN), pp. 1352–1356, 2012.
[31]   R. Hecht-Nielsen, “Theory of the backpropagation neural network,” in
       Neural Networks, 1989. IJCNN., International Joint Conference on.
       IEEE, 1989, pp. 593–605.
[32]   M. Riedmiller and H. Braun, “A direct adaptive method for faster
       backpropagation learning: The rprop algorithm,” in Neural Networks,
       1993., IEEE International Conference on. IEEE, 1993, pp. 586–591.
[33]   P. J. Werbos, “Backpropagation through time: what it does and how to
       do it,” Proceedings of the IEEE, vol. 78, no. 10, pp. 1550–1560, 1990.

