The Specifics of Natural Language and Ways of Processing It in the Computational Linguistics

Tomasz Panczyszyn
Faculty of Arts and Humanities
King's College London
London, WC 2R 2LS, UK
Email: Tomasz.Panczyszyn@gmail.com

Copyright © 2016 held by the author.

Abstract: Computational linguistics is now undoubtedly a well-developing and promising field of study. As an intersection between linguistics and computer science, it deals with the many problems of how to process natural language so as to make it usable and easily transformable by machines. The standard question that arises when it comes to the interconnected areas of natural language processing and computer science is whether we can teach a computer how to speak a natural language. The issue considered in this paper is how to treat the standard computational linguistics problems that we encounter on a daily basis where they connect to the field of linguistics. In this paper, a novel approach to natural language processing using neural networks and an object-oriented approach is presented.

I. INTRODUCTION

When we try to define what the interconnections between natural language and computer science really are, we have to take into account the fields of study that both disciplines cover.

Linguistics, being the study of human language, has almost seven thousand languages of the world as its subjects of examination. It studies all the linguistic aspects that appear in them, that is to say: syntax, pragmatics and semantics. A language, with all these aspects included, is learned by us from a minimal input in our early childhood and serves us to communicate with each other more or less easily. The great variety of languages we have nowadays also raises questions regarding translation from one language into another. The problems to be sorted out in this field are most frequently caused by the fact that the semantics of a language (this applies also to semiotics, syntax and pragmatics) is acquired by us, the users of the language, quite intuitively.

As computers and machines do not function that way, we meet problems such as how to automatically process a possessive construction, e.g. the photos of my friends, which a user of a natural language would rather understand as pictures presenting the friends of the person who speaks than (as it could appear in a computer translation) as photos that are the property of one's friends.

A. Related works

Natural language processing is very strongly bound to the subject of artificial intelligence. The idea of creating an artificial consciousness expanded the stream of science fiction literature in the nineteenth century, and rapid technological development aims to realize the dreams of the authors of this type of book. Already in the forties of the twentieth century, an artificial neural network model was designed, one which has numerous applications in life today. In particular, neural networks [1] are used in problems of classification ([2], [3]) and categorization of components ([4], [5]). In [6], [7], the authors have shown inference and classification systems based on social media.

Another group of important methods are heuristic algorithms, which were created in order to find the maximum and minimum values of optimized functions. Heuristics proved to be a good option for finding solutions, not only for the problem of searching for the extremes of functions, but also in graphics processing [8], [9]. An example of such applications is the search for important points (called key points) in 2D images [10], [7]. Moreover, in [11], [12], [13], the authors have shown that these algorithms can also be used in the construction of unique mazes. A major use of heuristics is also the problem of queuing ([14], [15], [16]), e.g. in online stores where overload may occur.

Natural language processing is an important subject that can not only help to develop the field of artificial intelligence [17], but also help our everyday lives, e.g. the lives of blind people. One of the core tasks of natural language processing is parsing. One such method is shown in [18]. In [19], the authors presented semantic parsing using paraphrasing, while [20] showed the idea of treating semantic parsing as a machine translation process.

In recent years, the idea of creating computer intelligence using chatbots has been gaining more and more interest. In such applications, an important element is the knowledge base. In [21], the authors have shown an idea of a system for the development of a modular knowledge base. The authors of [22] presented a comparison between human-human and human-computer conversations.

In this paper, I would like to present a novel approach to finding the author of a longer text with the use of methods of artificial intelligence. The proposed model has been tested and described with regard to all its advantages and disadvantages.

II. MULTIDIMENSIONAL LOOK AT MODERN LINGUISTICS

While studying the opportunities for solving the problems computers encounter when processing languages, we also have to take into consideration the relations between the words in sentences. Even if a language is treated intuitively, we ought to remember that the standard defining the quality of an utterance is also its grammatical correctness. As in computer languages, in natural ones we also have combinations that are either possible or not. Let us take an example.

We have a complex phrase:

1) a) John puts on a hat every time he goes out. [John = he: possible];
   b) He puts on a hat every time John goes out. [he = John: impossible].
2) a) Every time John goes out, he puts on a hat. [he = John: possible];
   b) Every time he goes out, John puts on a hat. [he = John: possible].

The sentences above show very well that every native speaker of a language naturally has, with quite minimal input from childhood, an intuition that tells him whether a phrase (a relation of words appearing in a sentence) could be correct and whether a grammatically and formally correct sentence could ever be constructed like this.

So what is interesting about the phrases we have been reflecting on is that there was no need to have taken syntax classes to know which of the sentences is ungrammatical. Thanks to the knowledge we have nowadays, we are able to say that on the basis of the words of a natural language and the relations between them we can construct an infinite number of sentences that will be grammatically correct.

Even if we hear a sentence for the first time in our lives, we can suspect or simply verify whether it is correct or not. The phrase: The six-headed CS84 Tbs grilled the blind octopus using a MAPA mug is surely correct when it comes to its grammar, but we have not necessarily met it ever before. That suggests that language functions intuitively.

III. MODELS OF THEORETICAL DESCRIPTION OF LINGUISTICS

We can check whether the order of and relation between the subject and the predicate are as they should be. From among the many problems of both natural language processing and computer science, we would like to focus on just a particular part. To see how many possibilities modern computer science gives us for processing language, we will try throughout the article to find an answer to a classification problem. The problem we would like to sort out is: given two presidential speeches from the US election, can I guess with high probability whose speech it is?

Naturally, when it comes to the classification problem, there is no simple answer to a question like the one above. We can, however, assume for the purposes of our study that both presidential candidates speak about recurrent themes, probably also using recurrent words. Even if there is no exact answer to this, we will try to make a few assumptions and create a model to try to accomplish what we suppose.

The assumptions we will be making are based on what we know for now about the presidential speeches. We will take into consideration presidential speeches of Barack Obama and of Mitt Romney.

So, just to have an idea of what it really looks like, let us have a look at Obama's speeches on the future:

"And we salute the people of Paris for insisting this crucial conference go on; an act of defiance that proves nothing will deter us from building the future we want for our children (...) I want to show her passionate, idealistic young generation that we care about their future. (...) This summer, I saw the effects of climate change firsthand in our northernmost state, Alaska, where the sea is already swallowing villages and eroding shorelines; where permafrost thaws and the tundra burns, where glaciers are melting at a pace unprecedented in modern times. And it was a preview of one possible future."

[Barack Obama, First Session of COP 21, Nov 30, 2015].

As we have seen, in Obama's speech the word future appears three times. Let us now have a look at Mitt Romney's speech.

"(...) They came not just in pursuit of the riches of this world but for the richness of this life. Freedom. Freedom of religion. Freedom to speak their mind. Freedom to build a life. And yes, freedom to build a business. With their own hands."

[Mitt Romney, Republican Convention, Aug 08, 2012]

The assumption we can make for now is that the American people have heard Romney speaking very frequently about freedom in various contexts, like: freedom of religion, freedom to speak your mind, freedom to build a business, freedom to build a life.

Consequently, what is specific to Obama's speeches is the notion of the future. He speaks about the future of America, people's future, the better future, creating the future of dreams, the future of the American economy.

IV. PREPROCESSING OF THE ELEMENTS OF DISCOURSE

To prepare long utterances for the purposes of an analysis by the methods of AI, we have to represent them in the simplest possible form by assigning numerical values to them. To show how that actually functions, we will make use of Bayes' theorem, which defines directly how conditional probability works and can be seen as a way of understanding how the probability that a theory is true is influenced by evidence that appears for the first time. It can be written as

\[ \arg\max_{c} \Pr(c \mid w) = \arg\max_{c} \frac{\Pr(w \mid c)\,\Pr(c)}{\Pr(w)} = \arg\max_{c} \Pr(w \mid c)\,\Pr(c), \tag{1} \]

where $\Pr(w \mid c)$ is the probability of a particular word $w$ appearing in a particular class $c$ of objects, $\Pr(c)$ is the prior probability of the class, and $\Pr(c \mid w)$ is the posterior probability of class $c$ given the word $w$.

Bayes' theorem ([23], [24]) helps us to define both the prior and the posterior probability of concrete objects appearing in a concrete class of objects. For the most frequently appearing words we then calculate the values of $y_i$ using the formula

\[ y_i = \arg\max \Pr(c_i) \prod_{j=1}^{m} \Pr(x_j \mid c_i), \tag{2} \]

where $c_i$ stands for the class of objects and $x_j$ for the analyzed expression.

In the next step we process all the values by making use of the concept of blur. In the proposed method of processing we applied the Gaussian blur defined as

\[ G(x) = \frac{1}{\sigma\sqrt{2\pi}} \exp\left( \frac{-(x-\mu)^2}{2\sigma^2} \right), \tag{3} \]

where $\sigma$ sets the width of the curve (and, through the factor $1/(\sigma\sqrt{2\pi})$, its peak height) and $\mu$ is the shift along the x-axis.

Fig. 1: Graph of a Gaussian function for different values of the parameters σ and µ.

The Gaussian blur permits us to approximate the values so as to assess the probability of an expression appearing in the demanded conditions. The calculated values allow us to create a vector representing the desired utterance. It has the form

\[ [G(y_0), G(y_1), \ldots, G(y_{n-1}), id], \tag{4} \]

where $G(y_i)$ stands for the values calculated in accordance with equation (3), and $id$ is the author's numerical identifier.

V. NEURAL NETWORK

Already in the forties of the twentieth century, the first model of an artificial neural network (ANN) was described (see [25], [26], [27]). An artificial neural network is a mathematical model inspired by the action of neurons in the human brain. An ANN is composed of three types of layers: input, hidden and output. The input layer is responsible for accepting the teaching vector, and the output layer for returning the result of the network. Hidden layers are located between the input and the output; they are responsible for creating a deeper network in order to obtain better results. Each layer is constructed of neurons, wherein each neuron of one layer is connected to each neuron in the next layer. The neuron is the smallest object of a neural network. The data enter the neuron through connections, each of which has a certain weight. In the neuron, all input values are rescaled by the activation function. The value after scaling is sent along outbound connections to the neurons of the next layer. As the activation function we selected a bipolar sigmoid function [28], [29], [30], defined as

\[ f(x) = \frac{2}{1 + \exp(-\beta x)}, \tag{5} \]

where $\beta$ is a parameter in $(0, 1]$. A graph of the activation function is shown in Fig. 2.

Fig. 2: Graph of an activation function for different values of the β parameters.

In order to increase the precision of ANNs, many learning algorithms for this type of network have been created. One of these is the backpropagation algorithm, which works by minimizing the error function and modifying the weights from the output layer back to the first hidden layer [31], [32], [33]. Weights are modified using the following formula

\[ w_i = w_i + \Delta w_i, \tag{6} \]

where $w_i$ is the weight on the i-th connection, and $\Delta_i$ is calculated as

\[ \Delta_i = \begin{cases} \kappa_i (1 - \kappa_i)(\omega_i - \kappa_i) & \text{for the output layer} \\ \kappa_i (1 - \kappa_i) \sum_{j \in \kappa} w_{ji} \Delta_j & \text{for a hidden layer} \end{cases} \tag{7} \]

where $\kappa$ is the output value and $\omega$ is the expected value.

VI. PROPOSED MODEL OF AUTHOR'S IDENTIFICATION

The problem of verifying a person on the basis of a longer text requires not only the extraction of the characteristics of his speech, but also correct classification. For this purpose, the proposed model consists of several stages. At the beginning, the statement is entered into a computer, where it is processed according to Sec. IV. The next step has two paths. In the first one, the sample is stored in the database. When the database contains a sufficient number of samples, the neural network is trained using the knowledge contained in the database. The second path is to classify the samples by the neural network.

A model of such a system is shown in Fig. 3.

Fig. 3: The model of the proposed identification system.

VII. EXPERIMENTS

The proposed solution has been tested on 200 samples, 100 samples per person. Each sample contained a fragment of statements about the future, taking into account up to 60 words. For the purpose of minimizing the learning time of the neural network, each sample contained 15 components. The neural network was composed of the following layers (see Fig. 6):

• input layer composed of 15 neurons;
• 4 hidden layers composed of 4 neurons;
• output layer consisting of one neuron.

Fig. 6: The model of the proposed artificial neural network with 4 hidden layers.

To train the network, the samples were divided into two groups (training and verifying, 80% : 20%). The problem of classification is shown in Fig. 5 using Sammon mapping. Based on this representation of the spread of knowledge, it can be seen that there is no easy way to separate the samples into two groups, so the problem of classification is extremely difficult. The network was trained until it reached an error of 0.24; the learning error graph is shown in Fig. 4. In order to verify the operation of the classifier, each sample was given to the input of the network. As a result, the system indicated the author correctly for 103 samples, an efficiency reported at about 72%.

Fig. 4: Error learning neural network.

Fig. 5: Knowledge representation using Sammon mapping.

VIII. CONCLUSION

The subject of the research was to demonstrate the possibilities of processing natural language by the methods of computational linguistics. We have shown that, given two different longer texts, we are able to identify their authors exclusively on the basis of the words used, with the lowest risk rate we could obtain (error 0.24). After the data were entered into the computer, the contents of the database were used by the neural network for classification. Having done so, we processed the speeches of two independent authors, Barack Obama's and Mitt Romney's addresses, and therefore, helped by the neural network, we could evaluate whose text it is just on the basis of the samples given, after comparing the data we had with the data contained in the database. The outcomes of such an experiment are of significant help when it comes to the examination of idiolects and distinctive, personal styles of constructing a discourse. It consequently contributes to the development of artificial intelligence, since it is the computer that identifies the authorship of a text.

REFERENCES

[1] C. Napoli and E. Tramontana, "An object-oriented neural network toolbox based on design patterns," in Information and Software Technologies. Springer, 2015, pp. 388–399.
[2] C.-T. Chen, K.-S. Chen, and J.-S. Lee, "The use of fully polarimetric information for the fuzzy neural classification of SAR images," IEEE Transactions on Geoscience and Remote Sensing, vol. 41, no. 9, pp. 2089–2100, 2003.
[3] A. Kandaswamy, C. S. Kumar, R. P. Ramanathan, S. Jayaraman, and N. Malmurugan, "Neural classification of lung sounds using wavelet coefficients," Computers in Biology and Medicine, vol. 34, no. 6, pp. 523–537, 2004.
[4] D. Valentin, H. Abdi, and A. J. O'Toole, "Categorization and identification of human face images by neural networks: A review of the linear autoassociative and principal component approaches," Journal of Biological Systems, vol. 2, no. 03, pp. 413–429, 1994.
[5] R. Johnson and T. Zhang, "Effective use of word order for text categorization with convolutional neural networks," arXiv preprint arXiv:1412.1058, 2014.
[6] A. Fornaia, C. Napoli, G. Pappalardo, and E. Tramontana, "An aop-rbpnn approach to infer user interests and mine contents on social media," Intelligenza Artificiale, vol. 9, no. 2, pp. 209–219, 2015.
[7] C. Napoli, G. Pappalardo, E. Tramontana, R. K. Nowicki, J. T. Starczewski, and M. Woźniak, "Toward work groups classification based on probabilistic neural network approach," in Artificial Intelligence and Soft Computing. Springer, 2015, pp. 79–89.
[8] X.-S. Yang, "Flower pollination algorithm for global optimization," in Unconventional Computation and Natural Computation. Springer, 2012, pp. 240–249.
[9] M. Wozniak, "Fitness function for evolutionary computation applied in dynamic object simulation and positioning," in Computational Intelligence in Vehicles and Transportation Systems (CIVTS), 2014 IEEE Symposium on. IEEE, 2014, pp. 108–114.
[10] D. Połap, M. Woźniak, C. Napoli, E. Tramontana, and R. Damaševičius, "Is the colony of ants able to recognize graphic objects?" in Information and Software Technologies. Springer, 2015, pp. 376–387.
[11] D. Polap, M. Wozniak, C. Napoli, and E. Tramontana, "Is swarm intelligence able to create mazes?" International Journal of Electronics and Telecommunications, vol. 61, no. 4, pp. 305–310, 2015.
[12] D. Połap, M. Wozniak, C. Napoli, and E. Tramontana, "Real-time cloud-based game management system via cuckoo search algorithm," International Journal of Electronics and Telecommunications, vol. 61, no. 4, pp. 333–338, 2015.
[13] D. Połap, "Designing mazes for 2d games by artificial ant colony algorithm," Symposium for Young Scientists in Technology, Engineering and Mathematics (SYSTEM 2015), pp. 63–70, 2016.
[14] M. Woźniak, W. M. Kempa, M. Gabryel, R. K. Nowicki, and Z. Shao, "On applying evolutionary computation methods to optimization of vacation cycle costs in finite-buffer queue," in Artificial Intelligence and Soft Computing. Springer, 2014, pp. 480–491.
[15] M. Woźniak, W. M. Kempa, M. Gabryel, and R. K. Nowicki, "A finite-buffer queue with a single vacation policy: An analytical study with evolutionary positioning," International Journal of Applied Mathematics and Computer Science, vol. 24, no. 4, pp. 887–900, 2014.
[16] P. V. Laxmi and K. Jyothsna, "Optimization of service rate in a discrete-time impatient customer queue using particle swarm optimization," in Distributed Computing and Internet Technology. Springer, 2016, pp. 38–42.
[17] X. Gao and N. Zhu, "Natural language processing," Information Technology Journal, vol. 12, no. 17, pp. 4256–4261, 2013.
[18] K. Xu, S. Zhang, Y. Feng, and D. Zhao, "Answering natural language questions via phrasal semantic parsing," in Natural Language Processing and Chinese Computing. Springer, 2014, pp. 333–344.
[19] J. Berant and P. Liang, "Semantic parsing via paraphrasing," in ACL (1), 2014, pp. 1415–1425.
[20] J. Andreas, A. Vlachos, and S. Clark, "Semantic parsing as machine translation," in ACL (2), 2013, pp. 47–52.
[21] G. Pilato, A. Augello, and S. Gaglio, "A modular system oriented to the design of versatile knowledge bases for chatbots," ISRN Artificial Intelligence, vol. 2012, 2012.
[22] J. Hill, W. R. Ford, and I. G. Farreras, "Real conversations with artificial intelligence: A comparison between human–human online conversations and human–chatbot conversations," Computers in Human Behavior, vol. 49, pp. 245–250, 2015.
[23] K.-R. Koch, Bayes Theorem. Springer, 1990.
[24] D. L. Faigman and A. Baglioni Jr, "Bayes' theorem in the trial process: Instructing jurors on the value of statistical evidence," Law and Human Behavior, vol. 12, no. 1, p. 1, 1988.
[25] K. Hornik, M. Stinchcombe, and H. White, "Multilayer feedforward networks are universal approximators," Neural Networks, vol. 2, no. 5, pp. 359–366, 1989.
[26] B. Kosko, Neural Networks and Fuzzy Systems: A Dynamical Systems Approach to Machine Intelligence, vol. 1. Prentice Hall, 1992.
[27] D. F. Specht, "Probabilistic neural networks," Neural Networks, vol. 3, no. 1, pp. 109–118, 1990.
[28] P. d. B. Harrington, "Sigmoid transfer functions in backpropagation neural networks," Analytical Chemistry, vol. 65, no. 15, pp. 2167–2168, 1993.
[29] H. Yonaba, F. Anctil, and V. Fortin, "Comparing sigmoid transfer functions for neural network multistep ahead streamflow forecasting," Journal of Hydrologic Engineering, vol. 15, no. 4, pp. 275–283, 2010.
[30] M. Panicker and C. Babu, "Efficient FPGA implementation of sigmoid and bipolar sigmoid activation functions for multilayer perceptrons," IOSR Journal of Engineering (IOSRJEN), pp. 1352–1356, 2012.
[31] R. Hecht-Nielsen, "Theory of the backpropagation neural network," in Neural Networks, 1989. IJCNN., International Joint Conference on. IEEE, 1989, pp. 593–605.
[32] M. Riedmiller and H. Braun, "A direct adaptive method for faster backpropagation learning: The RPROP algorithm," in Neural Networks, 1993., IEEE International Conference on. IEEE, 1993, pp. 586–591.
[33] P. J. Werbos, "Backpropagation through time: what it does and how to do it," Proceedings of the IEEE, vol. 78, no. 10, pp. 1550–1560, 1990.
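The word-based scoring of Eqs. (1) and (2) in Sec. IV can be illustrated with a minimal sketch. It is not the author's implementation: the function names, the toy sentences (assembled from phrases in the quoted speeches), the use of log-probabilities and the add-one smoothing are all assumptions made here so that the example runs on a few lines of text.

```python
import math
from collections import Counter


def train_word_model(samples):
    """Estimate Pr(c) and per-class word counts from (text, author) pairs."""
    class_docs = Counter()
    word_counts = {}
    vocab = set()
    for text, author in samples:
        class_docs[author] += 1
        words = text.lower().split()
        word_counts.setdefault(author, Counter()).update(words)
        vocab.update(words)
    total = sum(class_docs.values())
    priors = {c: n / total for c, n in class_docs.items()}
    return priors, word_counts, vocab


def log_score(words, c, priors, word_counts, vocab):
    """log Pr(c) + sum_j log Pr(x_j | c), the logarithm of the product in Eq. (2).

    Add-one smoothing is an assumption of this sketch (not stated in the paper);
    it keeps unseen words from zeroing the product.
    """
    counts = word_counts[c]
    denom = sum(counts.values()) + len(vocab)
    score = math.log(priors[c])
    for w in words:
        score += math.log((counts[w] + 1) / denom)
    return score


def classify(text, priors, word_counts, vocab):
    """Return the class with the highest score, i.e. the arg max of Eq. (1)."""
    words = text.lower().split()
    return max(priors, key=lambda c: log_score(words, c, priors, word_counts, vocab))


# Toy data only: hypothetical training sentences built from the quoted phrases.
toy = [
    ("the future we want for our children", "obama"),
    ("we care about their future", "obama"),
    ("freedom of religion freedom to build a life", "romney"),
    ("freedom to build a business", "romney"),
]
priors, word_counts, vocab = train_word_model(toy)
predicted = classify("building the future", priors, word_counts, vocab)
```

Working in log space avoids numerical underflow when many small probabilities are multiplied; the arg max is unchanged because the logarithm is monotonic.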
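The Gaussian blur of Eq. (3) and the sample vector of Eq. (4) in Sec. IV can be sketched as follows. The function names and the default values of µ and σ are hypothetical; the paper does not state which parameter values were used.

```python
import math


def gaussian(x, mu=0.0, sigma=1.0):
    """Eq. (3): G(x) = exp(-(x - mu)^2 / (2 sigma^2)) / (sigma * sqrt(2 pi))."""
    return math.exp(-((x - mu) ** 2) / (2 * sigma ** 2)) / (sigma * math.sqrt(2 * math.pi))


def feature_vector(y_values, author_id, mu=0.0, sigma=1.0):
    """Eq. (4): [G(y_0), ..., G(y_{n-1}), id] for one sample."""
    return [gaussian(y, mu, sigma) for y in y_values] + [author_id]


# Hypothetical y values for one sample; the author identifier 1 is a placeholder.
vector = feature_vector([0.0, 1.0, 2.0], author_id=1)
```

With σ = 1 and µ = 0 the peak value is 1/√(2π); a larger σ flattens and widens the curve, which is why σ acts as a width parameter.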
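To make the network of Secs. V and VII concrete, the sketch below pushes one 15-component sample through a 15-4-4-4-4-1 fully connected network using the activation function exactly as printed in Eq. (5). The random initial weights, the seed, the absence of bias terms and all names are assumptions of this sketch; the paper gives no such details.

```python
import math
import random


def activation(x, beta=1.0):
    """Eq. (5) as printed in the text: f(x) = 2 / (1 + exp(-beta * x))."""
    return 2.0 / (1.0 + math.exp(-beta * x))


# 15 inputs, 4 hidden layers of 4 neurons, 1 output neuron (Sec. VII).
LAYER_SIZES = [15, 4, 4, 4, 4, 1]


def init_weights(sizes, seed=0):
    """One weight matrix per pair of consecutive layers (placeholder values)."""
    rng = random.Random(seed)
    return [
        [[rng.uniform(-0.5, 0.5) for _ in range(n_in)] for _ in range(n_out)]
        for n_in, n_out in zip(sizes, sizes[1:])
    ]


def forward(inputs, weights, beta=1.0):
    """Propagate a sample layer by layer; every neuron sees all previous outputs."""
    signal = list(inputs)
    for layer in weights:
        signal = [
            activation(sum(w * s for w, s in zip(neuron, signal)), beta)
            for neuron in layer
        ]
    return signal


weights = init_weights(LAYER_SIZES)
output = forward([0.1] * 15, weights, beta=0.5)
```

As written, Eq. (5) takes values in the open interval (0, 2), so the single output neuron returns a number in that range.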
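The weight update of Eqs. (6) and (7) in Sec. V can be written out for a single neuron. This sketch assumes that the correction Δw is the product of a learning rate, the neuron's Δ and the value arriving on the connection; the learning rate `eta` and that product form are assumptions of this sketch, since the text only gives Eqs. (6) and (7).

```python
def output_delta(kappa, omega):
    """Eq. (7), output layer: kappa * (1 - kappa) * (omega - kappa)."""
    return kappa * (1 - kappa) * (omega - kappa)


def hidden_delta(kappa, downstream_weights, downstream_deltas):
    """Eq. (7), hidden layer: kappa * (1 - kappa) * sum_j w_ji * Delta_j."""
    back = sum(w * d for w, d in zip(downstream_weights, downstream_deltas))
    return kappa * (1 - kappa) * back


def updated_weight(w, delta, incoming, eta=0.1):
    """Eq. (6): w <- w + Delta_w, with Delta_w = eta * delta * incoming (assumed)."""
    return w + eta * delta * incoming


# One output neuron with output 0.5 and expected value 1.0.
d_out = output_delta(0.5, 1.0)
# One hidden neuron with output 0.5 feeding two downstream neurons.
d_hid = hidden_delta(0.5, [0.2, -0.4], [0.125, 0.25])
w_new = updated_weight(0.3, d_out, incoming=1.0)
```

The deltas are computed from the output layer backwards, so each hidden neuron can reuse the deltas of the layer that follows it.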
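The 80% : 20% division of the samples described in Sec. VII can be sketched as a shuffled hold-out split. The shuffling, the seed and the helper names are assumptions of this sketch; the paper does not say how the samples were assigned to the two groups.

```python
import random


def split_samples(samples, train_fraction=0.8, seed=0):
    """Shuffle a copy of the samples and cut it into training and verifying parts."""
    shuffled = list(samples)
    random.Random(seed).shuffle(shuffled)
    cut = int(round(len(shuffled) * train_fraction))
    return shuffled[:cut], shuffled[cut:]


def accuracy(predictions, labels):
    """Fraction of samples for which the predicted author matches the true one."""
    correct = sum(p == t for p, t in zip(predictions, labels))
    return correct / len(labels)


# Placeholder sample identifiers standing in for the 200 feature vectors.
training, verifying = split_samples(range(200))
```

For 200 samples this gives 160 training and 40 verifying samples; fixing the seed makes the split reproducible between runs.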