Neural Validation of Grammatical Correctness of Sentences

Dawid Połap
Institute of Mathematics, Silesian University of Technology
Kaszubska 23, 44-100 Gliwice, Poland
Email: Dawid.Polap@gmail.com

Copyright © 2016 held by the authors.

Abstract—In this paper, the idea of validating the grammatical correctness of sentences is presented. The proposed solution uses both traditional and modern theories of syntax. In addition, the preprocessing of sentences and a neural approach to the problem are shown. The proposed method was tested on different sentences, and the research results are presented and discussed in terms of the advantages and disadvantages of the method.

I. INTRODUCTION

Artificial intelligence (AI) is becoming a major direction of today's science. Scientists in various fields (biology, mathematics, physics and computer science) try to improve and enrich daily life with new technologies that are the cornerstone of achieving true AI. The numerous applications of methods known as AI allow ever bolder research directions to be pursued. One of the main application areas of these methods is image processing and pattern recognition. For example, in [1], [2] the authors proposed alternative methods of compressing digital images, and in [3] the idea of using methods inspired by natural and physical phenomena for the problem of finding key points was explained. This problem has numerous applications in modern medicine, where key points on X-ray images may represent a variety of diseases. Decision support systems based on key points can help in diagnosing diseases that are difficult to detect, and can even prevent them through fast detection. In numerous scientific papers ([4], [5]) the possibility of using computational intelligence methods to acquire such knowledge was shown, and in the next stage of the research this knowledge was used in decision support systems. For example, in [5] the use of employees' knowledge to create meaningful groups of human resources in a company was discussed. Moreover, in pattern recognition a common problem is the number of samples (knowledge) required for the proper operation of the algorithm. Quite often the number of available samples is simply too small, and many algorithms have been developed to increase it. In [6] it was shown that the theory of fuzzy sets is a good option; another solution is the use of computational intelligence [7]. Methods of knowledge transfer between systems have also been created, for instance for open and closed teacher-learner systems [8]. In [9] the possibility of using hardware design patterns in System-on-Chip designs was discussed. Developing large applications frequently requires continuous supervision of changes and relationships; an important issue is to improve changes in the existing code, for example by finding and reusing design patterns that have already been applied [10].

Heuristic algorithms are widely used in other practical problems of modern computing, such as traffic optimization in digital databases, where a particle swarm algorithm was used [11]. In [12], the authors present an application of neural networks to predicting air pollution concentrations for various chemical factors; the idea models virtual monitoring stations as points. Another increasing problem is the growing amount of data, and with it the problems of queuing and database searches. Queuing is a problem that was born in the 1950s with the development of telecommunication networks. In the era of the Internet, the problem of queuing has only grown: servers, databases, service providers, shops and websites are just a few examples where better and better solutions are needed today. In [13] the idea of using heuristic methods for positioning queuing models was shown, and in [14], [15] the applicability of sorting algorithms to databases with a very large number of records was studied. In [16] an algorithm that creates mazes with an artificial ant colony algorithm, usable in two-dimensional games, is presented. Again, in [17] it was shown that the use of computational intelligence as a manager in a real-time cloud-based game management system, where the algorithm modifies the game depending on randomness and the movement of the player, can increase efficiency. As the final result, a method of creating games whose scenarios vary in real time was created, so the game is different on each playthrough.

Natural language processing is another well developed area of computer science. In [18], the authors proposed the use of natural language processing for accurate and efficient identification of problems related to opioids. In [19] a probabilistic framework was presented that allows robots to carry out commands without knowing the environment; the authors tested and presented the results of their research on two mobile robots. An analysis of authorship identification methods for national-language electronic discourse is presented in [20]; that paper analyses methods for English and discusses the generalization of these methods to other languages.
Moreover, there is a growing need for chat-bots, question-answering systems, and even systems that enable a conversation with a machine. In [21] an innovative approach to natural language processing was shown: a neural network composed of several parts (Sentence Layer, Knowledge Layer, Deep Case Layer and Network Dictionary). This approach allowed the authors to obtain, first of all, an associative memory and a question-answering system. Natural language processing is quite a complicated topic, as evidenced by the numerous works taking different approaches to the subject. In [22] the idea of image-text modeling was presented, where the system was taught words using their representations; moreover, the authors claim that their algorithm transfers easily to sound. Many scientists from all over the world try to modify existing methods to get better results, e.g. convolutional neural networks with little hyper-parameter tuning and numerous modifications [23], while [24] examines the classic methods for this type of problem. An interesting algorithm for learning neural probabilistic language models was presented in [25]. The authors discussed the disadvantages of learning with gradient methods on large databases of words and proposed an algorithm based on noise-contrastive estimation; after testing it, they concluded that it is not only fast and effective but, above all, stable.

In this paper, two different approaches to the automatic recognition of grammatically correct sentences are shown.

II. THEORIES OF SYNTAX AND GRAMMAR

A. Traditional grammar

There are two different ways of notation in grammar. The first interprets the sentence as a combination of two elements: subject and predicate. This is one of the most common notations for English. To this day it is called traditional grammar, as it comes from the Latin and Greek grammars. In this interpretation, we can identify some basic rules:
• The sentence is composed of a subject and a predicate.
• The predicate is regarded as a property or characteristic of the subject.
• The predicate must contain a verb.
• The verb in the predicate requires a specific object, predicative and adjuncts.
• The relation between the subject and the verb is called a nexus.

In [26], Otto Jespersen presented the nexus relation in the context of the analysis of written text. Nexus was described as a relation between the subject and its action in the sentence, and the introduced relation was intended to support a system of symbols for parts of speech.

Let us take the following sentence to illustrate the nexus relation:

His father is a king. (1)

In this sentence, the father is the subject, is refers to the action of the subject, and the king is the subject's predicate.

B. Modern grammar

Modern grammar is strongly associated with the work of the German mathematician Gottlob Frege. Frege was interested in the theory of predicates in mathematical logic, which became the cornerstone of new theories in linguistics; Frege argued that there is a distinction between the sense and the reference of an expression.

In this context, the predicates of a sentence are treated as mathematical functions. A predicate is used as a function that can bind multiple arguments, so each sentence (in the linguistic sense) can be described mainly by its predicates and their arguments. In [27], the author used this mathematical notation for linguistic sentences; applying this notation to sentence (1), we get

is(His father, a king) ⟺ is = f(His father, a king). (2)

According to this model, the verb is the predicate and the noun phrases are its arguments. In each sentence, all predicates create the matrix predicate, which comprises verbs, adjectives, etc. Each sentence can be represented as a tree whose vertices are words, as shown in Fig. 1, 2 and 3.

Fig. 1: Decomposition of the sentence "The comet may have passed the orbit of the Earth", where A is an argument of the predicate, B is the matrix predicate and C is the second argument of the predicate (according to modern grammar).

Fig. 2: Decomposition of the sentence "Whatever she made was always beautiful", where A is an argument of the predicate and B is the matrix predicate (according to modern grammar).

Fig. 3: Decomposition of the sentence "It is raining today", where A is an argument of the predicate, B is the matrix predicate and C is the second argument of the predicate (according to modern grammar).
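To make the predicate-argument decomposition of Eq. (2) concrete, the following minimal Python sketch (not part of the original paper) shows one possible in-memory representation; the class name PredicateNode and its fields are illustrative assumptions only.

from dataclasses import dataclass
from typing import List

@dataclass
class PredicateNode:
    """Illustrative container for a modern-grammar decomposition (assumed names)."""
    matrix: str            # the matrix predicate, e.g. "is"
    arguments: List[str]   # its arguments, e.g. ["His father", "a king"]

    def as_function(self) -> str:
        # Render the decomposition in the functional notation of Eq. (2).
        return f"{self.matrix}({', '.join(self.arguments)})"

# Sentence (1) decomposed according to modern grammar
sentence_1 = PredicateNode(matrix="is", arguments=["His father", "a king"])
print(sentence_1.as_function())   # -> is(His father, a king)

Such a structure also maps directly onto the tree view of Fig. 1, 2 and 3, with the matrix predicate as the root and the arguments as its children.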
III. PREPROCESSING SENTENCES

To train the neural networks, an important step is the proper preparation of samples. A neural network accepts a vector of numerical values on its input, and a sentence, in the linguistic sense, is not given as numbers, so we need to re-evaluate the words into numerical values. Moreover, the number of elements in a sample also matters: the larger it is, the longer the learning time.

In order to re-evaluate words into numerical values, both grammatical notations are used: traditional (II-A) and modern (II-B).

A. Traditional notation

Part-of-speech tagging algorithms are concerned with finding specific parts of speech in a sentence. In the 1960s, the creators of the Brown Corpus (Brown University Standard Corpus of Present-Day American English), interested in this problem, created a huge database of words. It was not until about 30 years later that hidden Markov models were applied to this problem in [28]. The simplest method is to find the probability of occurrence of a particular part of speech after a word. For example, if the current expression is the, the next word is most likely a noun or an adjective. In subsequent years, in order to obtain higher accuracy, the number of analyzed words was increased to three or four. Another known method of finding specific parts of speech is the VOLSUNGA algorithm presented in [29]. VOLSUNGA operates on the principle of finding the optimal path based on a matrix of probabilities. For each pair of words, a value p is calculated as the likelihood of the analyzed pair of parts of speech, which can be described by the following formula

p = p(w, pw) · p(pw, pa), (3)

where w is the word, pw is its part of speech and pa is the part of speech that the next word may have. Over all paths, the algorithm selects the most optimal solution: in each iteration a new word is added and the next optimal solution is searched for. At the end of the algorithm, the most optimal path is returned, which represents the most likely sequence of parts of speech for the analyzed sentence.

Algorithm 1 VOLSUNGA algorithm
1: Start
2: Set probability matrix
3: for each pair of words do
4:   Calculate the likelihood of the analyzed pair using (3)
5:   Select the most optimal solution
6: end for
7: Return the best solution
8: Stop
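As an illustration of how Algorithm 1 and Eq. (3) could be realized, the sketch below implements a greedy, VOLSUNGA-style path extension; it is an approximation for demonstration only, and the LEXICAL and TRANSITION probability tables are toy assumptions rather than data from [29] or from the paper.

LEXICAL = {            # p(w, pw): how likely word w is to carry tag pw (toy values)
    ("his", "DET"): 0.9, ("father", "NOUN"): 0.8, ("is", "VERB"): 0.95,
    ("a", "DET"): 0.9, ("king", "NOUN"): 0.8,
}
TRANSITION = {         # p(pw, pa): how likely tag pa is to follow tag pw (toy values)
    ("DET", "NOUN"): 0.7, ("NOUN", "VERB"): 0.6, ("VERB", "DET"): 0.5,
}
TAGS = ("DET", "NOUN", "VERB")

def tag_sentence(words):
    """Extend the path word by word, keeping the locally optimal tag (cf. Eq. (3))."""
    path, prev_tag = [], None
    for word in words:
        best_tag, best_p = None, -1.0
        for tag in TAGS:
            p = LEXICAL.get((word, tag), 0.01)
            if prev_tag is not None:
                p *= TRANSITION.get((prev_tag, tag), 0.01)   # pair likelihood
            if p > best_p:
                best_tag, best_p = tag, p
        path.append((word, best_tag))
        prev_tag = best_tag
    return path

print(tag_sentence(["his", "father", "is", "a", "king"]))
# -> [('his', 'DET'), ('father', 'NOUN'), ('is', 'VERB'), ('a', 'DET'), ('king', 'NOUN')]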
In the case of traditional grammar, the sentence must be divided into subject and predicate. Using the VOLSUNGA algorithm, we find the position of occurrence of the subject, a, and of the verb, b. In the next step, the values a and b are re-calculated in accordance with

val1 = √( (1/a) · Σ_{i=0}^{a} i² ),    val2 = √( (1/b) · Σ_{i=b}^{n} i² ),    (4)

where n is the number of all words in the sentence, and val1, val2 denote the average amount of information contained in the subject and in the predicate. Using these values and the value c, which takes the value 0 if the sentence is incorrect and 1 otherwise, a six-element vector can be created as

[val1, val2, a, b, n, c]. (5)

B. Modern grammar

For the grammar described in II-B, creating a sample is much easier than for the traditional approach; the problem is to find the matrix predicate. The matrix predicate will contain a verb, together with the verb specific to each tense, e.g. for the present perfect it will be have/has. To find the matrix, we can either find every verb with the VOLSUNGA algorithm, or find a specific verb in the analyzed sentence and select the next verb.

After finding the matrix, a vector representing the sentence can be created as

[val3, val4, val5, val6, k, c], (6)

where k is the number of arguments and the remaining values are calculated by

val3 = √( (1/c) · Σ_{j=0}^{k} Σ_{i=0}^{c_j} i² ),
val4 = √( (1/b) · Σ_{i=0}^{c_m} i² ),
val5 = (1/k) · Σ_{j=0}^{k} c_j,
val6 = c_m / k,    (7)

where c_j is the number of words in the j-th argument and c_m is the number of words in the matrix. The values val3, val4 represent the average amount of information in the arguments and in the matrix, and val5, val6 are the arithmetic amounts of words in the argument and in the matrix, respectively.
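Before the worked example in the next subsection, the following sketch shows how the traditional feature vector of Eq. (5) could be assembled from Eq. (4); the modern vector of Eq. (6) would be built analogously from Eq. (7) once the matrix predicate and its arguments are known. The function name and interface are assumptions made for illustration only.

import math

def traditional_vector(a, b, n, c):
    """Builds [val1, val2, a, b, n, c] following Eqs. (4)-(5).

    a -- position of the subject, b -- position of the verb,
    n -- number of words in the sentence,
    c -- 1 if the sentence is grammatically correct, 0 otherwise.
    """
    val1 = math.sqrt(sum(i * i for i in range(0, a + 1)) / a)
    val2 = math.sqrt(sum(i * i for i in range(b, n + 1)) / b)
    return [val1, val2, a, b, n, c]

# Values for the sentence of the worked example below: a = 2, b = 3, n = 6, c = 1
print(traditional_vector(2, 3, 6, 1))   # -> [1.58..., 5.35..., 2, 3, 6, 1]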
C. Example

Consider the following sentence:

Whatever she made was always beautiful. (8)

In the case of traditional notation, we obtain the following values

val1 = √( (1/2) · 5 ) ≈ 1.6,
val2 = √( (1/3) · 86 ) ≈ 5.4,

which comprise the following vector

[1.6, 5.4, 2, 3, 6, 1].

In the case of modern notation, we obtain

val3 = √( (1/6) · 5 ) ≈ 0.9,
val4 = √( (1/3) · 30 ) ≈ 3.2,
val5 = 2 · (1/6) ≈ 0.33,
val6 = 1/6 ≈ 0.17,

and finally

[0.9, 3.2, 0.33, 0.17, 6, 1].

IV. NEURAL VALIDATION

In order to validate the correctness of sentences, a neural network ([30]–[32]) with the backpropagation algorithm was used. Neural networks are mathematical models inspired by the operation of the neural network in the human brain. The model of the entire network can be divided into three types of layers: input, hidden and output. Each layer consists of many neurons, each of which is connected to all the neurons in the adjacent layers. The connection between two neurons carries a weight, and the weights are selected at random at the beginning of the algorithm. In the input layer, the learning vectors (samples) are the entering signals.

Fig. 4: The model of the proposed system to validate grammatical sentences.

Each neuron combines the values (signals y) of the neurons from the previous layer with the weights w imposed on the connections between each pair of neurons. In the neuron, each signal is multiplied by its weight and all the values are summed; the sum of products is understood as the neuron's argument. The output value of the neuron is the value of the activation function. The sigmoid function is one of the most popular; its values lie in the range (0, 1) and its formula is

f(x) = 1 / (1 + e^(−βx)), (9)

where β is a parameter and x is the neuron argument calculated by

x = Σ_{i=0}^{n} w_i · y_i, (10)

where n is the number of neurons in the previous layer. In the hidden layers, neurons recalculate the values obtained from the previous layers, and the output layer returns the result of the network. For the purpose of validating sentences, the proposed network is constructed of 6 neurons in the input layer, 3 neurons in each of the 3 hidden layers and 1 neuron in the output layer.
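The sketch below runs a single forward pass through a network with the 6-3-3-3-1 layout described above, applying Eqs. (9) and (10) at every neuron. The random weight initialisation mirrors the description at the start of this section, while the seed and the weight range are assumptions made only for the demonstration.

import math
import random

def sigmoid(x, beta=0.6):
    """Activation function of Eq. (9); beta = 0.6 as in the experiments of Section V."""
    return 1.0 / (1.0 + math.exp(-beta * x))

def forward(sample, layer_sizes=(6, 3, 3, 3, 1), seed=0):
    """One forward pass: each neuron sums its weighted inputs (Eq. (10)) and applies Eq. (9)."""
    rng = random.Random(seed)
    signals = list(sample)
    for prev_size, size in zip(layer_sizes, layer_sizes[1:]):
        # Weights are drawn at random, as described at the beginning of Section IV;
        # the range (-1, 1) is an assumption for this sketch.
        weights = [[rng.uniform(-1.0, 1.0) for _ in range(prev_size)] for _ in range(size)]
        signals = [sigmoid(sum(w * y for w, y in zip(neuron_w, signals)))
                   for neuron_w in weights]
    return signals[0]   # the single output neuron

# A six-element sample, e.g. the traditional vector from Section III-C
print(forward([1.6, 5.4, 2, 3, 6, 1]))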
A. The Backpropagation Algorithm

The backpropagation algorithm ([33], [34]) is used to modify the weights in the neural network to obtain the most accurate learning. The algorithm calculates the error of each neuron, starting at the output layer and moving back to the first hidden layer. Using the calculated error, the weight of the connection between neurons is modified by

w = w + Δw, (11)

where Δw is an error calculated by

Δw = f(x)(1 − f(x))(p − f(x))   for the output layer,
Δw = f(x)(1 − f(x)) · Σ_i w_i δ_i   for the hidden layers,    (12)

where p is the expected output.

Algorithm 2 The Backpropagation Algorithm
1: Start
2: Generate random weights
3: while global error > 0.1 do
4:   for each layer in network do
5:     for each neuron in layer do
6:       Calculate the neuron activation in accordance with (9) and (10)
7:     end for
8:   end for
9:   for each neuron in output layer do
10:    Calculate error using (12)
11:  end for
12:  for each hidden layer do
13:    for each neuron in hidden layer do
14:      Calculate error using (12)
15:      Modify the weight using (11)
16:    end for
17:  end for
18: end while
19: Stop
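Read literally, Eqs. (11)-(12) give the per-neuron error terms and weight shift sketched below. The transcription keeps exactly the factors stated in the paper (no learning rate or input-signal factor appears in this formulation), so it should be treated as an illustration of the update rule rather than a tuned implementation; the toy values at the end are assumptions.

def output_error(activation, expected):
    """Eq. (12), output layer: delta = f(x) * (1 - f(x)) * (p - f(x))."""
    return activation * (1.0 - activation) * (expected - activation)

def hidden_error(activation, downstream_weights, downstream_errors):
    """Eq. (12), hidden layer: delta = f(x) * (1 - f(x)) * sum(w_i * delta_i)."""
    back = sum(w * d for w, d in zip(downstream_weights, downstream_errors))
    return activation * (1.0 - activation) * back

def update_weights(weights, delta):
    """Eq. (11): shift every incoming weight of the neuron by its error term."""
    return [w + delta for w in weights]

# Toy usage: an output neuron with activation 0.8 and expected output p = 1.0
d_out = output_error(0.8, 1.0)
print(update_weights([0.2, -0.4, 0.7], d_out))   # each weight increased by d_out = 0.032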
V. EXPERIMENTS

For the purpose of checking the correctness of the proposed validation, a database of 200 sentences was created. Every sentence was processed into two vectors, one for each grammar. The neural network was trained with β = 0.6. In the next step, each vector was processed by the trained neural network to check the quality of the validation. The obtained results are shown in Tab. I, and the graph showing the error reduction over the number of epochs is shown in Fig. 5.

Fig. 5: Sample neural network learning process error.

TABLE I: The effectiveness of the network for the prepared samples.
Grammar       Effectiveness
Traditional   67%
Modern        89%

VI. CONCLUSIONS

The proposed system allows us to check whether a sentence is grammatically correct. This is an important aspect for the development of communication systems between man and machine. The proposed solution includes two grammatical notations. The results show that the use of modern theories of language based on the logic of predicates can be crucial for the further development of natural language processing. In future research work, we plan to develop the topic of natural language further, including the analysis of words and sentences.

ACKNOWLEDGMENT

The author acknowledges the contribution to this project of the Operational Programme "Knowledge, Education, Development", financed by the European Social Fund under grant application POWR.03.03.00-00-P001/15.

REFERENCES

[1] F. Hussain and J. Jeong, "Efficient deep neural network for digital image compression employing rectified linear neurons," Journal of Sensors, vol. 2016, 2015.
[2] M. Wozniak, C. Napoli, E. Tramontana, and G. Capizzi, "A multiscale image compressor with rbfnn and discrete wavelet decomposition," in Proceedings of International Joint Conference on Neural Networks (IJCNN). IEEE, 2015, pp. 1–7.
[3] M. Woźniak and Z. Marszałek, "An idea to apply firefly algorithm in 2d image key-points search," in Proceedings of International Conference on Information and Software Technologies. Springer, 2014, pp. 312–323.
[4] C. Napoli and E. Tramontana, "An object-oriented neural network toolbox based on design patterns," in Proceedings of International Conference on Information and Software Technologies (ICIST). Springer, 2015, pp. 388–399.
[5] C. Napoli, G. Pappalardo, E. Tramontana, R. K. Nowicki, J. T. Starczewski, and M. Woźniak, "Toward work groups classification based on probabilistic neural network approach," in Proceedings of International Conference on Artificial Intelligence and Soft Computing (ICAISC). Springer, 2015, pp. 79–89.
[6] B. A. Nowak, R. K. Nowicki, M. Woźniak, and C. Napoli, "Multi-class nearest neighbour classifier for incomplete data handling," in Proceedings of International Conference on Artificial Intelligence and Soft Computing (ICAISC). Springer, 2015, pp. 469–480.
[7] M. Wozniak, D. Polap, G. Borowik, and C. Napoli, "A first attempt to cloud-based user verification in distributed system," in Proceedings of Asia-Pacific Conference on Computer Aided System Engineering (APCASE). IEEE, 2015, pp. 226–231.
[8] R. Damaševičius, "Towards empirical modelling of knowledge transfer in teaching/learning process," in Proceedings of Information and Software Technologies. Springer, 2014, pp. 359–372.
[9] R. Damaševičius, G. Majauskas, and V. Štuikys, "Application of design patterns for hardware design," in Proceedings of the 40th Annual Design Automation Conference. ACM, 2003, pp. 48–53.
[10] S. Cicciarella, C. Napoli, and E. Tramontana, "Searching design patterns fast by using tree traversals," International Journal of Electronics and Telecommunications, vol. 61, no. 4, pp. 321–326, 2015.
[11] M. Wozniak, "On positioning traffic in nosql database systems by the use of particle swarm algorithm," in Proceedings of XV Workshop Dagli Oggetti agli Agenti (WOA), 2014, pp. 25–26.
[12] G. Capizzi, G. L. Sciuto, P. Monforte, and C. Napoli, "Cascade feed forward neural network-based model for air pollutants evaluation of single monitoring stations in urban areas," International Journal of Electronics and Telecommunications, vol. 61, no. 4, pp. 327–332, 2015.
[13] M. Woźniak, W. M. Kempa, M. Gabryel, R. K. Nowicki, and Z. Shao, "On applying evolutionary computation methods to optimization of vacation cycle costs in finite-buffer queue," in Proceedings of International Conference on Artificial Intelligence and Soft Computing (ICAISC). Springer, 2014, pp. 480–491.
[14] M. Woźniak, Z. Marszałek, M. Gabryel, and R. K. Nowicki, "Modified merge sort algorithm for large scale data sets," in Proceedings of International Conference on Artificial Intelligence and Soft Computing (ICAISC). Springer, 2013, pp. 612–622.
[15] Z. Marszalek, M. Wozniak, G. Borowik, R. Wazirali, C. Napoli, G. Pappalardo, and E. Tramontana, "Benchmark tests on improved merge for big data processing," in Proceedings of Asia-Pacific Conference on Computer Aided System Engineering (APCASE). IEEE, 2015, pp. 96–101.
[16] D. Połap, "Designing mazes for 2d games by artificial ant colony algorithm," in Proceedings of Symposium for Young Scientists in Technology, Engineering and Mathematics, 2016, pp. 63–70.
[17] D. Połap, M. Wozniak, C. Napoli, and E. Tramontana, "Real-time cloud-based game management system via cuckoo search algorithm," International Journal of Electronics and Telecommunications, vol. 61, no. 4, pp. 333–338, 2015.
[18] D. S. Carrell, D. Cronkite, R. E. Palmer, K. Saunders, D. E. Gross, E. T. Masters, T. R. Hylan, and M. Von Korff, "Using natural language processing to identify problem usage of prescription opioids," International Journal of Medical Informatics, vol. 84, no. 12, pp. 1057–1064, 2015.
[19] F. Duvallet, M. R. Walter, T. Howard, S. Hemachandra, J. Oh, S. Teller, N. Roy, and A. Stentz, "Inferring maps and behaviors from natural language instructions," in Proceedings of Experimental Robotics. Springer, 2016, pp. 373–388.
[20] A. Venčkauskas, R. Damaševičius, R. Marcinkevičius, and A. Karpavičius, "Problems of authorship identification of the national language electronic discourse," in Proceedings of Information and Software Technologies. Springer, 2015, pp. 415–432.
[21] T. Sagara and M. Hagiwara, "Natural language neural network and its application to question-answering system," Neurocomputing, vol. 142, pp. 201–208, 2014.
[22] R. Kiros, R. Salakhutdinov, and R. Zemel, "Multimodal neural language models," in Proceedings of International Conference on Machine Learning (ICML), 2014, pp. 595–603.
[23] Y. Kim, "Convolutional neural networks for sentence classification," arXiv preprint arXiv:1408.5882, 2014.
[24] Y. Chen, "Convolutional neural network for sentence classification," University of Waterloo, 2015.
[25] A. Mnih and Y. W. Teh, "A fast and simple algorithm for training neural probabilistic language models," arXiv preprint arXiv:1206.6426, 2012.
[26] O. Jespersen, Analytic Syntax. University of Chicago Press, 1937.
[27] D. J. Allerton, Essentials of Grammatical Theory: A Consensus View of Syntax and Morphology. Routledge, 1979.
[28] J. Kupiec, "Robust part-of-speech tagging using a hidden markov model," Computer Speech & Language, vol. 6, no. 3, pp. 225–242, 1992.
[29] S. J. DeRose, "Grammatical category disambiguation by statistical optimization," Computational Linguistics, vol. 14, no. 1, pp. 31–39, 1988. [Online]. Available: http://dl.acm.org/citation.cfm?id=49084.49087
[30] W. S. McCulloch and W. Pitts, "A logical calculus of the ideas immanent in nervous activity," The Bulletin of Mathematical Biophysics, vol. 5, no. 4, pp. 115–133, 1943.
[31] J. E. Dayhoff, Neural Network Architectures: An Introduction. Van Nostrand Reinhold Co., 1990.
[32] D. Psaltis, A. Sideris, and A. Yamamura, "A multilayered neural network controller," IEEE Control Systems Magazine, vol. 8, no. 2, pp. 17–21, 1988.
[33] R. Hecht-Nielsen, "Theory of the backpropagation neural network," in Proceedings of International Joint Conference on Neural Networks (IJCNN). IEEE, 1989, pp. 593–605.
[34] D. E. Rumelhart, G. E. Hinton, and R. J. Williams, "Learning representations by back-propagating errors," Cognitive Modeling, vol. 5, no. 3, p. 1, 1988.