About New Version of RSDS System

Zbigniew Suraj and Piotr Grochowalski

Institute of Computer Science, University of Rzeszów, Rzeszów, Poland
{zbigniew.suraj,piotrg}@ur.edu.pl

Abstract. The aim of this paper is to present a new version of a bibliographic database system, the Rough Set Database System (RSDS). Among other things, the RSDS system includes bibliographic descriptions of publications on rough set theory and its applications. The system is also an experimental environment for research on processing bibliographic data using domain knowledge and on the related information retrieval.

Keywords: rough sets, data mining, knowledge discovery, pattern recognition, database systems.

1 Introduction

The RSDS system is a bibliographic system whose references are aimed at disseminating information on publications on rough set theory and its applications [4–8]. The system is available free of charge at http://rsds.univ.rzeszow.pl. Its database currently holds about 4000 bibliographic descriptions of publications. The RSDS system is also an experimental environment for research on broadly defined information processing based on methods and techniques from the fields of ontology and rough sets. In addition, it allows the data contained in the database to be analyzed using advanced statistical and graphical methods.

Apart from the bibliographic descriptions, the system also contains:
– information on software related to the theory and applications of rough sets,
– biographies of people who have rendered outstanding services to the development of rough set theory and its applications,
– personal details of the authors of publications whose descriptions are included in the database.

The system was developed in a client-server architecture: the data and the mechanisms for handling them run on the server, and the user accesses the resources through a web browser. Work on the system began in 2002 and is ongoing. Two earlier versions of the system were developed and made available to users [9–11, 14, 15, 19]. In March 2013, the third version of the system was released. It has been completely rebuilt using modern technology, which allowed a larger number of facilities to be introduced for users.

Fig. 1. The main window of the system.

The following changes were introduced in the current version:
– replacement and reorganization of the system engine (CMS - Drupal),
– a redesigned website,
– rebuilt and upgraded system functionalities,
– a greater role for the system administrator(s) (admin panel),
– a larger number of facilities for registered users (user panel),
– introduction of a data status and the possibility of modifying it,
– system-to-user communication via e-mail (on important issues).

The rest of the paper is organized as follows. Section 2 describes the logical structure of the RSDS system. The functionalities of the system are presented in Section 3. Section 4 describes the data used by the system. Section 5 gives the system requirements. Future plans for the RSDS system are discussed in Section 6.

2 The Logical Structure of the System

The structure of the RSDS system can be divided into four functional layers. Each layer consists of modules through which it carries out its tasks (see Figure 2):
– The presentation layer with the graphical interface module.
– The application layer with the modules of login, add/edit, search, graph and statistical, download, auxiliary (biographies of people, software, maps).
– The communication layer with the module of communication with the database.
– The physical layer containing the database.

Fig. 2. The logical structure of the system.

The modules of the presentation layer communicate with the user through the provided interface.

The application layer contains the modules implementing the main functionalities of the system. The login module is responsible for correctly handling the login/logoff process and for storing information about the users logged in to the system. The add/edit module supports entering new data into the system and editing existing data. It ensures that the input data are correct and correctly assigned to the user who owns them. The search module finds the publication descriptions that meet the criteria set by the user. Its operation has been improved by incorporating the results of research on information retrieval. The graph and statistical module is designed to analyze not only the data in the system but also the activity of its users. The analysis covers various aspects, such as publications, their authors, and the relationships between them, and the module presents the results of these analyses. The download module gives users different options for retrieving data from the system in the form of a prepared bibliography. The auxiliary modules extend the basic functionality of the system with biographies of people who have rendered outstanding services to the development of rough set theory and its applications, descriptions of available software related to rough set theory, and a world map showing where a given problem is being studied.
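
As an illustration of what the download module has to produce, the sketch below renders one stored record as a BibTeX entry. It is not the RSDS implementation: the record layout (a dictionary of string fields), the function name `to_bibtex`, and the citation key `pawlak1982` are assumptions made for this example. The sample record is reference [4] of this paper.

```python
def to_bibtex(entry_type, key, fields):
    """Render a record (field name -> string value) as a BibTeX entry.

    Hypothetical helper; the field names follow common BibTeX usage.
    """
    lines = ["@%s{%s," % (entry_type.upper(), key)]
    for name, value in fields.items():
        # Purely numeric values (e.g. a year) may be written without braces.
        rendered = value if value.isdigit() else "{%s}" % value
        lines.append("  %s = %s," % (name, rendered))
    lines.append("}")
    return "\n".join(lines)


record = {
    "author": "Pawlak, Z.",
    "title": "Rough Sets",
    "journal": "International Journal of Computer and Information Sciences",
    "volume": "11",
    "pages": "341--356",
    "year": "1982",
}
entry = to_bibtex("article", "pawlak1982", record)
print(entry)
```

Keeping every field as a plain string mirrors the way the descriptions are stored; a production exporter would additionally need to escape special characters in the field values.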

The communication layer contains a module responsible for proper communication with the database, which stores the data for the system.

The physical layer includes a relational database in which the data presented in the system are stored.

3 System Functionalities

The basic functionalities of the RSDS system include:
– Adding new data.
– Editing existing data.
– Data search.
– Registration of users in the system.
– Saving data to a file.
– Sending data files to the administrator.
– Handling of user comments.
– Statistics, analysis of statistical graphs, determining the Pawlak number of the first and second kind, classifier of publications.
– Help.

The capabilities and the content of the RSDS system are constantly being extended.

To store the information in the simplest form and to avoid data redundancy, the data for the RSDS system are stored in a relational database. The database structure is based on the BibTeX format [22], which was chosen for its well-defined and uniform description structures. In addition, the bibliographic descriptions held in the system can be retrieved in the BibTeX format, which makes it possible to generate bibliographies automatically and attach them to a publication in preparation.

Before data can be shared, they must first be entered into the system. Data entry and other operations that modify data require user authentication by logging on to the system. New users need to register to get access to the full functionality of the system.

Data entry can be performed in two independent ways:
– through predefined forms;
– using software that reads files in the BibTeX format and stores the information in the system in the appropriate way.

Predefined forms allow registered users to enter new data into the system individually.
Users who do not want to enter data themselves, or who intend to enter a large amount of new data, can send the data to the system administrator, who will then enter them into the system using appropriate software. The advantage of data being entered individually by users is that the data are assigned to those users, who are then authorized to edit them in the future. This possibility is available only to registered users, in order to avoid incorrect information being entered into the system.

Once descriptions (data) have been published, the system provides various options for searching them. So far, searching for information in the RSDS system has been implemented in two main ways:
– alphabetical search by certain keywords, such as titles of publications, their authors, editors, conference names, journals, or year;
– advanced search based on specified criteria that the description of the sought publication has to fulfill.

Each of the currently available search options has both strengths and weaknesses. Alphabetical search works well when one knows, for example, the author of the sought publication, the journal in which it appeared, its publisher, or its year of publication. Its weakness is that, without precise information about the sought publication, the system returns a large number of publication descriptions meeting the search criteria, which often have to be narrowed down further by a painstaking selection process. In advanced search, by contrast, the user defines the criteria that the sought publication has to fulfill and, depending on how accurately these criteria are chosen, obtains more or less adequate results. In many cases the problem of further selecting among the obtained results remains.

The current search process involves sending the user's query to the database system to find the data (publications) that match the search pattern. Matching is based on finding the exact pattern in the data stored in the database. If a data item matching the pattern is found, it is appended to the result. This is repeated for all data in the database. The resulting set of publications is then sent to the user (see Figure 3).

Fig. 3. The current course of the process of searching for the information in the RSDS system.

The process of finding information supplemented with additional knowledge (using an ontology and methods of rough set theory) is shown in general form in Figure 4. It differs from exact matching of a search pattern in that, after receiving the user's query, the system confronts the query with the information held in the system [1–3, 17, 21]. This is done as follows. The detailed ontologies of individual publications are retrieved from the system resources and checked for whether they fall within the scope of the sought information in the general domain ontology (taking into account the relationships between different types of concepts). If the verification is positive, approximations are determined for the given publication. They are then aggregated into a single value representing the overall degree to which the publication matches the sought information. When this value is greater than a threshold value, the publication is included in the result set. The process is repeated for all the data (publications). Finally, after the result set has been processed (the publications are divided into groups), it is sent to the user. The developed methodology has been implemented in the system and made available to users in the ontological search section.
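
The verify, aggregate, and threshold steps described above can be sketched as follows. This is a simplified illustration under stated assumptions, not the method implemented in RSDS: the per-concept match degrees in [0, 1] stand in for the rough set approximations, aggregation is taken to be an arithmetic mean over the query concepts, and the names `ontological_search`, `match_degrees`, and the threshold of 0.5 are all hypothetical.

```python
def ontological_search(query_concepts, match_degrees, threshold=0.5):
    """Return (publication id, score) pairs whose score exceeds the threshold.

    match_degrees maps a publication id to {concept: degree in [0, 1]}.
    Both the data layout and the mean aggregation are assumptions.
    """
    results = []
    for pub_id, degrees in match_degrees.items():
        # Verification: keep only degrees for concepts that occur in the query.
        relevant = [degrees[c] for c in query_concepts if c in degrees]
        if not relevant:
            continue  # publication is unrelated to the sought information
        # Aggregation: mean over all query concepts (missing ones count as 0).
        score = sum(relevant) / len(query_concepts)
        # Thresholding: include the publication only above the threshold.
        if score > threshold:
            results.append((pub_id, score))
    # Best matches first.
    return sorted(results, key=lambda item: item[1], reverse=True)


match_degrees = {
    "pub1": {"rough set": 1.0, "data mining": 0.5},
    "pub2": {"rough set": 0.5},
    "pub3": {"fuzzy set": 0.75},
}
hits = ontological_search(["rough set", "data mining"], match_degrees)
print(hits)
```

With this sample data only `pub1` is returned: `pub2` matches one query concept too weakly to pass the threshold, and `pub3` fails the verification step because none of its concepts occur in the query.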

To streamline query building, an editor has been prepared that assists users in creating queries. Its functionalities are: auto-completion of entered concepts, the predefined logical operators AND, OR, and NOT, and control of the order of evaluation by means of parentheses.

Fig. 4. The process of searching information in the RSDS system based on additional knowledge.

To minimize the time complexity of the search process, the detailed ontologies are determined only once. Determining the detailed ontologies is part of preparing the bibliographic system for information retrieval based on domain knowledge. This preparation is carried out in several steps (see Figure 5). The first step is the development of the general ontology by a domain expert. Then, on the basis of the system resources and supported by the domain knowledge (the general ontology), the detailed ontologies are determined. While the detailed ontologies are being generated, new concepts or new relationships between concepts may appear. In that case the system, in cooperation with the domain expert, can include them in the general ontology, thus creating an extended domain ontology. The final step is to determine the degrees to which the bibliographic descriptions match the elementary concepts of the general ontology. This is done in advance to minimize the time needed to answer a user's query.

Fig. 5. The process of preparing the bibliographic system for searching information based on additional knowledge.

In addition to bibliographic data (presented in descriptive form or in the BibTeX format), the system provides a fairly wide range of statistics and results of analyses of the data [12, 13, 16, 18].
The research takes into account the data collected from 1981 to the present. The data are analyzed in two ways: statistically and by means of a graph. For the statistical analysis, the data are processed over two kinds of time intervals:
1. cumulative periods extended in five-year increments, i.e., 1981-1985, 1981-1990, 1981-1995, etc.
2. separate five-year periods, i.e., 1981-1985, 1986-1990, 1991-1995, etc.

The graph analysis is based on a collaboration graph CG. The vertices of the graph are the authors of the publications included in the bibliographic system. Two vertices are connected by an edge when the two authors have written at least one joint publication.

The statistical analysis determined the following values characterizing the examined data set: the number of authors with respect to the number of publications written, various kinds of means and the standard deviations associated with each mean, and the number of works with respect to the number of their authors.

The analysis of the graph CG starts with its overall characteristics, i.e., determining the average vertex degree, the isolated vertices, etc. After the isolated vertices are discarded, the graph is analyzed further by determining its components and examining the largest one. Determining the components made it possible to identify groups of authors who write joint publications (the cooperating authors). These groups correspond to collaborations that exist in reality. The analyzed parameters of the largest component allow it to be interpreted accurately.

Additionally, the system allows users to read the Pawlak number of the first and second kind, whose values indicate how closely an author's publishing activity is connected with that of Prof. Z. Pawlak: the smaller the Pawlak number, the stronger the relation between the author of a published work and Professor Pawlak [18, 20].

All analyses are performed dynamically, i.e., the parameters are calculated taking into account every change in the data collected in the system.

4 Input-output Data

Bibliographic descriptions are stored in the system according to the BibTeX specification [22]. This means that the description of each publication is divided into elements defined by BibTeX, such as title, publisher, year of publication, keywords, abstract, etc. The prepared descriptions are placed in a relational database. Each element is stored in the database as a string whose meaning, unfortunately, neither the database nor database languages can interpret. An example of a bibliographic description held in the system is presented in Table 1. Descriptions of publications are formulated in English.

Table 1. The example of bibliographic description in the BibTeX format.

@INPROCEEDINGS{,
  author = {Hu, Xiaohua Tony and Cercone, Nick},
  title = {Mining knowledge rules from databases: A rough set approach},
  booktitle = {Proceedings of the 12th International Conference on Data Engineering},
  conference = {International Conference on Data Engineering (CDE), New Orleans, USA},
  pages = {96-105},
  publisher = {IEEE Computer Society Press},
  address = {Los Alamitos, CA, USA},
  month = {February},
  year = 1996,
  isbn = {0-8186-7240-4},
  abstract = {In this paper, the principle and experimental results of an attribute-oriented rough set approach for knowledge discovery in databases are described. . . . },
  keywords = {knowledge mining and discovery},
}

5 System Requirements

The RSDS system can be run on any computer that is connected to the Internet. The computer must have an operating system equipped with a web browser.
These requirements must be met because the RSDS system is an online system and requires a permanent connection to the Internet. In addition, the web browser in which the system runs must support the JavaScript scripting language, CSS style sheets, and cookies. The system has been tested with the following browsers: Internet Explorer 9, Mozilla Firefox 17, and Chrome 23.

6 Plans for the Future

The directions of further research and work related to the system will be:
– development of a method for the formal verification of the correctness of the relations defined in the general ontology,
– increasing the efficiency of information retrieval,
– automation of the processing of the information held in the system,
– an attempt to improve the quality of semantic analysis,
– development of new functionalities of the system, such as automatic discovery of a user's scientific profile, finding new data in Internet resources, extending the analysis of the data held in the system, etc.

Acknowledgment. We would like to thank everyone who contributed to the creation and development of the RSDS system, in particular Grzegorz Świstak and Przemyslaw Wanat.

References

1. Grochowalski P. and Pancerz K., The outline of an ontology for the rough set theory and its applications. In Czaja L., Penczek W., Salwicki A., Schlingloff H., Skowron A., Suraj Z., Lindemann G., Burkhard H.-D. (Eds.), Proceedings of the Workshop on Concurrency, Specification and Programming, CS&P2008, Gross-Vater See, Berlin, 29 September - 1 October, 2008, vol. 1-3, pp. 192–204. Informatik-Berichte, 2008.
2. Grochowalski P. and Pancerz K., The outline of an ontology for the rough set theory and its applications. Fundamenta Informaticae, 93(1-3):143–154, 2009.
3. Pancerz K. and Grochowalski P., Matching Ontological Subgraphs to Concepts: a Preliminary Rough Set Approach. In Proceedings of the 10th International Conference on Intelligent Systems Design and Applications (ISDA'2010), Cairo, Egypt, November 29 - December 1, 2010, pp. 1394–1399, IEEE Xplore, 2010.
4. Pawlak Z., Rough Sets. International Journal of Computer and Information Sciences, 11:341–356, 1982.
5. Pawlak Z., Grzymala-Busse J.W., Slowiński R., and Ziarko W., Rough Sets. Communications of the ACM, 38(11):88–95, November 1995.
6. Pawlak Z. and Skowron A., Rough Sets and Boolean Reasoning. Information Sciences, 177(1):41–73, 2007.
7. Polkowski L.T., Rough Sets. Mathematical Foundations. Advances in Soft Computing. Physica-Verlag, Heidelberg, 2002.
8. Skowron A. and Pal S.K. (Eds.), Special Volume: Rough Sets, Pattern Recognition and Data Mining, vol. 24(6), Pattern Recognition Letters. North Holland, 2003.
9. Suraj Z. and Grochowalski P., The Rough Sets Database System: An Overview. In Komorowski J., Grzymala-Busse J.W., Tsumoto S., Slowiński R. (Eds.), Proceedings of the 4th International Conference on Rough Sets and Current Trends in Computing, RSCTC 2004, Uppsala, Sweden, June 2004, vol. 3066, Lecture Notes in Artificial Intelligence, pp. 841–849, Springer-Verlag, 2004.
10. Suraj Z. and Grochowalski P., The Rough Set Database System: An Overview. Transactions on Rough Sets III, Lecture Notes in Computer Science, vol. 3400, Springer-Verlag, Berlin, pp. 190–201, 2005.
11. Suraj Z. and Grochowalski P., Functional extension of the RSDS system. In Hirano S., Inuiguchi M., Miyamoto S., Nguyen H.S., Slowiński R., Greco S., Hata Y. (Eds.), Proceedings of the 5th International Conference on Rough Sets and Current Trends in Computing, RSCTC 2006, Kobe, Japan, November 2006, vol. 4259, Lecture Notes in Artificial Intelligence, pp. 786–795. Springer-Verlag, 2006.
12. Suraj Z. and Grochowalski P., Patterns of collaborations in rough set research. In Gomez V., Bello R., Falcon R. (Eds.), Proceedings of the International Symposium on Fuzzy and Rough Sets, ISFUROS 2006, Santa Clara, Cuba, December 5-8, 2006, pp. 1–7.
13. Suraj Z. and Grochowalski P., Patterns of Collaborations in Rough Set Research. In Bello R., Falcon R., Pedrycz W., Kacprzyk J. (Eds.), Granular Computing: at the Junction of Fuzzy Sets and Rough Sets, Studies in Fuzziness and Soft Computing, vol. 224, Springer-Verlag, 2008, pp. 79–92.
14. Suraj Z. and Grochowalski P., The Rough Set Database System. Transactions on Rough Sets VIII, Lecture Notes in Computer Science, vol. 3400, Springer-Verlag, pp. 307–331, 2008.
15. Suraj Z., Grochowalski P., Garwol K., and Pancerz K., Toward intelligent searching the rough set database system: an ontological approach. In Szczuka M., Czaja L. (Eds.), Proceedings of the CS&P'2009 Workshop, Kraków-Przegorzaly, 28-30 September, 2009, vol. 1-2, pp. 574–582, Warsaw University, 2009.
16. Suraj Z. and Grochowalski P., Some Comparative Analyses of Data in the RSDS System. In Yu J., Greco S., Lingras P., Wang G., Skowron A. (Eds.), Proceedings of the 5th International Conference on Rough Sets and Knowledge Technology, RSKT 2010, Beijing, China, October 15-17, 2010, Lecture Notes in Artificial Intelligence, vol. 6401, Springer-Verlag, pp. 8–15, 2010.
17. Suraj Z. and Grochowalski P., Toward intelligent searching the Rough Set Database System (RSDS): an ontological approach. Fundamenta Informaticae, 101(1-2):115–123, 2010.
18. Suraj Z., Grochowalski P., and Lew L., Pawlak Collaboration Graph and Its Properties. Proceedings of the 13th International Conference on Rough Sets, Fuzzy Sets, Data Mining and Granular Computing (RSFDGrC'2011), Moscow, Russia, June 27-30, 2011.
19. Suraj Z. and Grochowalski P., RoSetOn: The Open Project for Ontology of Rough Sets and Related Fields. In J.T. Yao et al. (Eds.), Proceedings of the 6th International Conference on Rough Sets and Knowledge Technology, RSKT 2011, Banff, Canada, October 9-12, 2011, Lecture Notes in Computer Science, vol. 6954, Springer-Verlag, pp. 414–419, 2011.
20. Suraj Z., Grochowalski P., and Lew L., Discovering Patterns of Collaboration in Rough Set Research: Statistical and Graph-Theoretical Approach. In J.T. Yao et al. (Eds.), Proceedings of the 6th International Conference on Rough Sets and Knowledge Technology, RSKT 2011, Banff, Canada, October 9-12, 2011, Lecture Notes in Computer Science, vol. 6954, Springer-Verlag, pp. 238–247, 2011.
21. Suraj Z., Grochowalski P., and Pancerz K., Knowledge Representation and Automated Methods of Searching for Information in Bibliographical Data Bases: A Rough Set Approach. In Skowron A., Suraj Z. (Eds.), Rough Sets and Intelligent Systems - Professor Zdzislaw Pawlak In Memoriam, Intelligent Systems Reference Library, vol. 43, pp. 515–538, 2012.
22. BibTeX. Available: http://www.bibtex.org/