=Paper=
{{Paper
|id=Vol-3179/Short_13.pdf
|storemode=property
|title=Semantic Profile of Corporate Web Resources
|pdfUrl=https://ceur-ws.org/Vol-3179/Short_13.pdf
|volume=Vol-3179
|authors=Viacheslav Zosimov,Oleksandra Bulgakova,Valeriy Pozdeev
|dblpUrl=https://dblp.org/rec/conf/iti2/ZosimovBP21
}}
==Semantic Profile of Corporate Web Resources==
Viacheslav Zosimov, Oleksandra Bulgakova and Valeriy Pozdeev

V.O. Sukhomlynsky National University of Mykolaiv, Nikolska 24, Mykolaiv, 54000, Ukraine

Information Technology and Implementation (IT&I-2021), December 01–03, 2021, Kyiv, Ukraine. EMAIL: zosimovvv@gmail.com (V. Zosimov); sashabulgakova2@gmail.com (O. Bulgakova); pozdeev1405@gmail.com (V. Pozdeev). ORCID: 0000-0003-0824-4168 (V. Zosimov); 0000-0002-6587-8573 (O. Bulgakova); 0000-0003-1224-7329 (V. Pozdeev). © 2022 Copyright for this paper by its authors. Use permitted under Creative Commons License Attribution 4.0 International (CC BY 4.0). CEUR Workshop Proceedings (CEUR-WS.org)

===Abstract===
The article presents a semantic profile of a corporate web resource, developed on the basis of Schema.org, the most common dictionary of semantic markup. Based on an analysis of the structure and information content of 500 corporate sites, their general structure was compiled. This structure was compared with the schema.org ontology, and missing classes were added to fully describe the developed structure of the corporate web resource. As a result, a general ontology of a corporate web resource was developed.

Keywords: semantic markup, search agents, intelligent information search, semantic web, corporate web site.

===1. Introduction===
The development of the semantic web concept has become another evolutionary step in the development of the global network. Information posted on the Internet is easy for a person to understand; the semantic web was developed to make that information suitable for automatic analysis and the synthesis of conclusions [1]. Despite the obvious advantages of this technology, it has not become widespread in the web environment. Significant results have been achieved in developing models of semantic markup for online stores, the main tool of e-commerce. Good Relations [2] has been used as a standard for micro-markup of e-commerce products since 2008. It provides the ability to specify special properties for:
* companies: contact details, location, logo;
* stores: address, opening hours, phone;
* specific products: product category, brief description, code, methods of payment and delivery, etc.

At the same time, very little attention is paid to the electronic market for services, namely to structural and semantic standards for the development of corporate web resources. Only a small percentage of web resources are developed using semantic markup standards. This situation is a consequence of problems of practical implementation that have existed since the very beginning of the semantic web concept, and of the peculiarities of the web resource development market:
# Lack of publicly available means of viewing and directly using the information that web resources provide in the Semantic Web. Existing projects are disparate and do not go beyond research departments [3].
# Lack of specific standards for the semantic markup of corporate web resources, and therefore lack of tools for developing web resources with integrated semantic markup.
# Reluctance of web developers to spend time mastering new technologies, given the lack of tools through which users can interact with this technology. Put simply, developers see no sense in spending time marking up web resources according to schema.org standards when there are practically no tools that let users benefit from the structured information. Google provides tools for micro-markup of web resources, but they allow only a few semantic tags to be added, which merely indicate the thematic focus of the web resource and display contact data [4].
# Developing a web resource according to the schema.org semantic markup standard is a non-trivial task that adds development time to a project. As a result, the cost of developing the web resource rises significantly for the customer. Under these circumstances, developers find it hard to justify, from an economic point of view, the need to increase the project's development budget.

The advent of semantic markup standards opens up broad prospects for the use of intelligent data search tools on the Internet, namely search agents. According to Stephen Haag's classification [5], these are data mining and analysis agents working in data warehouses. In the development of metasearch engines, search agents are used to analyze and process the results of the search engine. This direction is developing rather slowly due to the lack of search engine development tools available to a wide range of users. To ensure the effective operation of the technology, it is necessary to explore the content of corporate web resources, build their structure, and build a semantic profile that describes this structure.

===2. Problem Statement===
The use of semantic markup is the most effective way to adapt the information presented on web resources to machine processing. The most common and largest dictionary of semantic markup is Schema.org. It has 792 classes and 1447 properties [6]. On the one hand, the large number of classes makes this dictionary a universal tool for marking up different types of web resources. On the other hand, it significantly complicates the markup of small corporate web resources, because the whole hierarchy of classes, the internal connections between classes, and the properties all have to be studied.

In recent years, the trend of using semantic dictionaries has changed somewhat.
Because the development and addition of new classes has made practical use of the general schema.org dictionary quite time-consuming, mechanisms have been developed for the integration and use of specialized dictionaries based on Schema.org [7]. Specialized ontologies are developed for a more detailed description of data from a specific industry. There are two types of specialized dictionaries:
# Internal: published as part of the schema.org project with their own structure, usually maintained by a separate project team.
# External: exist separately from the main dictionary. They are developed and maintained by external organizations not affiliated with schema.org, yet they are important elements in expanding the overall structure of the basic vocabulary. In the future, external ontologies are integrated into the class system of the basic dictionary.

The integration of an external dictionary takes place in several stages. It is first reviewed in the schema.org community. In case of a positive assessment, a working group is formed to coordinate the process further.

From the above follows the task of developing a specialized ontology for corporate web resources, which includes the following steps:
# Analysis of the structure and information content of corporate web resources.
# Construction of the general structure of a corporate web resource.
# Comparison of the obtained structure with the schema.org ontology.
# Addition of new classes to describe data that is not covered by the schema.org dictionary.
# Construction of the general ontology of a corporate web resource.

===3. Research of web resources navigation elements===
Purpose of the research: to determine the general structure of corporate web resources by analyzing the top-level items of the main navigation menu. The structure of 500 corporate web resources was examined.
The analysis of web resources and data mining was carried out using an automatic parser implemented in WDOL (Web Data Operating Language) [8].

Web resources for analysis were automatically selected from Google search results for “Our company”. This wording of the search query gives a high probability that the search results contain corporate sites from different industries. Only the main page of each web resource was analyzed. A total of 584 web resources were processed, including repeated links to web resources, bulletin boards, and service aggregator sites.

Navigation menu items in the form “Item name” → “Links” were extracted from each web resource and stored in the database. Table 1 presents a list of structural elements sorted by decreasing frequency of occurrence on web resources.

A total of 2371 unique navigation elements were obtained. Items with fewer than 30 occurrences were ignored as uninformative. These are specific to individual companies:
* elements referring to specific services, which in general are second-level elements for the root “Services”;
* elements referring to specific articles on the company's activities, which in general are second-level elements for the root “Articles” or “About the company”.

In total, 2326 elements were screened out as uninformative, leaving 45 informative ones. The list of informative structural elements is presented in Table 1 in descending order of the total number of occurrences.
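The extraction and filtering steps described above can be sketched in a few lines. The authors' parser is written in WDOL and is not shown in the paper, so the Python below is only an illustrative stand-in: the class `NavLinkParser`, the helpers `extract_nav_items` and `count_items`, the assumption that navigation lives inside `<nav>` elements, and the sample pages are all hypothetical.

```python
from collections import Counter
from html.parser import HTMLParser


class NavLinkParser(HTMLParser):
    """Collects (item name, link) pairs for <a> tags found inside <nav>."""

    def __init__(self):
        super().__init__()
        self.nav_depth = 0        # > 0 while inside a <nav> element
        self.current_href = None  # set while inside a navigation link
        self.current_text = []
        self.items = []           # list of (item name, link)

    def handle_starttag(self, tag, attrs):
        if tag == "nav":
            self.nav_depth += 1
        elif tag == "a" and self.nav_depth > 0:
            self.current_href = dict(attrs).get("href") or ""
            self.current_text = []

    def handle_data(self, data):
        if self.current_href is not None:
            self.current_text.append(data)

    def handle_endtag(self, tag):
        if tag == "a" and self.current_href is not None:
            # Normalize whitespace in the visible link text.
            name = " ".join("".join(self.current_text).split())
            if name:
                self.items.append((name, self.current_href))
            self.current_href = None
        elif tag == "nav" and self.nav_depth > 0:
            self.nav_depth -= 1


def extract_nav_items(html):
    """Return the "Item name" -> "Link" pairs of one main page."""
    parser = NavLinkParser()
    parser.feed(html)
    parser.close()
    return parser.items


def count_items(pages, min_occurrences):
    """Count item names over all pages and drop the rare ones.

    A menu repeated in the footer is counted again, as in the study.
    """
    counts = Counter()
    for html in pages:
        for name, _link in extract_nav_items(html):
            counts[name] += 1
    return [(n, c) for n, c in counts.most_common() if c >= min_occurrences]


# Hypothetical sample pages; the study used a threshold of 30 occurrences.
pages = [
    '<nav><a href="/contacts">Contacts</a><a href="/services">Services</a></nav>',
    '<nav><a href="/c">Contacts</a></nav>'
    '<footer><nav><a href="/c">Contacts</a></nav></footer>',
    '<p><a href="/x">Not navigation</a></p>',
]
print(count_items(pages, min_occurrences=2))  # [('Contacts', 3)]
```

Real corporate sites often build menus from `<ul>` or `<div>` blocks rather than `<nav>`, so a practical extractor would need site-independent heuristics that this sketch does not attempt.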
{| class="wikitable"
|+ Table 1. List of structural elements of corporate web resources
! № !! Item name !! Number of occurrences
|-
| 1 || Contacts || 891
|-
| 2 || About the company || 637
|-
| 3 || Services || 611
|-
| 4 || News || 363
|-
| 5 || Product || 232
|-
| 6 || Vacancies || 217
|-
| 7 || About us || 193
|-
| 8 || Cooperation || 185
|-
| 9 || Certificates || 177
|-
| 10 || Information || 169
|-
| 11 || Service || 156
|-
| 12 || Reviews || 146
|-
| 13 || Home || 142
|-
| 14 || Career || 138
|-
| 15 || Articles || 125
|-
| 16 || Press center || 124
|-
| 17 || Licenses || 102
|-
| 18 || Our services || 98
|-
| 19 || Our works || 97
|-
| 20 || Our products || 93
|-
| 21 || Objects || 93
|-
| 22 || Our clients || 87
|-
| 23 || Shop || 86
|-
| 24 || Awards || 84
|-
| 25 || Partners || 81
|-
| 26 || Clients about us || 78
|-
| 27 || FAQ || 75
|-
| 28 || Projects || 74
|-
| 29 || Dealers || 72
|-
| 30 || Documents || 72
|-
| 31 || Our facilities || 71
|-
| 32 || Promotions || 68
|-
| 33 || Terms of cooperation || 65
|-
| 34 || For dealers || 63
|-
| 35 || Our contacts || 62
|-
| 36 || Events || 60
|-
| 37 || Distinctions || 59
|-
| 38 || For partners || 57
|-
| 39 || Clients || 56
|-
| 40 || Our projects || 56
|-
| 41 || Our company || 54
|-
| 42 || Our partners || 53
|-
| 43 || Production || 52
|-
| 44 || Gallery of works || 51
|-
| 45 || Our news || 50
|}

The next step was a manual review of 50 randomly selected web resources from the previous sample, to study the information content of the basic structural elements.

“Our company” contains information about the company, its activities, history, etc. In 72% of cases it partially duplicates the information posted on the main page. Therefore, it is advisable to combine these sections.

“Projects” contains a list of completed orders in the form of a portfolio with name, image and characteristics. It is advisable to combine it with the sections “Our clients”, “Our objects” and “Gallery of works”, as they are close in meaning.

“Documents” contains a list of documents certifying the legality of the company, samples of documents to be filled out by the client, and contracts.

“Certificates”, “Licenses” and “Distinctions” contain lists of documents with descriptions.

The sections “Documents”, “Certificates”, “Licenses” and “Distinctions” should be combined, as they are close in meaning.

“Reviews” contains visitor reviews. It is important to note that 17% of the web resources with published user reviews offer no possibility to post a review.
This casts doubt on the authenticity of the reviews. When analyzing web resources and making decisions, only those reviews should be taken into account whose authors have confirmed their identity by logging in to the web resource or through specialized third-party services.

“Vacancies” contains a list of vacancies, a description of career opportunities, and general information about employment.

“Cooperation” contains information about the terms of cooperation.

“Our partners” contains a list of partners, in most cases in the form of a list of logos. It is advisable to combine the sections “Cooperation”, “Our partners” and “Dealers”, as they are close in meaning.

“Contacts” contains the contact information the user needs and a map. All types of contact forms, such as Feedback, Call Meter, Callback, Table Ordering, and others, should also be moved to this section. Such forms significantly overload the pages of the web resource and are often very intrusive.

“Services” contains a list of services provided by the company, if necessary divided into types: design, installation, service, etc.

“Product” contains information about the products offered.

“News” is used for its intended purpose, namely to cover news about the company's activities, only by representatives of big business. They have a large staff, large production facilities and regional offices, and large-scale enterprises generate many events that can be covered. On the web resources of small and private companies, the News section is in most cases filled with general news automatically downloaded from news resources, or is left blank altogether.

“Articles”. According to statistics from the website of the American Webmasters Association (AWA) [9], owners of web resources fill this section with useful informational articles about the company's activities, services, products and rules of use in only 12% of cases.
Another 46% order copywriting services to fill the section with SEO articles that artificially raise the ranking in search engine results. 23% contain slightly revised copies of existing articles from other resources. The last 19% remain empty.

In 31% of the web resources viewed, this section contains no more than two articles, placed shortly after the creation of the web resource, or is empty altogether. SEO articles were present in 52%. They are characterized by:
* a large number of keywords highlighted in bold;
* general information on the topic without details, as they are not written by specialists in the subject area;
* a small volume, about 1000-1500 characters (200-250 words), which is sufficient for indexing by a search engine.

19% have large, detailed articles that cover the topic in depth.

Often, brief news items and article teasers are placed on the left or right side of the web resource with links to the full text. This provides better indexing of the article section by search engines and creates the illusion that the content of the pages is frequently updated, which is one of the positive factors for raising the ranking of a web resource in search results. However, it prevents the user from perceiving the main information on the page by overloading it with unnecessary information.

Given the relatively low percentage of “News” and “Articles” sections that carry information useful to users, it is advisable to combine these elements into one and to add the sections “Promotions”, “Events” and “Information”, as they are close in meaning. If the main menu becomes overloaded, the combined section can be made a sub-item of the section “Our Company”.

As a result of the study, the structure of the upper-level elements was formed. Similar structural elements were grouped into thematic groups for display.
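As a hedged illustration of this grouping step, the sketch below sums the Table 1 occurrence counts of the members of a few groups. The counts are copied from Table 1; the group labels, the helper name `group_totals`, and the restriction to four groups are choices made for this example only.

```python
# Occurrence counts copied from Table 1 (only the items used below).
occurrences = {
    "Contacts": 891, "Our contacts": 62,
    "Services": 611, "Our services": 98, "Service": 156,
    "Vacancies": 217, "Career": 138,
    "Documents": 72, "Licenses": 102, "Certificates": 177,
    "Awards": 84, "Distinctions": 59,
}

# Thematic groups; the labels on the left are hypothetical names.
groups = {
    "Contacts": ["Contacts", "Our contacts"],
    "Services": ["Services", "Our services", "Service"],
    "Vacancies": ["Vacancies", "Career"],
    "Documents": ["Documents", "Licenses", "Certificates",
                  "Awards", "Distinctions"],
}


def group_totals(occurrences, groups):
    """Sum member counts per group, sorted by descending total."""
    totals = {
        label: sum(occurrences[item] for item in items)
        for label, items in groups.items()
    }
    return dict(sorted(totals.items(), key=lambda kv: kv[1], reverse=True))


totals = group_totals(occurrences, groups)
print(totals)
# {'Contacts': 953, 'Services': 865, 'Documents': 494, 'Vacancies': 355}
```

For these four groups the sums coincide with the totals reported in Table 2.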
The general structure of corporate web resources was built on the results of the experiment, as well as on the results of research by leading experts in web design and the usability of web resources [10]. According to the recommendations, the number of root elements of the main navigation should not exceed 8. The constructed structure includes 10 elements, but none of the studied web resources had all of these elements together; the maximum number was 9. The general scheme includes all possible options, and when developing each specific project, only those that meet the company's requirements will be selected. Table 2 presents the groups of structural elements and the total frequency of occurrence of all elements of each group.

{| class="wikitable"
|+ Table 2. Groups of structural elements of web resources
! Group of structural elements !! Total frequency of occurrence
|-
| Articles, Information, FAQ, News, Our news, Press center, Promotions, Events || 1041
|-
| Home, About us, About the company, Our company || 1026
|-
| Contacts, Our contacts || 953
|-
| Services, Our services, Service || 865
|-
| Our clients, Clients, Objects, Projects, Our works, Our projects, Our objects, Gallery of works || 585
|-
| Partners, Dealers, Cooperation, Terms of cooperation, For partners, For dealers, Our partners || 577
|-
| Documents, Licenses, Certificates, Awards, Distinctions || 494
|-
| Products, Our products, Production, Shop, Goods || 463
|-
| Vacancies, Careers || 355
|-
| Feedback, Customers about us || 220
|}

The number of occurrences exceeds the total number of analyzed web resources because on some web resources the main navigation is duplicated in the footer. The algorithm for extracting navigation elements counts such items separately.

The data in Table 2 were used as structural groups, to which elements close in meaning were added to ensure ease of navigation on the web resource.
For example, our business, team, employees, representative offices, branches, sources of inspiration, company history, company development, etc. were added to the group “About the company”. These structural elements are treated as second-level elements for the root “About the company”. This organization of the navigation bar reduces the overload of the web page with first-level navigation elements and builds an intuitive navigation system for the user.

The next step of the study is to build a semantic profile of the corporate web resource.

===4. Semantic profile of corporate web resources===
Standards of semantic structure are called micro-markup dictionaries [10]; they are described in [11-19]. The general structure of corporate web resources became the basis for building a semantic profile. The semantic profile of corporate web resources was constructed in two stages:
# Comparison of the general structure with the schema.org ontology, which yielded the list of classes needed to describe the structure.
# Addition of new semantic classes to describe those elements of the structure for which the schema.org ontology has no corresponding classes.

To implement the semantic profile of the corporate web resource, a number of classes were created according to the structural elements of the first level, as well as one base class that contains all the new properties needed to describe the structure of corporate web resources. To identify the new classes uniquely, the prefix cw (corporate website) is added before their names, and the prefix sc (schema) is added before the names of schema.org classes.

Figure 1 presents a UML diagram of the classes of the developed semantic model of the corporate web resource. Properties common to every class are marked with a gray background and, for ease of perception, are replaced by «…» in all classes except the first.

Classes correspond to the structural elements of the first level.
Each class inherits two groups of properties from the schema.org ontology and the cw:CorporateWebsite base class:
# Common to all classes:
#* sc:Thing: name; image; description; URL.
#* sc:WebPage: mainContentOfPage; primaryImageOfPage.
#* cw:CorporateWebsite:
#** keyNote: the most important part of the page content, keywords, the main idea that can be used to improve the quality of information retrieval;
#** announcement: a brief announcement of the information presented on the page, usually used on a page with a list of news, articles, vacancies, reviews, etc. The first paragraph or the first several sentences of the main text are most often used as the announcement.
# Specific to a particular class. Each class has its own specific set of properties that describes the information content of pages of this type.

The new class cw:CorporateWebsite has the following properties:
* keyNote;
* advantage. Some property that favorably distinguishes an object or service from others. It can be used to describe projects or services and is also used as one of the components of the feedback system. As a rule, advantages are given as a list;
* disadvantage. Mainly used as one of the components of the feedback system;
* projectTimeline. This property can be used in two ways: simple, as the number of working days required to complete the project, or compound, divided into stages of execution;
* terms. Used, for example, to describe the conditions for obtaining the status of a dealer, or the conditions for successful project implementation;
* sertificate. An official document issued by the competent authorities certifying, for example, the quality of the products presented;
* license. An official document issued by the competent authorities certifying, for example, the right to provide a certain type of service;
* announcement.
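To make the combination of inherited and new properties concrete, the following sketch builds a page description that mixes sc: and cw: terms and serializes it as JSON-LD. Several things here are assumptions rather than part of the paper: the choice of JSON-LD as the serialization, the namespace URL `https://example.org/cw#` (the paper does not publish one), the function name `describe_page`, and all sample values. The schema.org property for the page address is written `url`, as it is spelled in schema.org itself.

```python
import json

# Hypothetical namespace for the cw: prefix; not defined in the paper.
CW_NS = "https://example.org/cw#"


def describe_page(name, url, description, key_note, announcement, advantages):
    """Build a JSON-LD description using sc: and cw: properties."""
    return {
        "@context": {"sc": "https://schema.org/", "cw": CW_NS},
        # The page is both a schema.org WebPage and an instance of the
        # cw:CorporateWebsite base class described in the text.
        "@type": ["sc:WebPage", "cw:CorporateWebsite"],
        # Properties inherited from sc:Thing.
        "sc:name": name,
        "sc:url": url,
        "sc:description": description,
        # New properties introduced by the semantic profile.
        "cw:keyNote": key_note,
        "cw:announcement": announcement,
        "cw:advantage": list(advantages),
    }


# Sample values are invented for illustration.
page = describe_page(
    name="Services",
    url="https://example.com/services",
    description="Design, installation and maintenance services.",
    key_note="turnkey installation",
    announcement="We design, install and service equipment.",
    advantages=["Fixed project timeline", "Licensed engineers"],
)
print(json.dumps(page, indent=2))
```

The resulting string could be embedded in a page inside a `<script type="application/ld+json">` element, which is the usual way JSON-LD is attached to HTML.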
The constructed block diagram and semantic profile of corporate web resources are the basis for developing the following elements of the CODI system [20]:
# A specialized content management system for corporate web resources with integrated semantic markup.
# A module for displaying the content of web resources based on custom templates.
# A metasearch system based on processing the results of popular search engines, with the use of search agents and the ability to display search results based on user templates.
# A personalized user web page that displays user-relevant information automatically retrieved and aggregated from various sources by search engines.

Figure 1. UML diagram of the classes of the corporate web resource semantic profile

===5. Perspectives of application of web resources semantic profile===
Developing separate semantic profiles for different types of web resources, instead of using one large ontology, has the following advantages:
# For developers of the semantic profile:
#* division of a large task into smaller ones in terms of volume and complexity, with separate project groups developing their own areas;
#* the possibility of rapid development of promising areas by concentrating on the small task of developing a separate semantic profile;
#* the ability to add new classes and properties specific to each type of web resource. Within one global ontology, the addition of industry-specific classes leads to rapid growth of the structure and a significant increase in the complexity of its practical use.
# For web resource developers and network users:
#* Ease of use. If semantic classes are tied to specific structural elements of a web resource, their use becomes more convenient and transparent. With one global ontology, developers are forced to adapt general-purpose classes to the specifics of particular structural elements, and often no such adaptation is possible at all;
#* Accessibility for new users due to a significant reduction in the time needed to study classes and properties.

The practical implementation of the semantic web concept requires the solution of two urgent problems:
# Integration of semantic markup into existing web resources.
# Development of new web resources with integrated semantic markup.

Solving these problems requires the development of a web resource markup strategy, which includes:
* selection of the elements that need to be marked up;
* selection of the standard according to which the markup will be carried out;
* choice of an automatic or manual approach to markup integration;
* selection of micro-markup code generation tools;
* choice of the method and tools for integrating the generated code into the HTML code of the web resource.

A significant factor hindering the widespread use of semantic markup is the lack of comprehensive solutions that provide all the tools needed to solve the problems described above.

To successfully integrate semantic markup into the HTML of new and existing web pages, it is necessary to analyze existing approaches and methods.

===6. Conclusions===
The development of the semantic web concept has become another evolutionary step in the development of the global network. The integration of semantic markup into the HTML code of web pages creates the conditions for applying methods of machine processing to the information placed on them. This in turn opens up opportunities for the development of intelligent data retrieval methods, as well as methods of displaying data based on the identification of information by semantic attributes.
Based on the study of the navigation menu and information content of corporate web resources, their general structure was built, which became the basis for creating a semantic profile.

An approach to the practical implementation of the semantic web concept, as a necessary condition for the development of e-commerce, is presented. The approach is to develop separate semantic profiles that describe the information content of different types of web resources instead of adapting the global schema.org ontology. As part of the development of the semantic web concept, a semantic profile was developed on the basis of the structure of corporate web resources. It is a set of classes for describing the information content of corporate web resources.

In the future, the presence of a semantic profile and custom templates for displaying content makes it possible to change the concept of a web resource. It is no longer a standalone site with its own strictly defined design that is displayed when a domain name is accessed. The new approach defines a web resource as a set of data and a semantic profile compiled according to certain rules. The use of a semantic profile allows web data to be displayed in any user-friendly form, based on the user's personal display template for the corresponding type of web resource. The user gets the opportunity to change the structure of the web resource arbitrarily and to choose which elements of the web page are displayed and which are ignored. The semantic profile also allows data to be operated on outside the domain name, for example to compare certain services or goods directly at the search stage and to apply filters and sorting.

===7. References===
* [1] Semantic web. [Online]. Available: https://www.w3.org/standards/semanticweb
* [2] Good Relations. [Online]. Available: https://www.goodrelations.co.uk
* [3] Web page semantic markup. [Online]. Available: https://support.google.com/merchants/answer/6069143
* [4] Structured Data Markup Wizard. [Online]. Available: https://www.google.com/webmasters/markup-helper/u/0/
* [5] Haag S. Management Information Systems for the Information Age: Ninth Edition. McGraw-Hill Higher Education, 554 p., 2012.
* [6] Schema.org semantic dictionary classes. [Online]. Available: https://schema.org/docs/about.html/
* [7] Integration of semantic dictionaries into the schema.org environment. [Online]. Available: https://schema.org/docs/about.html#cgsg
* [8] Zosimov V., Bulgakova O. Development of Domain-Specific Language for Data Processing on the Internet. International Scientific and Technical Conference on Computer Sciences and Information Technologies, pp. 287–290, 2020. https://doi.org/10.1109/CSIT49958.2020.9321968
* [9] Use of articles on web resources. [Online]. Available: https://www.aawebmasters.com/ecommerce/
* [10] Lawrence D., Tavakol S. Balanced Website Design: Optimising Aesthetics, Usability and Purpose. Springer Science & Business Media, 236 p., 2016.
* [11] Zosimov V., Bulgakova O. Application of Personalized Ranking Models Based on Expert Evaluations for Sorting Goods on E-commerce Web Resources. International Scientific and Technical Conference on Computer Sciences and Information Technologies, pp. 42–45, 2020. https://doi.org/10.1109/CSIT49958.2020.9321902
* [12] Open Graph. [Online]. Available: https://ogp.me/
* [13] Friend of a Friend (FOAF): an experimental linked information system. [Online]. Available: http://www.foaf-project.org/
* [14] Dublin Core. [Online]. Available: https://dublincore.org/
* [15] Work on optimization of extended snippets. [Online]. Available: http://astra.red/rabota-po-optimizatsii-rasshirennyih-snippetov/
* [16] Internet Live Stats online statistics service. [Online]. Available: https://www.internetlivestats.com
* [17] Sociological research of problems of introduction of micromarking. [Online]. Available: https://www.schemaapp.com
* [18] Kosar T., Bohra S., Mernik M. Domain-Specific Languages: A Systematic Mapping Study. Information and Software Technology, vol. 71, pp. 77–91, March 2016.
* [19] Diagram of classes of the dictionary of semantic markup Good Relations. [Online]. Available: http://www.heppnetz.de/ontologies/goodrelations/20100412/v1.html
* [20] Zosimov V., Bulgakova O., Pozdeev V. Complex internet data management system. Advances in Intelligent Systems and Computing, vol. 1246 AISC, pp. 639–652, 2021. https://doi.org/10.1007/978-3-030-54215-3_41