=Paper=
{{Paper
|id=Vol-2823/Paper13
|storemode=property
|title=Comprehensive Study of Semantic Annotation: Variant and Praxis
|pdfUrl=https://ceur-ws.org/Vol-2823/Paper13.pdf
|volume=Vol-2823
|authors=Sumit Sharma, Sarika Jain
}}
==Comprehensive Study of Semantic Annotation: Variant and Praxis==
Sumit Sharma, Sarika Jain

Department of Computer Applications, National Institute of Technology, Kurukshetra, Haryana, India

ACI'21: Workshop on Advances in Computational Intelligence at ISIC 2021, February 25-27, 2021, Delhi, India. Contact: sharma24h@gmail.com (S. Sharma); jasarika@nitkkr.ac.in (S. Jain). Homepage: https://sites.google.com/view/nitkkrsarikajain/home (S. Jain). ORCID: 0000-0001-5054-8670 (S. Sharma); 0000-0002-7432-8506 (S. Jain).

© 2021 Copyright for this paper by its authors. Use permitted under Creative Commons License Attribution 4.0 International (CC BY 4.0). CEUR Workshop Proceedings (CEUR-WS.org), ISSN 1613-0073.

===Abstract===
The proliferation of web content on the Internet has increased the demand for efficient information retrieval that is independent of how the content is represented. The semantic web has changed the way information is searched, analyzed, and stored. Semantic annotations offer a valuable means of enriching target information.

A large body of research on semantic annotation highlights its significance in various domains through annotation tools, for example for sharing, integration, creation, and reuse. However, none of these works first lays out the basic research questions behind annotation practice. Moreover, no unified system exists that combines all the different kinds of annotation.

This work addresses the research questions posed in the paper. We combine the variants of the various types of annotation, which, to our knowledge, has not been done before. Furthermore, we highlight some prominent semantic annotation tools and their real-life applications, organized by the type of annotation we classify.

'''Keywords:''' Semantic Annotation, Challenges, Applications, Ontology

===1. Introduction===
Information sharing and searching are essential tasks for Internet users. However, users face difficulties because different data sources represent data differently. Semantic annotation can bridge this gap between knowledge representations. It establishes relationships between data entities and links terms or mentions to entities.

The objective of semantic annotation is to determine which parts of a document correspond to the concepts described in an ontology. Its outcome is a set of mappings between document sections and ontology concepts, as defined in [1].

The use of natural language technologies in the sciences and humanities is an emerging trend. Experts face an explosion of information caused by the continuous growth of scientific content on the web, which makes it difficult to follow the state of the art in a given domain [2]. Semantic annotation has been applied in different domains in different ways, but all of these applications share a common goal. For example:
* Authors have applied semantic annotation to Arabic web documents using deep learning methods [3].
* Semantic annotation can also help to manage natural history collections [4].
* Other authors applied semantic annotation to digital music to improve music search [5].

Thus, annotation can be seen as a way to reduce the mental effort of reading a document for research and analysis. Embedding additional information into already available information helps to interpret it and to remember things. It also supports traceability, machine understanding, and more.

Another premise of semantic annotation is that a machine can use it to understand the relationship between a URI and the network of data. Semantically marked-up text becomes a source of knowledge that machines can easily understand, consolidate, and reuse. Semantic annotation helps machines to self-interpret data on the web, combine results, and manage the digital information available on the Internet. Such information can be generated by annotating sources with metadata, which produces "annotations" about the resources.

In this paper, we examine semantic annotation by defining annotation and metadata. We then discuss various aspects of semantic annotation approaches and review the current generation of semantic annotation systems.

We also formulate and address several research questions that are useful and significant for research on semantics and annotation. We describe the variants of the various kinds of annotation, together with a formal description of semantic annotation, and link them to the research questions. We also explain essential aspects of semantic annotation as it is used across different domains. Furthermore, we highlight some prominent semantic annotation tools and their real-life applications, organized by the type of annotation we classify.

===2. Research Questions===
In this section, we briefly study semantic annotation through four major research questions: "what", "where", "why", and "how" to use semantic annotation.
* "What?" gives the definition of annotation.
* "Where?" examines where annotation is applied.
* "Why?" explains the importance of annotation.
* "How?" describes the various ways to represent annotation.

====2.1. What? (Definition)====
According to the Oxford Dictionary Online, the word "annotation" is defined as "a note by way of explanation or comment added to a text or diagram" [6]. Semantic annotation marks up existing text to make its meaning explicit. A machine can then automatically identify and process the information, which makes the text more valuable.

In the literature, the definitions of semantic annotation used by different authors differ considerably. In any case, the success of the Semantic Web relies on a large number of users producing semantic content. This in turn requires tools that reduce the complexity of semantic technologies. Semantic annotation is the appropriate procedure for searching words, sentences, and paragraphs semantically on the Semantic Web. Annotations are also used to transform syntactic structures into knowledge structures.

More succinctly, annotation or tagging is a process that attaches a section, statement, comment, or attribute to a document or to a segment of a document. In short, an annotation can be viewed as additional data attached to a specific point in a record or in another piece of data [7]. The authors of [8] present a general meaning of annotation as the addition of some other piece of information, and they extend this definition to various domains.

Generally, domain annotation labels a concept (a record, part of an archive, or a word) in order to identify the essential concept or main idea in the information. Tagging helps users to recognize or classify a document based on the required concepts. It also helps to target the outcome of the document [7].

Semantic annotations are an intermediate formulation of the connections between unstructured documents, semi-structured documents, and ontologies, in both directions [9]. According to [10], semantic annotation means embedding metadata in documents to assign semantics to web assets. All of these definitions have one thing in common: they link resources to a domain ontology.

====2.2. Why? (Purpose)====
This is the most important research question, since it establishes why annotation is worth developing. The growing volume of textual data requires natural language processing and text mining procedures to organize texts and to recognize patterns and knowledge in them. Semantic annotation is becoming important because information is increasingly represented as a knowledge graph [1].

Data is regularly exchanged in electronic form, such as papers, letters, notes, mail, datasets, reports, laws, proposals, articles, and declarations. Semantic annotation enables semantic-web-enabled machines to self-interpret this data, consolidate the results, and use them on the web. Such information can be created by annotating sources with metadata, which results in "annotations" about each source.

Probing, searching, mining, and classifying are increasingly significant and challenging tasks with massive data. They become even more complicated when the data grows explosively and is undefined. It is not feasible to read all documents manually to find a particular concept (a person, event, place, and so on) in them. Annotation therefore plays a significant role in finding any key idea in the documents.

Discovering all the key concepts and relationships in the documents during annotation is both challenging and essential. Exploring the relationships between data concepts and rendering them in a new form is a discovery task in its own right. An ontology is the appropriate way to define the relationships between data concepts, and at the same time it makes the data machine-understandable.

====2.3. Where? (Place)====
In the last few decades, the practice of annotation has grown considerably, and annotation is used in many fields with important implications. Programming languages use annotations on classes, methods, parameters, or variables to clarify and define them.
On the other hand, mechanical engineering uses annotations to convey the specific meanings of text or symbols. Before using annotation, it is worth considering the range of annotation types that exist, so that one can pick the correct type for a given use case. Annotations can be applied to, and reused across, a broad range of data types; the essential ones are text, image, audio, video, and graphics [2].

Annotation has been applied in many settings:
* Some annotation tools provide a lightweight framework for annotating textual data [11].
* The authors of [12, 13, 14] applied semantic annotation to image data to improve search.
* [15] provided a methodology for adding annotations to XML schemas, and [5] applied annotation to digital music.
* Semantic annotations are useful for classifying digital documents (newspapers, blogs, media content filtering).
* They make it possible to search for a particular concept in large amounts of data (named entity recognition).
* Annotation plays an essential role in the biomedical field, where it identifies essential terms used in medicine [16].
* IoT sensor data are currently stored with meaningful annotations, which clearly express the potential and impact of the data [17, 18].

====2.4. How? (Implement)====
This research question also plays an imperative role in the success of annotation. The applicability of semantic annotation depends on the nature of the data type, which can be text, image, audio, or video.

Text annotation can be semantic annotation, intent annotation, or named entity annotation. Finding the essential concepts is the main task for a text document.

Image annotation is essential for a wide range of applications. These include computer vision, robotic vision, facial recognition, and solutions that rely on machine learning to interpret images. To train such solutions, metadata must be assigned to the images as identifiers, captions, or keywords.

In the last few decades, several techniques have been developed for semantic annotation. Part-of-speech (POS) annotation depends on the specific design and model required; a limited POS annotation scheme may suffice for text mining or text processing.

Semantic annotation platforms support information extraction development, knowledge base and ontology management, storage, and access APIs (e.g., RDF repositories). They also provide user interfaces for knowledge base and ontology editors [19].

Semantic annotation is also helpful for properly classifying e-reports, online news, web journals, messages, and digital libraries. These need text mining [20], AI, and natural language processing techniques to extract meaningful information. To our knowledge, the most popular machine-understandable format today is RDF (W3C, Resource Description Framework (RDF), http://www.w3.org/RDF/, last accessed January 25, 2021).

===3. Preliminaries for Semantic Annotation===
In our study, the various aspects of semantic annotation are shown in Figure 1, which summarizes the scope of this survey. This section describes the structure of annotation. We begin with the basics and describe the complete process of annotation. Then, based on the data types to which semantic annotation applies, we address the research questions and provide a formal definition of semantic annotation.

Figure 1: Various aspects of Semantic Annotation

====3.1. Semantic Annotation====
Annotation is the process of assigning labels to data for interpretation and automatic description. Semantic annotation is annotation in which the necessary additional information is added to a text document. This information reflects the relationships between ontology class concepts or instances and the entities of the document.

Such a description summarizes the main content of the annotated object and describes the semantics of a document (such as its label, title, author, and date of publication). Semantic annotation therefore collects semantic information from intuitive and essential records, so that a machine can easily search and classify the target information.

The output of annotating a document can take different forms, depending on the tools or methods that produce the annotation. The goals of an annotation project may also differ according to its design and requirements. Figure 2 shows an example annotation of email text data with an ontology.

Figure 2: Example of semantic annotation

====3.2. Types of data====
Annotation is currently one of the most challenging tasks, because data on the web is not uniform: it comes in different structures. Semantic annotation must be applied with the nature of the data in mind. It is therefore essential to describe each kind of data precisely, both to distinguish the kinds and to avoid annotation problems.

Motivated by this, this section describes the types of data on the Internet that are significant for semantic annotation. There are three kinds of data: structured, unstructured, and semi-structured, explained below.

=====3.2.1. Structured Data=====
Well-organized data is known as structured data. It is easy to explore and is generally arranged in rows and columns (e.g., an Excel spreadsheet). In structured data, information is always mapped onto fixed, predefined attributes, which appear as columns.

For instance, Table 1 shows a transaction log in structured form. As with an Excel spreadsheet, a database designer designs a data model, which is then followed when storing the structured data. This data model stores all records in a table, and the records are linked through the relationships that exist between the data entities. Structured data uses storage space efficiently and makes information retrieval as easy as possible.

SQL (used by database systems such as MySQL) and SPARQL are query languages used to retrieve, manipulate, and store structured data. They group the data according to the relationships defined in the data model. Structured data can be handled by humans as well as by machines. However, humans play only a small role in annotating it, because structured data is easy to annotate with predefined rules [21]. These rules are created from the relationships between the entities.

Table 1: Example of structured data
{| class="wikitable"
! S.N !! Account No !! Name !! Transaction !! Logs
|-
| 1 || XXX445050 || Ram || 4393949 || 5:5:19
|-
| 2 || PPP304039 || Laxman || 2932734 || 3:4:19
|}

=====3.2.2. Unstructured Data=====
Several authors have worked on various forms of unstructured data, such as images, audio, video, news, social media data, blogs, open-ended surveys, web content, and transcripts. Various AI and machine-learning-based algorithms have been applied to recognize such content and annotate it accordingly. These algorithms can also uncover hidden associations between entities through links.

Most of the data on the Internet is unstructured. Such heterogeneous data cannot be stored in rows and columns and has no associated data model. Typical examples of unstructured data are emails, blogs, and HTML pages.

Since there is no underlying relationship between the data concepts, finding, analyzing, accessing, and managing a piece of information in this kind of data is more complicated. According to studies of machine learning algorithms [22, 23, 24], these processes are error-prone and time-consuming. Figure 3 shows an example of unstructured data.

Figure 3: Example of unstructured email message data

=====3.2.3. Semi-structured Data=====
Semi-structured data mixes structured and unstructured data. It has properties that help to organize information, but it does not conform to the fixed structure of a data model. Web forums, web pages, and email messages are popular examples of semi-structured data. Their actual content is unstructured, but they also contain some structured information, such as names and titles, log information, and time.

Figure 4 shows an example of semi-structured data about a web page. The page also contains some structured information, such as the title, journal name, and journal log. This semi-structured information gives the designer a little help in building the data model, and these small pieces of information help to extract data from unstructured repositories.

Figure 4: Example of semi-structured data on web search of paper

====3.3. Level of Automation of Annotation====
Successful use of the Semantic Web requires semantic annotations to be widely available for existing and new documents on the Web. The level of automation describes how the right data is obtained and used, ranging from manual to automatic. The level of automation of any system can be assessed as manual, semi-automatic, or automatic. These levels are described, with their frameworks and requirements, in [7, 9, 25].

=====3.3.1. Manual Annotation=====
Manual annotation is the process of reading an input document and extracting new information with human participation. It is also a type of formal annotation that involves human-computer interaction. It covers many NLP tasks and involves many activities [26], such as writing comprehensive annotation guidelines and defining an annotation schema.

Manual annotation is more convenient today thanks to authoring tools such as Semantic Word [27], which provide an integrated environment for authoring and annotating text. Nevertheless, the use of human annotators is often limited by several factors:
* the annotator's knowledge of the domain;
* the amount of training;
* personal motivation;
* complex patterns.

Manual annotation cannot be applied to massive amounts of data.

The semantic annotation of documents with respect to an ontology and an entity knowledge base is examined in [15]. Although that work presents interesting and promising approaches, it does not discuss the use of automated strategies. Its focus is manual semantic annotation for enriching web content. It also examines a few state-of-the-art manual annotation approaches with regard to the difficulty of supporting multiple formats: HTML, PDF, XML, images (e.g., PNG, JPEG), and video.

For a description of some more established tools and frameworks, refer to [28]. The authors of [28] also classify semantic annotation systems and analyze end-user tools in detail, with their pros and cons.

Manual annotation tools allow humans to add descriptions to web content or other data sources. Examples include Protégé [29], SMORE [30], and OntoMat [31]. However, manual annotation has become very complicated because of usability and feature issues [29]. The authors of [26] list annotation tools based on a detailed evaluation of their annotation features.

Besides this, manual annotation is time-consuming and error-prone. As shown in Figure 5, it requires expert knowledge because it is domain-specific. Manual annotation also needs a large amount of training. Because of complex schemas, it is hard to scale to large data, and its output is not reused.

Figure 5: Manual annotation of document with semantic data
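Whatever the level of automation, the annotator's result is a set of links between spans of the document and ontology concepts, as defined in Section 3.1. An example is the mention "Ram" recognized as a Person.

The following is a minimal sketch of such a stand-off record, assuming Python 3.9 or later. These names are hypothetical:
* the `Annotation` class;
* the `annotate` helper;
* the `http://example.org/onto#` namespace;
* the `Account` concept.

The hand-written lexicon merely stands in for the decisions a human annotator would make. Keeping the annotations apart from the text, addressed by character offsets, is one way to make the output reusable.

```python
from dataclasses import dataclass

# Hypothetical ontology namespace, for illustration only.
ONTO = "http://example.org/onto#"


@dataclass(frozen=True)
class Annotation:
    """Stand-off annotation: a character span of the text linked to a concept IRI."""
    start: int      # offset of the first character of the mention
    end: int        # offset just past the last character of the mention
    surface: str    # the mention as it appears in the text
    concept: str    # IRI of the ontology concept assigned to the mention


def annotate(text: str, lexicon: dict[str, str]) -> list[Annotation]:
    """Link every exact occurrence of a lexicon entry to its ontology concept.

    `lexicon` maps a surface string to a concept name; it stands in for
    the decisions a human annotator would make.
    """
    found = []
    for surface, concept in lexicon.items():
        start = text.find(surface)
        while start != -1:
            end = start + len(surface)
            found.append(Annotation(start, end, surface, ONTO + concept))
            start = text.find(surface, end)
    # Report the annotations in document order.
    return sorted(found, key=lambda a: a.start)


if __name__ == "__main__":
    sentence = "Ram and Laxman hold accounts XXX445050 and PPP304039."
    lexicon = {
        "Ram": "Person",
        "Laxman": "Person",
        "XXX445050": "Account",   # hypothetical concept name
        "PPP304039": "Account",
    }
    for ann in annotate(sentence, lexicon):
        print(f"{ann.start}-{ann.end}\t{ann.surface}\t{ann.concept}")
```

The sketch uses naive exact string matching, so an entry would also match inside a longer word (e.g., "Ram" inside "Ramesh"). The semi-automatic and automatic approaches in Sections 3.3.2 and 3.3.3 replace such a fixed lexicon with NLP components or trained models.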
Human annotation is too costly and time-consuming to be applied to the massive number of records available on the Web. Manual annotation also requires qualified annotators, as explained with an example in Section 4. First, an annotator would map the text "Ram" to the domain ontology and recognize it as a Person. The annotator would then recognize the company where Ram works. Based on how the data is tagged, manual annotation is further categorized into formal and descriptive annotation.

* '''Formal Annotation''': Formal annotation is the simplest and fastest way for a human to annotate documents. Metadata fields such as title, author, and publishing date are added to the record. This task does not require detailed domain knowledge, only a conceptual understanding.
* '''Descriptive Annotation''': A descriptive (or summative) annotation describes the main goal of the work. It provides a summary as well as a complete citation of the work, without evaluating its quality. Descriptive annotations include an overall description of objects that may be enough for a machine to understand the full semantics of the material and process the information. For example, such an annotation may describe a book, hypothesis, methodology, article, conclusion, or any other source.

=====3.3.2. Semi-automatic Annotation=====
In semi-automatic semantic annotation, the framework creates annotations, which human annotators then post-edit and amend [32]. Many manual annotation tools have been turned into semi-automatic frameworks by providing manual training. Research on semantic annotation methods has investigated the benefits of state-of-the-art semi-automatic tools for annotating a large set of biomedical queries [16].

There are numerous semi-automatic semantic annotation frameworks, such as MnM [33]. Unlike manual and automatic frameworks, they do not integrate automated components into the semantic analysis itself. Instead, they use these components as a bridge between model elements and annotation.

Semi-automatic annotation requires a mixed structure in the annotation model, which increases its structural complexity [25]. This kind of annotation model suits labels or tags that are not tied to a specific property. Instead, such labels describe a particular connection among metadata assets for navigation purposes, as seen in [9].

The semi-automatic annotation process is shown in Figure 6, in which both human and machine act as annotators. Semi-automatic annotation is fast and robust in finding the semantic relationship between the annotation data and the target annotated document. Human involvement gives it a significant advantage in adapting to new features and new domains. Its main components are:
* morphological analysis;
* part-of-speech tagging;
* retrieval of domain-specific information;
* named entity recognition.

Figure 6: Semi-automatic annotation process of documents with semantic data

=====3.3.3. Automatic Annotation=====
Automatic annotation is the highest level of automation in semantic annotation. Systems in this category are highly trained and have high accuracy. Training such a system requires a large amount of quality data and rule sets. To avoid this cost, unsupervised systems have tried many methodologies and experiments to learn how to annotate data without human oversight, but their precision is still limited.

Automatically assigning meaning to lexical data adds important information both for search and for indexing the document [16]. The articles [26, 24] proposed a classification of information extraction tools according to the main strategy adopted by the community. Other techniques use machine learning methods [22] to automate semantic annotation using training data.

Automatic semantic annotation is performed by a machine, so it is efficient and fast compared to manual annotation. Its key feature is that it can handle massive data, which is the main limitation of manual annotation. Automatic annotation requires precise rules or a standard schema so that the machine can work efficiently. The automatic annotator then performs the task based on these predefined standards. Automatic annotation is useful for dynamic web content that may be transient.

However, automatic annotation depends entirely on its training and fails to adapt to new terminology. Fully automatic semantic annotation of global data is still an unsolved problem. Hence, semi-automatic annotation methods are widely used at present. The components of automatic semantic annotation are shown in Figure 7; there is no human interaction at run time.

Figure 7: Automatic annotation process of documents with semantic data

====3.4. Degree of Semantic Annotation====
With the development of semantic annotation in recent years, semantic annotations can be applied in various areas to extend usability. Because web data lacks structure, the automatic discovery of targeted or unexpected knowledge raises various research issues, outlined in [22]. Heterogeneous data can be text, pictures, sound, video, or illustrations.

The authors of [2] applied semantic annotation to textual objects and showed its practical impact on search. The authors of [12] applied semantic annotation to image objects to improve searching and indexing. On the other hand, [15] gave a procedure for adding annotations to XML schemas. To the best of our knowledge, no annotation technique exists that can be applied successfully to all content (text, image, audio, video) simultaneously.

Since different strategies are used for different content, we classify annotation by its degree, so that the common framework of the Semantic Web can be used. Semantic annotators take input in a variety of forms, which we call the degree of semantic annotation; this makes the system flexible. The degree of annotation classifies annotation by the input structure of the data, as shown in Figure 1.

=====3.4.1. Text=====
Most of the information on the web is text. Data is extracted from the web directory through a user query, which is itself written as text. The input query can be mapped onto structured, semi-structured, or unstructured data. Annotating a text document is important for searching, analyzing, and classifying documents correctly.

=====3.4.2. Image=====
Increasing use of digital capture techniques has led to tremendous growth in the number of images on the web. Text queries are used to access huge image data sources. To achieve this, a query is written that produces a description visually similar to the image, and this feature of the image becomes the key that represents it.

Several annotation techniques have been used to extract and describe the main features of images. Some researchers have focused only on feature extraction and have developed an image semantic annotation method based on an image concept distribution model.

=====3.4.3. Audio=====
Universal mobile interfaces make digital cities portable through audio annotations of the Earth. The best example is portable cities with audio semantic annotation, which aim to provide a comprehensive interface to mobile users on demand.

=====3.4.4. Video=====
Video annotations are as important on the web as image and audio annotations. Video lectures, social media content, news videos, sports, and similar data are monitored through semantic annotation. In the semantic context of the examined domain, the concepts and instances, together with their visual descriptors, enrich the video semantic annotation.

=====3.4.5. Hybrid=====
Semantic annotation of multimedia content is more challenging and relies on high-level ontologies. Such approaches are in demand.

===4. Approaches for Semantic Annotation===
Several approaches have been proposed to address the need for semantic annotation in user information and vast knowledge spaces. Some strategies focus on semantic tagging of the document (e.g., title, author, explanations). This reduces the large volume of searching needed to find supplementary information in external sources [34, 35]. Annotation tools can also be categorized by the media content they annotate, for example text, audio, video, or images.

There are various approaches and techniques for achieving semantic annotation. The authors of [22] investigated machine learning approaches to automate the annotation process, and automatic semantic annotation is now carried out more effectively using machine learning techniques. We further categorize semantic annotation into several automatic approaches:
* supervised machine-learning-based methods;
* unsupervised machine-learning-based methods;
* rule-based methods;
* ontology-based machine learning.

The supervised approach is carried out in two stages, training and annotation. In training, the system is given plain text with some labels. In annotation, the machine recognizes entities and semantic relations based on the labeled training data. [12] applied supervised machine learning techniques to annotate image data.

The unsupervised approach annotates unlabeled data. For instance, [12] proposed a strategy for automatically generalizing extraction patterns from web pages.

The ontological annotation approach uses other information sources, such as Wikipedia, vocabularies, thesauri, and ontologies.

Rule-based semantic annotation relies on predefined rules. Various rule-based extraction frameworks have been created, for instance Crystal [36], AutoSlog [37], MnM [33], Rapier [38], SRV [39], Whisk [40], Stalker [41], and BWI [42]. The rule-based approach [43] is applicable only when the pattern of the data stream is well known; it is difficult to apply to heterogeneous, unknown structures.

====4.1. Machine Learning Methods====
A dynamic environment and a wide range of domains push systems toward automatic annotation. The automatic annotation process is one of the most critical and challenging tasks for a semantic annotation system.

=====4.1.1. Supervised Machine Learning=====
In a supervised learning method, an expert assigns the labels to the annotation data. Several supervised machine learning models have been implemented to optimize labeling costs, such as SVM, Hidden Markov Models (HMM), and Markov Random Field models.

In this approach, a pair of entities is first looked up using the web as a corpus. A binary relationship between the entities is then sought. If a relation is found, the pair is labeled positive; otherwise it is labeled negative.

Furthermore, some authors have extended an existing approach with the SVM technique. The main drawback of this method is that it cannot handle multiple-instance learning, and many errors occur during processing. Other challenges of semi-supervised and unsupervised techniques for retrieving relations between entities are discussed in [44].

Limitations of supervised machine learning approaches:
* '''Large training corpus''': An efficient machine learning model requires a significant expert-annotated training corpus, which is very expensive to develop.
* '''Limited entity extraction''': These models identify only the entity types on which they were trained. Entities of other categories are not recognized and produce false results, which affects the accuracy of the model.
* '''Lack of entity relations''': Because of the large data corpus, the model only explores the surface of the graph for each instance in the knowledge base.

Supervised machine learning methods are expensive and require a lot of effort. As a result, most research has moved toward unsupervised or semi-supervised machine learning methods, which are discussed in the next section.

=====4.1.2. Unsupervised Machine Learning=====
Unsupervised machine learning automatically identifies possible relationships between objects in massive text corpora. It does not require manually labeled data. Pairing deep learning with unsupervised learning goes beyond the boundaries of supervised learning.

This machine learning method clusters similar entity concepts. The resulting clusters describe relationships within sets whose elements belong to the same group. Some clustering techniques and novel approaches are examined in [45]. Its authors created a simplified, generalized grammatical clause representation that uses information-based clustering and inter-sentence dependencies to extract high-level semantic relations. [46] used web mining to discover and enhance concept-specific relations, rather than only global connections.

Limitations of unsupervised machine learning techniques:
* Because of its automatic nature, it sometimes generates unnecessary clusters that are not of interest.
* The output is less accurate because the input data is unknown and no data expert labels it dynamically.
* It does not extract hidden relationships between entities and does not provide links to relations.

=====4.1.3. Deep Learning Method=====
Because of the large interlinked datasets on the Internet, machine learning aims to provide methods that process data automatically. For present-day text, this can be achieved using deep-learning-based semantic annotation built on public and common ontologies.

cannot be applied to other types of data or unstructured data. In this type of annotation, experts write rules using logical arguments, so that relationships can be extracted by carefully applying the correct logic. The rules follow a specific IF-THEN-ELSE format that derives high-level information from low-level references. According to our survey of the literature, rules have been applied in combination with ontological reasoning [21].

The authors of [47] provided a minimal rule engine, MiRE, for context-aware mobile devices. Rules are significant and can be applied to various tasks, such as event detection and IoT data representation.

Limitations of the rule-based approach:
* It is applicable only to recognizing regular patterns.
* Dynamic changes cannot easily be handled by this approach.
* Experts with complete domain knowledge are needed to write the rules.
* Large and complex rules are needed to deal with vast unknown data sets.

====4.3. Ontology-Based Methods====
Ontology-based, dictionary-based, or knowledge-based semantic annotation is the most robust annotation approach for representing relationships between data objects. Ontology-based semantic annotation can be applied at any level of automation (manual, semi-automatic, and automatic annotation). As we have
Due to the discussed in Section 5, this annotation approach in- gradual growth and the large size of the resources, there troduces the process of generating metadata using on- is a need to have an active and quick semantic an- tology as their knowledge base. The ontology-based notation of resources. For example, Neural Network, approach relies entirely on description logic, which re- CBOW, and Skip-gram have become the state-of-the- lates to a family of logic-based knowledge representa- art for generating word embedding. The authors [21] tions of formalism. All ontological reasoning approaches have presented a deep learning and rule-based learn- have been supported by two general illustrations of ing technique for the Arabic language which involves semantic web languages. i.e., RDF (S) [48] and OWL discovering a document and used to enhance the se- [49, 50]. mantic indexing. Several frameworks support manual annotation, for example, Protégé-2000, CREAM , SMORE, Artequakt 4.2. Rule-Based Annotation Methods are the semantic annotation framework that supports various semantic annotation task (like create an anno- Rule-based annotation is the simplest and most straight- tation, add a tag, validate, etc.). Knowledgebase tools forward approach, which depends upon a predefined help to manage and store complex information. Some rule created by one or more experts. The rule base annotation tools have been used to develop and main- annotation can be applied only when either the data is tain the dictionary of the document. ERASMUS and fully known or have some specific notation. For exam- SIBM (CISMeF), NCBO Annotator, are some concepts ple, the rule base annotation is perfect for structured Mapper used to map the concept of a word to the in- datasets such as RDBMS data. Rule base annotations stance of the dictionary. Many semantic query languages (such as Triple, RQL, 6. Advantages and Applications SPARQL, RDQL, etc.) 
and various reasoning engines (RACER, Pellet, and FACT, etc.) connect the semantic of Semantic Annotation web languages. Some techniques such as the SWRL The advantages of annotation include searching, stor- rule provide popularity to ontological reasoning. On- ing, analyzing, and automation. In this section, we tological modeling represents the knowledge in a hi- shall discuss the various benefits of semantic annota- erarchical form and establishes the link between the tion and its real-life application. related entities. • Limitations of Ontological approach 6.1. Benefits of semantic annotation The semantic annotation helps to formulate logic for • Ontological modeling is domain specific. a more profound understanding by the machine. Se- • Expert knowledge is required to genereate a query. mantic annotation is encouraging the researcher to make inferences and draw conclusions about web resources. • Ontology-based query engine required to retrieve Some of the benefits of semantic annotation are given information. below. 5. Semantic Annotation Tools 6.1.1. Improves searching: Searching the vast and distributed structure of the web We can arrange annotation tools in a two-dimensional requires efficient search schemes. Searching becomes space, Ontology Support Semantic Annotation tools efficient when the available information is meaning- and Non-Ontology Support Semantic Annotation tools. ful and contains meta-data to support the information Describing these tools based on the various aspects of available on the internet. The semantic search will be semantic annotation. defined as a search that is based on semantics rather than just depending on text similarity[51]. Semantic 5.1. Non-Ontology Support Semantic annotations are also used to correlate significant tags Annotation Tools among reports to perform a semantic search. We are highlighting the most frequently referred non- 6.1.2. 
Better utilizes the available web ontology-based tools found in the literature study of resources: current semantic annotation. These tools annotate man- ually and some use different strategies to reduce the ef- Now a days, when almost everything is well defined, fort of annotating. Some tools have the option to per- organized, and adequately classified on the Web, then form annotation manually as well as automatically and the resources can be efficiently utilized. The informa- some have option both (semi-automatically). Some im- tion is available on the Web in various forms such as portant semantic annotation tools are shown in Ta- document, knowledge base and dictionary, etc. con- ble 2. tains information in the form of text or image or both can be linked appropriately through annotation. The 5.2. Ontology Support Semantic semantic annotations of web resources are connected concepts with meaningful representation in which the Annotation Tools retrieved information could be utilized according to Current semantic annotations, based on the literature, user interest instead of just a text matching. aim to support the development of inter language re- sources. Many researchers are working in this area 6.1.3. Improves the decision making: and several authors have contributed in multiple ways to make it successful. They have defined semantic an- It has been found that when all the related and signif- notations in a different appearance but have the same icant data has been shared with the clients (through semantics. Some ontology-based semantic annotation semantic look), at that point, the client is capable of tools and their aspects are shown in Table 3. making a few choices and can perform it successfully since he/she will be mindful of all the things. 
The se- mantic search will be helped by semantic annotation Table 2 Non-Ontology Support Semantic Annotation Tools SA System Approach Description Application domain Automation BroMo Unsupervised Using clustering for Proteins (biomedical) Semi-automatic blogs and article se- mantic annotation Sozekamm Supervised Annotate data using a Gemeral Semi-Automatic supervised categorical clustering algorithm LIMBO Ontea Unsupervised Process email or text Text and Email Manual/ Automatic document find the pat- tern Doccano Unsupervised Open source text anno- General Manual tation tool for human Yawas Unsupervised Java based web-based General Manual annotation system Briefing Associate Unsupervised Used for Microsoft MS Power point Manual power point presenta- tion Zemanta Rule based Algorithm for natural General Semi-Automatic language and seman- tic processing is propri- etary Thresher Unsupervised Aimed to Web pages Web page Automatic/Manual with similar content RCSSAT Supervised Classify the using a General Manual new lexicon since the semantic search is concerned with the mean- 6.2. Applications of semantic ing of the substance accessible. The semantic annota- annotation tor has given a semantic search with proper explana- tions to empower it to make an appropriate sense in After specifying the structure model of semantic anno- the document, picture, etc. tations, annotation creators can apply the annotation to serve their purpose such as (search, sharing, inte- gration, reuse, etc.). Here, in this section, several ap- 6.1.4. Unambiguous description of plications of semantic annotation are listed with some abbreviations: real-life applications. Many words/concepts have been expressed using the same abbreviation. This leads to a critical problem of 6.2.1. Bibliographies: ambiguity. The use of annotation is an effective way to troubleshoot this problem. Semantic annotation plays a vital role in the field of bibliography annotation to describe the source. 
The whole information of the source is essential for the 6.1.5. Automatically classifies the web authors while writing a paper. Bibliography seman- resources: tic annotation helps in linguistic data to analyze and If the resources available on the web are annotated is used for any language data. properly, then the classification process will be unin- terrupted because all classification algorithms only ask 6.2.2. Extraction of open information: for the references of annotated metadata to classify the resources. This makes semantic web search efficient as Semantic annotation has been practiced in diverse fields a process of classification. of knowledge. For instance, It has an application in a news analysis for the naming of places, organizations, and people. it has application in biological systems for the identification of biomedical entities such as genes, References proteins, and their relationships. [1] F. Pech, A. Martinez, H. Estrada, Y. Hernandez, 6.2.3. Alignment of ontologies: Semantic annotation of unstructured documents using concepts similarity, Scientific Program- This is one of the important applications for the align- ming 2017 (2017). ment of ontologies for knowledge management. On- [2] H. Agt, G. Bauhoff, R.-D. Kutsche, N. Milanovic, tology alignment is quite useful to differentiate the het- J. Widiker, Semantic annotation and conflict erogeneous models and it relates the difference to de- analysis for information system integration, Pro- termine various interoperability concerns that synchro- ceedings of the MDTPI at ECMFA 2010 (2010). nize in semantic image annotation and retrieval. [3] S. Albukhitan, A. Alnazer, T. Helmy, Semantic annotation of arabic web documents using deep 6.2.4. Semantic search: learning, Procedia computer science 130 (2018) 589–596. Search engines can retrieve the required documents [4] L. Stork, A. Weber, E. G. Miracle, F. Verbeek, more accurately with the help of metadata informa- A. Plaat, J. van den Herik, K. 
Wolstencroft, Se- tion. Scientists and librarians put lots of efforts and mantic annotation of natural history collection, time to create metadata for the documents. However, Journal of Web Semantics 59 (2019) 100462. to alleviate the hard labor, many attempts have been [5] F. Rahman, J. Siddiqi, Semantic annotation of dig- made towards generating the automatic metadata, based ital music, Journal of Computer and System Sci- on the techniques of information extraction. ences 78 (2012) 1219–1231. [6] C. Soanes, Oxford dictionary of English, Oxford 6.2.5. Classification: University Press, 2005. Semantic annotation helps to classify the data which [7] E. Oren, K. Möller, S. Scerri, S. Handschuh, speeds up the task and makes data secure. The infor- M. Sintek, What are semantic annotations, Re- mation retrieval-based system on semantic annotation latório técnico. DERI Galway 9 (2006) 62. helps to manage the data according to the search in- [8] V. Batanović, D. Bojić, Using part-of-speech tags terest of the user. as deep-syntax indicators in determining short- text semantic similarity, Computer Science and Information Systems 12 (2015) 1–31. 7. Conclusion [9] K. Bontcheva, H. Cunningham, Semantic annota- tions and retrieval: Manual, semiautomatic, and The purpose of this paper is to distinguish integrated automatic generation, in: Handbook of semantic review methodology from other review methods and web technologies, 2011. to propose research questions for integrated review [10] W.-f. WANG, L. ZHAO, Research and application methodology to increase the rigor of the process. In of ontology on semantic web [j], Journal of Za- this paper, we have presented an extensive study of ozhuang University 2 (2007). important approaches used for semantic annotation of [11] M. 
Kogalovskii, Semantic annotating of text a text document and a wide variety of approaches to documents: Basic concepts and taxonomic ap- explore the prominent historical semantic annotation proach, Automatic Documentation and Mathe- models applicable for text document annotation. We matical Linguistics 52 (2018) 134–141. have also provided the various aspects of semantic an- [12] G. Carneiro, A. B. Chan, P. J. Moreno, N. Vascon- notations based on which annotation can be classified. celos, Supervised learning of semantic classes We have also highlighted the importance of semantic for image annotation and retrieval, IEEE trans- annotation in real-life practices. actions on pattern analysis and machine intelli- The comparison of semantic annotation tools has gence 29 (2007) 394–410. been done through a level of automation, degree of an- [13] S. Prasad, A. K. Lodhi, S. Jain, Helpi viz: A seman- notation, and type of annotation. As a future scope, we tic image annotation and visualization platform are also trying to implement the semantic annotation for visually impaired, in: International Confer- models which will map with the global corpus and can ence On Computational Vision and Bio Inspired be applied to any domain. Finally, this approach also Computing, Springer, 2018, pp. 881–888. has been compared relatively with other ones available [14] S. Jain, S. Prasad, A. K. Lodhi, Semantic annota- in the literature. tion of images with text and sound for visually impaired, Journal of Open Source Developments in bioinformatics 22 (2021) 146–163. 5 (2018) 20–27. [27] M. Tallis, Semantic word processing for con- [15] H. N. Talantikite, D. Aissani, N. Boudjlida, Se- tent authors, in: Proceedings of the Knowl- mantic annotations for web services discovery edge Markup & Semantic Annotation Workshop, and composition, Computer Standards & Inter- Florida, USA, 2003. faces 31 (2009) 1108–1117. [28] P. Andrews, I. Zaihrayeu, J. Pane, A classification [16] A. Névéol, R. I. Doğan, Z. 
Lu, Semi-automatic of semantic annotation systems, Semantic Web 3 semantic annotation of pubmed queries: a study (2012) 223–248. on quality, efficiency, satisfaction, Journal of [29] P. Ogren, Knowtator: a protégé plug-in for anno- biomedical informatics 44 (2011) 310–318. tated corpus construction, in: Proceedings of the [17] S. Balakrishna, M. Thirumaran, V. K. Solanki, Iot Human Language Technology Conference of the sensor data integration in healthcare using se- NAACL, Companion Volume: Demonstrations, mantics and machine learning approaches, in: 2006, pp. 273–275. A Handbook of Internet of Things in Biomedical [30] A. Kalyanpur, J. Hendler, B. Parsia, J. Golbeck, and Cyber Physical System, Springer, 2020, pp. SMORE-semantic markup, ontology, and RDF 275–300. editor, Technical Report, Maryland Univ College [18] S. Jabbar, F. Ullah, S. Khalid, M. Khan, K. Han, Se- Park Dept of Computer Science, 2006. mantic interoperability in heterogeneous iot in- [31] K. Petridis, D. Anastasopoulos, C. Saathoff, frastructure for healthcare, Wireless Communi- N. Timmermann, Y. Kompatsiaris, S. Staab, M- cations and Mobile Computing 2017 (2017). ontomat-annotizer: Image annotation linking [19] B. Popov, A. Kiryakov, A. Kirilov, D. Manov, ontologies and multimedia low-level features, in: D. Ognyanoff, M. Goranov, Kim–semantic anno- International Conference on Knowledge-Based tation platform, in: International Semantic Web and Intelligent Information and Engineering Sys- Conference, Springer, 2003, pp. 834–849. tems, Springer, 2006, pp. 633–640. [20] N. Kiyavitskaya, N. Zeni, L. Mich, J. R. Cordy, [32] M. Erdmann, A. Maedche, H.-P. Schnurr, S. Staab, J. Mylopoulos, Text mining through semi au- From manual to semi-automatic semantic anno- tomatic semantic annotation, in: International tation: About ontology-based text annotation Conference on Practical Aspects of Knowledge tools, in: Proceedings of the COLING-2000 Management, Springer, 2006, pp. 143–154. 
Workshop on Semantic Annotation and Intelli- [21] C. Lhioui, A. Zouaghi, M. Zrigui, A rule-based gent Content, 2000, pp. 79–85. semantic frame annotation of arabic speech turns [33] M. Vargas-Vera, E. Motta, J. Domingue, M. Lan- for automatic dialogue analysis, Procedia Com- zoni, A. Stutt, F. Ciravegna, Mnm: Ontology puter Science 117 (2017) 46–54. driven semi-automatic and automatic support for [22] J. Tang, D. Zhang, L. Yao, Y. Li, Automatic seman- semantic markup, in: International Conference tic annotation using machine learning, in: Ma- on Knowledge Engineering and Knowledge Man- chine Learning: Concepts, Methodologies, Tools agement, Springer, 2002, pp. 379–391. and Applications, IGI Global, 2012, pp. 535–578. [34] R. Schroeter, J. Hunter, D. Kosovic, Filmed- [23] H. Hassanzadeh, M. Keyvanpour, A machine collaborative video indexing, annotation, and learning based analytical framework for seman- discussion tools over broadband networks, in: tic annotation requirements, arXiv preprint 10th International Multimedia Modelling Con- arXiv:1104.4950 (2011). ference, 2004. Proceedings., IEEE, 2004, pp. 346– [24] S. Mesbah, K. Fragkeskos, C. Lofi, A. Bozzon, G.- 353. J. Houben, Semantic annotation of data process- [35] S. Jain, Understanding Semantics-Based Decision ing pipelines in scientific publications, in: Eu- Support, CRC Press, 2020. ropean semantic web conference, Springer, 2017, [36] S. Soderland, D. Fisher, J. Aseltine, W. Lehnert, pp. 321–336. Crystal: Inducing a conceptual dictionary, arXiv [25] V. Uren, P. Cimiano, J. Iria, S. Handschuh, preprint cmp-lg/9505020 (1995). M. Vargas-Vera, E. Motta, F. Ciravegna, Seman- [37] E. Riloff, et al., Automatically constructing a tic annotation for knowledge management: Re- dictionary for information extraction tasks, in: quirements and a survey of the state of the art, AAAI, volume 1, Citeseer, 1993, pp. 2–1. Journal of Web Semantics 4 (2006) 14–28. [38] M. E. Cali, Relational learning techniques for [26] M. 
Neves, J. Ševa, An extensive review of tools natural language information extraction, Report for manual annotation of documents, Briefings AI98276 (1998). [39] D. Freitag, Information extraction from html: [46] R. Bunescu, R. Mooney, Learning to extract Application of a general machine learning ap- relations from the web using minimal supervi- proach, in: AAAI/IAAI, 1998, pp. 517–523. sion, in: Proceedings of the 45th Annual Meeting [40] S. Soderland, Learning information extraction of the Association of Computational Linguistics, rules for semi-structured and free text, Machine 2007, pp. 576–583. learning 34 (1999) 233–272. [47] C. Choi, I. Park, S. J. Hyun, D. Lee, D. H. Sim, [41] I. Muslea, S. Minton, C. Knoblock, Stalker: Learn- Mire: A minimal rule engine for context-aware ing extraction rules for semistructured, web- mobile devices, in: 2008 Third International based information sources, in: Proceedings of Conference on Digital Information Management, AAAI-98 Workshop on AI and Information Inte- IEEE, 2008, pp. 172–177. gration, AAAI Press, 1998, pp. 74–81. [48] B. McBride, The resource description framework [42] D. Freitag, N. Kushmerick, Boosted wrapper in- (rdf) and its vocabulary description language duction, AAAI/IAAI 583 (2000). rdfs, in: Handbook on ontologies, Springer, 2004, [43] S. Jain, S. Sharma, J. M. Natterbrede, M. Hamada, pp. 51–65. Rule-based actionable intelligence for disaster [49] G. Antoniou, F. Van Harmelen, Web ontology situation management, International Journal of language: Owl, in: Handbook on ontologies, Knowledge and Systems Science (IJKSS) 11 (2020) Springer, 2004, pp. 67–92. 17–32. [50] S. Jain, C. Gupta, A. Bhardwaj, Research di- [44] B. Rosenfeld, R. 
Feldman, Using corpus statistics rections under the parasol of ontology based se- on entities to improve semi-supervised relation mantic web structure, in: International Confer- extraction from the web, in: Proceedings of the ence on Soft Computing and Pattern Recogni- 45th Annual Meeting of the Association of Com- tion, Springer, 2016, pp. 644–655. putational Linguistics, 2007, pp. 600–607. [51] N. Limbasiya, P. Agrawal, Semantic textual sim- [45] S. Brody, Clustering clauses for high-level re- ilarity and factorization machine model for re- lation detection: An information-theoretic ap- trieval of question-answering, in: International proach, in: Proceedings of the 45th Annual Meet- Conference on Advances in Computing and Data ing of the Association of Computational Linguis- Sciences, Springer, 2019, pp. 195–206. tics, 2007, pp. 448–455. Table 3 Ontology Support Semantic Annotation tools SA System Approach Description Application domain Automation AnnotEx Supervised Learning Annotating based on General Manual classifying documents by means of semantic similarities S-CREAM Supervised Learning Annotate Dynamic Domain Dependent Semi-Automatic web pages and track the activities using hyperlink NavEx Supervised learning Extends traditional Service Oriented Envi- Automatic performance-based ronments annotation Knowledge and Infor- Supervised Learning Keyword-based Inter-domain knowl- Manual mation Management edgebase (KIM) Armadillo Supervised learning Gene annotation sys- Gene (Biomedical) Automatic tem, Pattern Discovery CREAM Rule-Based/wrappers Framework for high General Automatic / Manual structure web page GoNTogle Unsupervised Annotation and search General Automatic facilities based on tex- tual similarity C-PANKOW Unsupervised Pattern based annota- General Automatic tion task BIMTag Unsupervised Semantic annotation of BIM product Automatic online BIM product re- sources OEAKM Unsupervised Built ontology enabled General Semi-automatic annotation KMS that 
provides clustering and real-time discussion for collaborative learning Melita Supervised Follow to two phase General Manual / Automatic cycle (Turning and scheduling text) based on the training and active learning OntoMat-Annotizer Unsupervised Web based annotation Image, Multimedia Automatic tool that is able to Manual create owl instance, attribute and relation- ship AeroDAML Rule based Scalable with diverse Webpage Semi-Automatically ontologies PARMENIDES Unsupervised create a domain ontol- General Automatic ogy using cluster MnM Unsupervised Learns extraction rules webpage Semi-Automatic from training corpus SemTag Rule based Performs structural General Automatic analysis
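Section 4.1.1 describes the supervised labeling step: map an entity pair, decide whether a binary relation holds, and label the pair favorable or unfavorable. The sketch below illustrates that step under a few assumptions. To stay dependency-free, a plain perceptron stands in for the SVM named in the text. Both learn a linear decision boundary, but the perceptron is not a max-margin method. The training pairs are hypothetical, and so are the bag-of-words features over the words between the two entities.

```python
from collections import defaultdict
from typing import Dict, List, Tuple

# (entity1, words between the entities, entity2, label); +1 means the relation holds.
Example = Tuple[str, str, str, int]


def features(context: str) -> Dict[str, float]:
    """Bag-of-words over the context between the two entities, plus a bias term."""
    feats: Dict[str, float] = defaultdict(float)
    for word in context.lower().split():
        feats[word] += 1.0
    feats["__bias__"] = 1.0
    return feats


def score(weights: Dict[str, float], feats: Dict[str, float]) -> float:
    return sum(weights.get(f, 0.0) * v for f, v in feats.items())


def train(examples: List[Example], epochs: int = 10) -> Dict[str, float]:
    """Perceptron (a stand-in for the SVM): update weights only on mistakes."""
    weights: Dict[str, float] = defaultdict(float)
    for _ in range(epochs):
        for _e1, context, _e2, label in examples:
            feats = features(context)
            if label * score(weights, feats) <= 0:
                for f, v in feats.items():
                    weights[f] += label * v
    return weights


def label_pair(weights: Dict[str, float], context: str) -> str:
    return "favorable" if score(weights, features(context)) > 0 else "unfavorable"


# Hypothetical training data: employment contexts hold, others do not.
TRAIN: List[Example] = [
    ("Alice", "works for", "Acme", 1),
    ("Bob", "is employed by", "Initech", 1),
    ("Carol", "visited", "Paris", -1),
    ("Dave", "met", "Eve", -1),
]
```

After `w = train(TRAIN)`, the call `label_pair(w, "works for")` returns `"favorable"` for an unseen pair such as (Frank, Globex). A context like `"visited"` is labeled `"unfavorable"`.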
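The unsupervised idea in Section 4.1.2 groups entity pairs whose contexts look alike, so that each cluster can stand for one relation type, and it needs no labels. The sketch below is a deliberately simple single-link clustering over the Jaccard similarity of context words, implemented with union-find. It is not the information-theoretic clause clustering of [45]. The entity pairs, their context words, and the 0.3 threshold are hypothetical.

```python
from itertools import combinations
from typing import Dict, FrozenSet, List, Tuple

Pair = Tuple[str, str]


def jaccard(a: FrozenSet[str], b: FrozenSet[str]) -> float:
    union = a | b
    return len(a & b) / len(union) if union else 0.0


def cluster_pairs(contexts: Dict[Pair, FrozenSet[str]], threshold: float) -> List[List[Pair]]:
    """Single-link clustering: merge two pairs when context similarity >= threshold."""
    parent = {p: p for p in contexts}

    def find(p: Pair) -> Pair:
        while parent[p] != p:
            parent[p] = parent[parent[p]]  # path halving keeps the trees shallow
            p = parent[p]
        return p

    for p, q in combinations(contexts, 2):
        if jaccard(contexts[p], contexts[q]) >= threshold:
            parent[find(p)] = find(q)

    groups: Dict[Pair, List[Pair]] = {}
    for p in contexts:
        groups.setdefault(find(p), []).append(p)
    return sorted(sorted(g) for g in groups.values())


# Hypothetical entity pairs with the words observed between them.
CONTEXTS: Dict[Pair, FrozenSet[str]] = {
    ("Alice", "Acme"): frozenset({"works", "for"}),
    ("Bob", "Initech"): frozenset({"works", "at"}),
    ("Paris", "France"): frozenset({"capital", "of"}),
    ("Rome", "Italy"): frozenset({"capital", "city", "of"}),
}
```

With a threshold of 0.3 this yields two clusters, an employment-like group and a capital-of-like group. Raising the threshold to 0.9 leaves every pair on its own. This sensitivity to the threshold is one source of the "unnecessary clusters" limitation listed in that section.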
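Section 4.1.3 names CBOW and Skip-gram as the standard way to learn word embeddings. Both models are trained on (context, target) examples drawn from a sliding window over the text. The sketch below only builds those training examples; it does not learn any vectors. The window size and the sample tokens are illustrative. In practice a library would do the training, for example gensim's `Word2Vec`, where `sg=1` selects Skip-gram and `sg=0` selects CBOW.

```python
from typing import List, Tuple

# Illustrative only: builds training examples, does not learn embeddings.


def skipgram_pairs(tokens: List[str], window: int) -> List[Tuple[str, str]]:
    """Skip-gram: each centre word predicts every word within `window` positions."""
    pairs = []
    for i, center in enumerate(tokens):
        lo, hi = max(0, i - window), min(len(tokens), i + window + 1)
        for j in range(lo, hi):
            if j != i:
                pairs.append((center, tokens[j]))
    return pairs


def cbow_examples(tokens: List[str], window: int) -> List[Tuple[List[str], str]]:
    """CBOW: the surrounding words jointly predict the centre word."""
    examples = []
    for i, center in enumerate(tokens):
        lo, hi = max(0, i - window), min(len(tokens), i + window + 1)
        context = tokens[lo:i] + tokens[i + 1:hi]
        examples.append((context, center))
    return examples
```

For the tokens `["semantic", "annotation", "links", "text"]` with `window=1`, Skip-gram produces one pair per neighbour, and CBOW turns each word into a single (context list, target) example.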
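The IF-THEN-ELSE style of rule-based annotation in Section 4.2 can be illustrated with a minimal sketch. Each rule reads: IF a span matches this pattern, THEN annotate it with this concept; ELSE the span is left to the other rules. The rule set, the regular-expression patterns, and the concept names (`ex:Person`, `ex:Organization`, `ex:Date`) are hypothetical. The extraction frameworks cited in Section 4 use much richer rule languages.

```python
import re
from typing import List, NamedTuple, Tuple


class Annotation(NamedTuple):
    text: str   # surface string that matched
    start: int  # character offset where the match begins
    end: int    # character offset just past the match
    label: str  # concept assigned by the rule (hypothetical identifiers)


# Hypothetical rule base: IF the pattern matches THEN assign the label.
# Rules are tried in order, so earlier rules win when spans overlap.
RULES: List[Tuple[str, str]] = [
    (r"\b(?:Dr|Prof)\.\s+[A-Z][a-z]+\b", "ex:Person"),
    (r"\b[A-Z][A-Za-z]+\s+(?:Institute|University)\b", "ex:Organization"),
    (r"\b\d{4}-\d{2}-\d{2}\b", "ex:Date"),
]


def annotate(text: str) -> List[Annotation]:
    """Apply the rules in order and keep only non-overlapping matches."""
    taken = [False] * len(text)  # characters already claimed by an earlier rule
    result: List[Annotation] = []
    for pattern, label in RULES:
        for m in re.finditer(pattern, text):
            if any(taken[m.start():m.end()]):
                continue  # ELSE: span already annotated, skip this match
            for i in range(m.start(), m.end()):
                taken[i] = True
            result.append(Annotation(m.group(), m.start(), m.end(), label))
    return sorted(result, key=lambda a: a.start)
```

For example, `annotate("Prof. Jain joined the National Institute on 2020-01-15.")` yields three annotations: a Person, an Organization, and a Date. The sketch also shows two limitations listed in Section 4.2. The approach only recognizes regular patterns, and every new pattern needs another hand-written rule.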
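Section 4.3 notes that ontological modeling organizes knowledge hierarchically and that reasoners exploit RDF(S)/OWL semantics. The sketch below imitates one small piece of that reasoning, `rdfs:subClassOf` inference, over plain Python triples. With it, a query for `ex:Organization` also returns text mentions annotated with instances of its subclasses. The `ex:` names and the annotations are hypothetical. A real system would keep the triples in an RDF store and query them with SPARQL, for example with a property path such as `rdf:type/rdfs:subClassOf*`.

```python
from typing import List, Set, Tuple

Triple = Tuple[str, str, str]

# Hypothetical ontology fragment in subject-predicate-object form.
ONTOLOGY: Set[Triple] = {
    ("ex:Company", "rdfs:subClassOf", "ex:Organization"),
    ("ex:University", "rdfs:subClassOf", "ex:Organization"),
    ("ex:Acme", "rdf:type", "ex:Company"),
    ("ex:NITKurukshetra", "rdf:type", "ex:University"),
    ("ex:Paris", "rdf:type", "ex:City"),
}

# Hypothetical annotations linking surface strings to ontology individuals.
ANNOTATIONS: List[Tuple[str, str]] = [
    ("Acme", "ex:Acme"),
    ("NIT Kurukshetra", "ex:NITKurukshetra"),
    ("Paris", "ex:Paris"),
]


def superclasses(cls: str, triples: Set[Triple]) -> Set[str]:
    """The class itself plus every class reachable via rdfs:subClassOf."""
    seen, stack = {cls}, [cls]
    while stack:
        current = stack.pop()
        for s, p, o in triples:
            if s == current and p == "rdfs:subClassOf" and o not in seen:
                seen.add(o)
                stack.append(o)
    return seen


def mentions_of(cls: str, triples: Set[Triple], annotations: List[Tuple[str, str]]) -> List[str]:
    """Surface strings whose individual is typed with `cls` or one of its subclasses."""
    hits = []
    for surface, individual in annotations:
        types = {o for s, p, o in triples if s == individual and p == "rdf:type"}
        if any(cls in superclasses(t, triples) for t in types):
            hits.append(surface)
    return hits
```

Here `mentions_of("ex:Organization", ONTOLOGY, ANNOTATIONS)` returns both the company and the university mention, although neither is typed directly as `ex:Organization`.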
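Section 6.1.1 contrasts semantic search with plain text matching. The minimal sketch below assumes that documents can be annotated with concept identifiers through a small lexicon, and maps the query to the same identifiers. A query can then match a document that shares no words with it. The lexicon, the concept IDs (`C1`, `C2`), and the documents are hypothetical.

```python
from typing import Dict, List, Set

# Hypothetical lexicon: different surface forms map to one concept identifier.
LEXICON: Dict[str, str] = {
    "heart attack": "C1",
    "myocardial infarction": "C1",
    "high blood pressure": "C2",
    "hypertension": "C2",
}


def concepts(text: str) -> Set[str]:
    """Annotate text with every concept whose surface form it contains."""
    lowered = text.lower()
    return {cid for form, cid in LEXICON.items() if form in lowered}


def keyword_search(query: str, docs: Dict[str, str]) -> List[str]:
    """Baseline: plain substring matching on the raw text."""
    return [d for d, body in docs.items() if query.lower() in body.lower()]


def semantic_search(query: str, docs: Dict[str, str]) -> List[str]:
    """Rank documents by the number of query concepts their annotations share."""
    wanted = concepts(query)
    scored = [(len(wanted & concepts(body)), d) for d, body in docs.items()]
    return [d for s, d in sorted(scored, key=lambda x: (-x[0], x[1])) if s > 0]


# Hypothetical document collection.
DOCS: Dict[str, str] = {
    "d1": "Patient admitted after a myocardial infarction.",
    "d2": "Guidelines for treating hypertension.",
    "d3": "Annual report of the hospital library.",
}
```

A keyword search for "heart attack" finds nothing in `DOCS`. The concept-based search returns `d1`, because both the query and the document are annotated with `C1`.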
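Section 6.1.4 points out that a single abbreviation can stand for several concepts. One simple way an annotator could choose between the candidate expansions is a Lesk-style overlap measure: pick the expansion whose descriptive keywords overlap most with the surrounding words. This is an assumption for illustration, not the paper's method, and the sense inventory below is hypothetical.

```python
from typing import Dict, Set

# Hypothetical sense inventory: abbreviation -> expansion -> descriptive keywords.
SENSES: Dict[str, Dict[str, Set[str]]] = {
    "SA": {
        "semantic annotation": {"ontology", "metadata", "annotation", "tag", "web"},
        "South Africa": {"country", "johannesburg", "cape", "travel", "rand"},
    },
}


def expand(abbrev: str, context: str) -> str:
    """Return the expansion whose keyword set overlaps most with the context words."""
    words = set(context.lower().split())
    candidates = SENSES[abbrev]
    # Ties go to the first expansion listed in the inventory.
    return max(candidates, key=lambda sense: len(candidates[sense] & words))
```

In "SA attaches ontology metadata to web pages" the abbreviation resolves to "semantic annotation". In "We plan to travel to Cape Town in SA" it resolves to "South Africa".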