=Paper=
{{Paper
|id=Vol-1690/paper96
|storemode=property
|title=Harnessing Crowds and Experts for Semantic Annotation of the Qur'an
|pdfUrl=https://ceur-ws.org/Vol-1690/paper96.pdf
|volume=Vol-1690
|authors=Amna Basharat,Khaled Rasheed,I. Budak Arpinar
|dblpUrl=https://dblp.org/rec/conf/semweb/BasharatRA16
}}
==Harnessing Crowds and Experts for Semantic Annotation of the Qur'an==
Harnessing Crowds and Experts for Semantic Annotation of the Qur'an

Amna Basharat, Khaled Rasheed, I. Budak Arpinar
Department of Computer Science, University of Georgia, Athens, GA 30602, USA
amnabash@uga.edu, khaled@uga.edu, budak@uga.edu

Abstract. In this paper we illustrate how we harness the power of crowds and specialized experts through automated knowledge acquisition workflows for semantic annotation in specialized and knowledge-intensive domains. We undertake the special case of the Arabic script of the Qur'an, a widely studied manuscript, and apply a hybrid methodology of traditional 'crowdsourcing' augmented with 'expertsourcing' to semantically annotate its verses. We demonstrate that our proposed hybrid method is a promising approach for achieving reliable annotations in an efficient and scalable manner, especially where a high level of accuracy is required in knowledge-intensive and sensitive domains.

Keywords: semantic annotation, disambiguation, classification, ontology, Qur'an

1 Introduction

Thematic annotation of religious texts, in particular the classical sources of knowledge in the Islamic domain in the Arabic language, has not received much attention, partly owing to the time and knowledge constraints of the experts required for such an annotation process. In our research, we apply specialized human computation methods such as nichesourcing ([1], [2]) in an attempt to scale this annotation process. Nichesourcing, or expertsourcing, extends the idea of engaging skilled and knowledgeable persons in place of faceless crowds for human-driven tasks. We employ nichesourcing as a means of augmenting traditional crowdsourcing methods rather than as an alternative.

In this paper, we present the results of an exploratory study focusing on two knowledge-intensive tasks: the thematic disambiguation and annotation of Qur'anic verses using the Arabic script. While we tackled this through a pure crowdsourcing approach in our earlier work [3], we determined that several tasks are knowledge intensive and require domain expertise. Not only is knowledge of Qur'anic Arabic imperative here, but annotating the Arabic verses also requires understanding the context and content of the given verse.

2 Hybrid Architecture for Harnessing Crowd and Expert Annotations

We design and develop a hybrid workflow architecture that connects a crowdsourcing framework with an expertsourcing application, as shown in Figure 1.

Fig. 1. Hybrid Architecture for Harnessing Crowd and Expert Annotations

Crowdsourcing Stage: We design a task management engine that is responsible for generating tasks and for retrieving and aggregating results. The tasks are published on the Amazon Mechanical Turk (AMT, http://www.mturk.com) platform. A complete workflow management system is implemented (a derivative of a workflow model for Linked Data management presented in [4]), which includes means for generating dynamic tasks from a range of task profiles. The semantic annotation process is driven by an ontology schema. The task input is generated by retrieving relevant candidate verses from available external data sources such as the SemanticQuran [5] dataset. Both steps are sketched below.
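As a concrete illustration of the task-input step, the following is a minimal sketch of retrieving candidate verses with the SPARQLWrapper library. The endpoint URL, prefix, and property names (sq:Verse, sq:verseText) are placeholders rather than the actual SemanticQuran vocabulary; consult the dataset documentation [5] for the real schema.

```python
from SPARQLWrapper import SPARQLWrapper, JSON

# Hypothetical endpoint; the SemanticQuran dataset [5] defines
# its own namespaces and access points.
ENDPOINT = "http://example.org/semanticquran/sparql"

def fetch_candidate_verses(theme_keyword, limit=50):
    """Retrieve verses whose Arabic text contains a candidate
    surface form of the given theme."""
    sparql = SPARQLWrapper(ENDPOINT)
    sparql.setReturnFormat(JSON)
    sparql.setQuery(f"""
        PREFIX sq: <http://example.org/semanticquran/>
        SELECT ?verse ?text WHERE {{
            ?verse a sq:Verse ;
                   sq:verseText ?text .
            FILTER(CONTAINS(?text, "{theme_keyword}"))
        }}
        LIMIT {limit}
    """)
    results = sparql.query().convert()
    return [(b["verse"]["value"], b["text"]["value"])
            for b in results["results"]["bindings"]]
```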
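The publishing step can then be realized with the standard boto3 MTurk client. This is a minimal sketch under assumed parameters rather than the authors' actual task management engine: the reward, durations, and HTML form are illustrative, while MaxAssignments=5 mirrors the five workers per task used in the experiments (Section 3).

```python
import boto3

# Sandbox endpoint for testing; omit endpoint_url for production HITs.
mturk = boto3.client(
    "mturk",
    region_name="us-east-1",
    endpoint_url="https://mturk-requester-sandbox.us-east-1.amazonaws.com",
)

def publish_disambiguation_task(theme, html_form):
    """Publish one thematic disambiguation question as an AMT HIT."""
    question_xml = f"""
    <HTMLQuestion xmlns="http://mechanicalturk.amazonaws.com/AWSMechanicalTurkDataSchemas/2011-11-11/HTMLQuestion.xsd">
      <HTMLContent><![CDATA[{html_form}]]></HTMLContent>
      <FrameHeight>450</FrameHeight>
    </HTMLQuestion>"""
    return mturk.create_hit(
        Title=f"Does this Qur'anic verse mention the theme '{theme}'?",
        Description="Judge whether a highlighted assertion is a true "
                    "occurrence of the given theme.",
        Keywords="annotation, Arabic, semantic",
        Reward="0.05",               # illustrative value
        MaxAssignments=5,            # five workers per task, as in the study
        LifetimeInSeconds=86400,
        AssignmentDurationInSeconds=600,
        Question=question_xml,
    )
```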
The AMT crowd performs the thematic disambiguation and thematic annotation tasks. Both tasks are based on the Arabic script of the Qur'an. For the disambiguation task, the crowd is presented with a question that includes a verse along with a highlighted candidate explicit assertion of the given theme; the crowd declares this assertion positive or negative by determining whether it is a true occurrence of the theme. The annotation tasks require deeper knowledge and understanding of the Arabic text. The crowd determines whether the given verse contains any implicit reference to the given theme. If their response is positive, they are also required to provide the portion of the verse (a meaningful phrase or a word) that implies the presence of the theme. As a quality measure, the crowd is also required to provide a confidence level (ranging from Very High to Very Low) to indicate their confidence in their response.

Decision Analytics: We collect and aggregate the responses based on statistical measures of aggregation. Weighted confidence measures and thresholds are applied. Based on this aggregation, the completed tasks are marked as either Approved or Reviewable; a high confidence and aggregation threshold is applied for the approved tasks. This analysis identifies the candidate tasks for expertsourcing: the tasks marked as reviewable, which fail to meet the agreement thresholds, are sent off for expert annotation. A sketch of this routing logic is given at the end of this section.

Expertsourcing Stage: For this purpose we designed a custom web application to engage with experts. A REST API connects the crowdsourcing task management engine with the expertsourcing application; the handoff is sketched below. The tasks are sent to the remote application and experts are notified when the tasks become available. The experts also see the candidate responses collected during the crowdsourcing stage, and can either choose from these available annotations or provide their own if they agree with none of them. An example is shown in Fig. 2. We present the same task to three experts to analyze annotation agreement. The approved and validated annotations are passed on for ontology population (also sketched below) and linked with existing data sources.

Fig. 2. Task Design for Thematic Annotation of Qur'anic verses
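The decision analytics stage can be summarized by a short routing function. The following is a sketch under assumed values: the confidence-to-weight mapping and the agreement threshold are illustrative, since the paper does not state the exact figures it used.

```python
from collections import defaultdict

# Illustrative mapping of self-reported confidence to weights;
# the exact weighting scheme is not published in the paper.
CONF_WEIGHT = {"Very High": 1.0, "High": 0.8, "Medium": 0.6,
               "Low": 0.4, "Very Low": 0.2}

def route_task(responses, threshold=0.8):
    """responses: list of (answer, confidence) pairs from crowd workers.
    Returns ('Approved', answer) when the weighted agreement on one
    answer meets the threshold, otherwise ('Reviewable', None) so the
    task is forwarded to the expertsourcing stage."""
    scores = defaultdict(float)
    for answer, confidence in responses:
        scores[answer] += CONF_WEIGHT[confidence]
    best = max(scores, key=scores.get)
    if scores[best] / sum(scores.values()) >= threshold:
        return "Approved", best
    return "Reviewable", None
```

For example, five responses of ("yes", "Very High"), ("yes", "High"), ("yes", "High"), ("yes", "Medium"), ("no", "Low") give "yes" a weighted share of 3.2/3.6 ≈ 0.89, so the task is Approved; a 3-2 split at mixed confidence would typically fall below the threshold and be routed to experts.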
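The REST handoff between the two systems could look like the following sketch using the requests library. The endpoint URL and payload fields are hypothetical, as the paper does not document the API itself.

```python
import requests

# Hypothetical endpoint of the custom expertsourcing web application.
EXPERT_API = "https://expert-app.example.org/api/tasks"

def forward_to_experts(task_id, verse, theme, crowd_annotations):
    """Send a reviewable task, together with the candidate annotations
    collected from the crowd, to the expertsourcing application."""
    payload = {
        "task_id": task_id,
        "verse": verse,
        "theme": theme,
        "candidates": crowd_annotations,  # experts may pick one or add their own
        "experts_required": 3,            # three experts per task (Section 3)
    }
    response = requests.post(EXPERT_API, json=payload, timeout=10)
    response.raise_for_status()
    return response.json()
```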
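Finally, approved and validated annotations are materialized as RDF for ontology population. A minimal rdflib sketch is given below; the namespace, property names, and URIs are placeholders, since the paper does not publish its ontology schema.

```python
from rdflib import Graph, Literal, Namespace, URIRef
from rdflib.namespace import RDF

# Placeholder namespace; the actual annotation ontology is not
# specified in the paper.
QA = Namespace("http://example.org/quran-annotations#")

def add_annotation(graph, verse_uri, theme_uri, phrase):
    """Record one approved thematic annotation, linking the verse
    back to the external dataset it was drawn from."""
    annotation = URIRef(verse_uri + "#annotation")
    graph.add((annotation, RDF.type, QA.ThematicAnnotation))
    graph.add((annotation, QA.annotatesVerse, URIRef(verse_uri)))
    graph.add((annotation, QA.hasTheme, URIRef(theme_uri)))
    graph.add((annotation, QA.evidencePhrase, Literal(phrase, lang="ar")))
    return graph

g = add_annotation(Graph(), "http://example.org/quran/verse/2-255",
                   "http://example.org/themes/mercy", "placeholder phrase")
g.serialize(destination="annotations.ttl", format="turtle")
```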
3 Results and Discussion

The experimental setup assigned each task to five crowd workers. For the reviewable crowd tasks that were sent to experts, three experts were assigned to each task. Table 1 shows the results obtained.

             Crowd Tasks                  Expert Tasks
Task         Disambiguation  Annotation   Disambiguation  Annotation
Approved     1267            477          34              96
Reviewable   40              107          6               11
Total        1307            584          40              107

Table 1. Results for Disambiguation and Annotation Tasks

The results of our exploratory study provide interesting insights into the application of human computation methods to knowledge-intensive tasks. Our task design involved the thematic disambiguation and annotation of Qur'anic verses based on the original Arabic script. For the disambiguation task, 99% of the tasks reached an agreement by combining the contributions of crowds and experts, and only about 3% of the tasks needed expert contributions. For the annotation tasks, about 18% of the tasks needed expert contributions, and about 10% of the tasks sent to experts did not reach an agreement even with combined crowd and expert contributions.

An administrative review of these cases indicates that some annotations are a matter of personal taste and judgment, and close agreement is therefore difficult. Most of these annotations cannot be classified as wrong, nor can any be ranked above the others by an automated agreement mechanism.

Our knowledge acquisition and review workflow, which selectively elicits expert annotations only where needed, presents a promising method. We utilize annotation agreement and distance analytics to route the appropriate tasks to expert contributors. Our results suggest that such a hybrid approach makes for a more accurate and reliable annotation process. This method can be effectively utilized for qualitative dataset management and semantic annotation tasks in an economical and feasible manner through crowd engagement, while reducing the need for expert contributions.

References

1. De Boer, V., Hildebrand, M., Aroyo, L., De Leenheer, P., Dijkshoorn, C., Tesfa, B., Schreiber, G.: Nichesourcing: Harnessing the power of crowds of experts. In: Knowledge Engineering and Knowledge Management, Springer (2012) 16-20
2. Oosterman, J., Bozzon, A., Houben, G.J., et al.: Crowd vs. experts: Nichesourcing for knowledge intensive tasks in cultural heritage. In: Proceedings of the 23rd International Conference on World Wide Web (Companion Volume), International World Wide Web Conferences Steering Committee (2014) 567-568
3. Basharat, A., Arpinar, I.B., Rasheed, K.: Leveraging crowdsourcing for the thematic annotation of the Qur'an. In: Proceedings of the 25th International Conference Companion on World Wide Web, International World Wide Web Conferences Steering Committee (2016) 13-14
4. Basharat, A., Arpinar, I.B., Dastgheib, S., Kursuncu, U., Kochut, K., Dogdu, E.: Semantically enriched task and workflow automation in crowdsourcing for linked data management. International Journal of Semantic Computing 8(4) (2014) 415-439
5. Sherif, M.A., Ngonga Ngomo, A.C.: Semantic Quran - a multilingual resource for natural language processing. Semantic Web 6(4) (2015) 339-345