=Paper=
{{Paper
|id=Vol-2535/paper_2
|storemode=property
|title=Fascinating with Open Data: openArtBrowser
|pdfUrl=https://ceur-ws.org/Vol-2535/paper_2.pdf
|volume=Vol-2535
|authors=Bernhard Humm
|dblpUrl=https://dblp.org/rec/conf/qurator/Humm20
}}
==Fascinating with Open Data: openArtBrowser==
Bernhard G. Humm [0000-0001-7805-1981]

Hochschule Darmstadt – University of Applied Sciences, Haardtring 100, 64295 Darmstadt, Germany, bernhard.humm@h-da.de

Copyright © 2020 for this paper by its author. Use permitted under Creative Commons License Attribution 4.0 International (CC BY 4.0).

Abstract. This in-use paper presents openArtBrowser, a Web application for visual art education that aims to fascinate users with paintings, drawings and sculptures. OpenArtBrowser is based solely on linked open data, and its code is open source. It fosters serendipity by helping users discover new aspects of art out of curiosity, without actively searching. The user interaction concept and the software architecture are explained and discussed.

Keywords: Linked open data, semantic search, serendipity.

===1 Introduction===
In this in-use paper, we present openArtBrowser<sup>1</sup>, a Web application for visual art education that aims to fascinate users with paintings, drawings and sculptures. OpenArtBrowser is based solely on linked open data, namely from Wikidata<sup>2</sup> and Wikimedia Commons<sup>3</sup>, and its code is open source<sup>4</sup>.

This work has partly been inspired by our earlier work in digital heritage and GLAM (galleries, libraries, archives and museums) [1-6], which resulted, among other things, in the digital collection of Städel Museum Frankfurt<sup>5</sup>, one of the most prominent art museums in Germany.

OpenArtBrowser serves two use cases:

1. Active search: The user wants to retrieve specific information about visual art, e.g., details about a painting.

2. Browsing: The user has no need for specific information but wants to be inspired by art, e.g., by looking at artworks that fascinate him or her and learning interesting aspects.

<sup>1</sup> openartbrowser.org

<sup>2</sup> www.wikidata.org

<sup>3</sup> commons.wikimedia.org

<sup>4</sup> github.com/hochschule-darmstadt/openartbrowser

<sup>5</sup> sammlung.staedelmuseum.de

With openArtBrowser, we pursue the following goals:

1.
Learning with fun: The user shall learn new aspects of visual art in a joyful and playful manner.

2. Open Data: All data about visual art shall be from open sources.

3. Serendipity: The user shall discover new aspects of art out of curiosity, without actively searching for them. Regular users, too, shall be surprised by unexpected findings again and again.

4. Usability: The application shall be easy to use, i.e., simple, consistent and self-explanatory; it shall focus on relevant information, have low response times, etc.

5. Aesthetics: In keeping with the domain of visual art, the application shall have an aesthetic appearance.

6. Responsive Design: The application shall be usable on various devices, including desktop computers, tablet computers, and smartphones.

===2 User Interaction Concept===
In this section, we introduce the user interaction concept of openArtBrowser by presenting some of its pages.

====2.1 Homepage====
The homepage of openArtBrowser provides two prominent elements:

1. A search bar allows users to search actively for visual art. Animated suggestions like “Try Mona Lisa” invite users to start searching.

2. Seven tiles with search facets show suggestions for concrete artworks, artists, movements, locations, materials, genres, and motifs. The suggestions are animated and change from time to time.

The homepage changes its appearance each time the page is visited or reloaded. See Fig. 1.

Fig. 1. Different appearances of the openArtBrowser homepage

====2.2 Facets====
The following search facets are provided:

1. Artwork: Concrete paintings, drawings, or sculptures, e.g., “Prayer before the meal” by Quirijn van Brekelenkam

2. Artist: Painter or sculptor, e.g., Leonardo da Vinci

3. Movement: Artistic movement, e.g., land art

4. Location: Places where artworks are exhibited, e.g., Luxembourg Museum

5. Material: Materials that were used in artworks, e.g., plywood

6. Genre: Artistic genres, e.g., Cycladic art

7. Motif: Things or aspects depicted in an artwork, e.g., suicide.
Each facet has a unique icon that is used consistently throughout the application. See Fig. 2.

Fig. 2. Search facets on the homepage

For each facet, there are pages with interesting details. For example, artist pages display the birth and death dates of artists, their citizenship, artistic movements, etc. The lower part of each page shows artworks that match the facet. For example, the page for the motif “mother” shows depictions of mothers. Sliders give access to further artworks that do not fit on one page. An animation moves the sliders from time to time. See Fig. 3.

Fig. 3. Examples of artist page (Leonardo da Vinci) and motif page (mother)

====2.3 Artworks====
The focus of openArtBrowser is on artworks. When an artwork is selected on a facet page, the respective artwork page is opened. See Fig. 4 for an example, which depicts the upper part of the artwork page. The lower part (related artworks, Fig. 5) is described below.

The upper part of the artwork page contains a depiction of the artwork itself and metadata. When the image is expanded to full-screen mode, a high-resolution image is loaded which can be zoomed to inspect details. The metadata contains details about the artist, the location, and the inception of the artwork. To keep the focus on the artwork itself, further metadata is hidden when an artwork page is opened but can be displayed with one click, as shown in Fig. 4.

Fig. 4. Example of an artwork page (Virgin of the Rocks by Leonardo da Vinci), upper part

====2.4 Related Artworks====
At the bottom of an artwork page, related artworks are depicted. Artworks are considered related if they share at least one motif, artist, location, genre, movement or material. See Fig. 5.

Separate tabs display the related artworks for one specific facet. The tab “All”, which is active when an artwork page is opened, combines all facets. When hovering over one of the related artworks (in Fig.
5, the artwork “Madonna and Sleeping Child with Three Angels”), the tags shared by both artworks are highlighted. For example, the artworks “Virgin of the Rocks” and “Madonna and Sleeping Child with Three Angels” are both oil paintings on canvas, belong to the genre of religious art, depict the Virgin Mary, the child Jesus and angels, and are both exhibited in Room 710 of the Louvre Museum in Paris. This feature thus allows different artworks to be compared.

Fig. 5. Related artworks (artwork page, lower part)

====2.5 Semantic Search====
The search bar can be used on all pages of the Web application. While the user is typing search terms, search suggestions are displayed, similar to a Google search. Unlike the Google suggestions, the openArtBrowser suggestions are disambiguated semantically and assigned to the search facets. See Fig. 6.

Fig. 6. Search bar with semantic autosuggest when typing “vi…”

In the example of Fig. 6, the user is typing the letters “vi…”. Various artworks, artists, materials, genres and motifs that contain the letters “vi” (case-insensitive) are displayed, grouped by facet. The matching letters “vi” are highlighted (in green). A heuristic ranking selects a limited number of suggestions (here 10) from a potentially very large number of matches. Ranking criteria include:

1. Ranking of artworks and facets: The more information is available for an artwork, the higher its rank; the more artworks exist for a facet, the higher its rank.

2. Position of match: Matches at the beginning of the first word (e.g., “Virgin Mary”) are ranked higher than matches at the beginning of another word (e.g., “architectural view”), which in turn are ranked higher than matches within words (e.g., “David”).

3. Diversity: Suggestions from many different facets are favored over suggestions from only one or very few facets, even if their ranking according to criteria (1) and (2) is lower.
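The three ranking criteria above can be illustrated with a minimal Python sketch. It is not the openArtBrowser implementation: the `Suggestion` type, the precomputed `rank` field, the functions `match_position` and `suggest`, and in particular the round-robin strategy used for the diversity criterion are assumptions made for illustration only.

```python
from dataclasses import dataclass
from typing import Dict, List, Optional


@dataclass(frozen=True)
class Suggestion:
    label: str   # display label, e.g. "Virgin Mary"
    facet: str   # e.g. "motif", "artist", "artwork"
    rank: float  # hypothetical precomputed rank in [0, 1] (criterion 1)


def match_position(label: str, query: str) -> Optional[int]:
    """Return 0, 1 or 2 for a match (lower is better), or None for no match."""
    text, q = label.lower(), query.strip().lower()
    if not q or q not in text:
        return None
    if text.startswith(q):
        return 0  # match at the beginning of the first word
    if f" {q}" in text:
        return 1  # match at the beginning of another word
    return 2      # match somewhere inside a word


def suggest(candidates: List[Suggestion], query: str, limit: int = 10) -> List[Suggestion]:
    # Criteria 1 and 2: order matches by position class, then by descending rank.
    matches = []
    for cand in candidates:
        pos = match_position(cand.label, query)
        if pos is not None:
            matches.append((pos, -cand.rank, cand.label, cand))
    matches.sort(key=lambda m: m[:3])

    # Criterion 3 (assumed strategy): take the best match of every facet
    # before taking the second-best match of any facet, and so on.
    by_facet: Dict[str, List[Suggestion]] = {}
    for *_, cand in matches:
        by_facet.setdefault(cand.facet, []).append(cand)

    result: List[Suggestion] = []
    depth = 0
    while len(result) < limit and any(depth < len(q) for q in by_facet.values()):
        for queue in by_facet.values():
            if depth < len(queue) and len(result) < limit:
                result.append(queue[depth])
        depth += 1
    return result


if __name__ == "__main__":
    demo = [
        Suggestion("Virgin Mary", "motif", 0.90),
        Suggestion("Virgin of the Rocks", "artwork", 0.80),
        Suggestion("Vincent van Gogh", "artist", 0.95),
        Suggestion("architectural view", "genre", 0.50),
        Suggestion("David", "artwork", 0.70),
    ]
    for s in suggest(demo, "vi", limit=4):
        print(f"{s.facet}: {s.label}")
```

With the made-up demo data and a limit of four, the genre “architectural view” is preferred over the second artwork “David”, because a facet that is not yet represented takes precedence over a second entry from a facet that is.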
The concepts of the semantic autosuggest feature are based on [7].

Whenever a user selects an autosuggested term, the respective page is opened immediately. However, the user may continue to refine the search by adding more search criteria. See Fig. 7.

Fig. 7. Multiple search conditions

In the example of Fig. 7, the user has specified the search criteria “Rouen Cathedral” (motif) and “Claude Monet” (artist). If more than one search condition is entered, a search result page is opened that shows the artworks which satisfy all search conditions (AND-connected search); in the example, all artworks by Claude Monet that depict Rouen Cathedral (25 hits on display).

When the user enters a search string but does not select any suggestion, a full-text search in the labels of all entities is performed. A full-text search may also be refined by additional search criteria.

===3 Linked Open Data Source===
All data displayed in openArtBrowser is open data. All metadata stems from Wikidata<sup>6</sup>, a collaboratively edited knowledge base hosted by the Wikimedia Foundation which is used, e.g., by Wikipedia. All images are hosted at Wikimedia Commons<sup>7</sup> under a Creative Commons license.

Wikidata is a knowledge base (a.k.a. knowledge graph, ontology) that can be read and edited by both humans and machines. It stores topics, concepts or objects, their attributes and relationships. See Fig. 8, which shows the first page of the Wikidata entry for Mona Lisa<sup>8</sup>.

Fig. 8. Wikidata entry for Mona Lisa

<sup>6</sup> www.wikidata.org

<sup>7</sup> commons.wikimedia.org

<sup>8</sup> www.wikidata.org/wiki/Q12418

Every Wikidata entry has a unique identifier (here Q12418) and a label in various languages. All information is stored as key-value pairs where keys are pre-defined properties, e.g., creator (P170), and values are either references to other Wikidata items, e.g., Leonardo da Vinci (Q762), or literals like a concrete birth date.

Wikidata is rich in information, both in terms of the number of items and the number of attributes for each item.
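The key-value structure of Wikidata entries described above can be pictured with a small Python sketch. It is a deliberately simplified illustration, not the actual Wikidata JSON format and not code from openArtBrowser: the types `ItemRef` and `Item` and the helpers `linked_items` and `literals` are hypothetical. The identifiers Q12418, Q762 and P170 are taken from the text; P569 is Wikidata's “date of birth” property.

```python
from dataclasses import dataclass, field
from typing import Dict, List, Union


@dataclass(frozen=True)
class ItemRef:
    """A value that references another Wikidata item by its identifier."""
    qid: str


# A value is either a reference to another item or a literal (kept as a string here).
Value = Union[ItemRef, str]


@dataclass
class Item:
    qid: str                # unique identifier, e.g. "Q12418"
    labels: Dict[str, str]  # language code -> label
    claims: Dict[str, List[Value]] = field(default_factory=dict)  # property -> values


def linked_items(item: Item, prop: str) -> List[str]:
    """Identifiers of all items referenced by the given property."""
    return [v.qid for v in item.claims.get(prop, []) if isinstance(v, ItemRef)]


def literals(item: Item, prop: str) -> List[str]:
    """All literal values stored for the given property."""
    return [v for v in item.claims.get(prop, []) if isinstance(v, str)]


# Mona Lisa (Q12418) has the creator (P170) Leonardo da Vinci (Q762).
MONA_LISA = Item("Q12418", {"en": "Mona Lisa"}, {"P170": [ItemRef("Q762")]})

# Leonardo da Vinci (Q762) has a literal date of birth (P569).
LEONARDO = Item("Q762", {"en": "Leonardo da Vinci"}, {"P569": ["1452-04-15"]})

if __name__ == "__main__":
    print(linked_items(MONA_LISA, "P170"))
    print(literals(LEONARDO, "P569"))
```

Separating item references from literals in this way mirrors the distinction made in the text and is what makes it possible to follow links from an artwork to its artist, location or motifs.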
The entry for Mona Lisa contains more than 200 attributes, including all links to Wikipedia pages about Mona Lisa.

From Wikidata, we extract the following entities:

110,000 artworks, e.g., Mona Lisa

21,000 motifs, e.g., mountains

16,000 artists, e.g., Leonardo da Vinci

4,800 locations, e.g., Louvre Museum

520 materials, e.g., oil paint

250 genres, e.g., portrait

220 movements, e.g., Renaissance

====3.1 Data Model====
For those entities, we selected attributes of interest. Fig. 9 shows the data model of openArtBrowser, i.e., all entities, their attributes and associations, as a UML class diagram.

The central entity is Artwork with subtypes Painting, Drawing and Sculpture. All entities have an id, a label and optionally a description. Different entities have different further optional attributes, e.g., height and width for an Artwork, date_of_birth and date_of_death for an Artist, latitude and longitude (lat, lon) for a Location, etc. The central entity Artwork is associated with the entities Material, Genre, Movement, Artist, Location, and Motif.

Fig. 9. Data model (UML class diagram of the entities Artwork, with subtypes Painting, Sculpture and Drawing, and Artist, Movement, Location, Genre, Material and Motif)

====3.2 Data Modeling Considerations====
How did we select those entities and their attributes out of the hundreds of entities and attributes provided in Wikidata? The selection was the result of a lengthy process of inspecting the data provided. We used the following selection criteria.
Requirements: We selected entities and attributes that were deemed relevant for meeting the requirements of an art browser, i.e., information which is interesting to users. For example, information about artists and their backgrounds is certainly interesting, as is information about the motifs depicted. Information we omitted includes, e.g., the inventory number of artworks, the cause of death of artists, and the director / manager of museums. Such attributes were not deemed relevant.

Quality: We carefully checked the quality of the data provided for certain attributes, since the quality of crowdsourced data may vary considerably. For example, we examined the “instance of” and “subclass of” relationships of motifs. These attributes allow hierarchies of broader and narrower terms to be modeled, which could potentially be useful. Users of openArtBrowser could, e.g., search for the motif “animal” and find artworks which are tagged with the motifs “dog”, “cow”, “horse” etc. (but not explicitly tagged “animal”). However, we decided not to include this feature in openArtBrowser. The reason is that, due to a lack of agreed modeling guidelines, “instance of” and “subclass of” relationships are used inconsistently, nearly arbitrarily, by the Wikidata community. If we had blindly included hierarchical search according to the “instance of” and “subclass of” relationships, the search for the motif “animal” would also have returned artworks that are tagged with the motif “wife”. This is because Wikidata models the following relationship chain: wife is a subclass of woman, which is a subclass of female, which is a subclass of Homo sapiens, which is a subclass of omnivore, which is a subclass of animal<sup>9</sup>.

Quantity: We also took into consideration how frequently certain attributes are tagged in Wikidata. For example, the attribute “Iconclass notation” of artworks is relevant for expert users of openArtBrowser, since it indicates the iconography of artworks.
The data quality is also good. In this respect, iconography would be a candidate for another facet. However, Iconclass notation is tagged so rarely that it would be frustrating for a user to select this facet and then be able to navigate to only one or two other artworks.

It is worth noting that all three criteria (requirements, quality and quantity) may vary over time. Therefore, adapting the data model should be considered regularly.

===4 Software Architecture===
The software architecture of openArtBrowser consists of the online Web application and an offline batch. Fig. 10 gives an overview.

Fig. 10. Software architecture of openArtBrowser

<sup>9</sup> Accessed 4/3/2019

The online Web application is designed as a two-layer architecture, with the presentation and application logic implemented in HTML<sup>10</sup> / CSS<sup>11</sup> / TypeScript<sup>12</sup> using the Angular<sup>13</sup> Web application framework. The datastore is implemented using the search engine Elasticsearch<sup>14</sup>. Queries to the datastore execute within a few milliseconds.

The offline batch implements a semantic ETL (extract, transform, load) process, which can be seen as the curation process of the arts data. In this process, relevant data is extracted from the knowledge base (Wikidata), semantically enriched, and transformed so that it can be loaded into the datastore (Elasticsearch). The offline batch process can be started regularly (e.g., weekly) and runs for more than 24 hours. This relatively poor performance is due to the response time of the Wikidata server. However, it in no way affects the performance of the online Web application.

Semantic ETL is implemented in the Python programming language<sup>15</sup>. For extracting relevant data from Wikidata, the Python framework Pywikibot<sup>16</sup> is used. Extraction is based on the data model as depicted in Fig. 9.

Data cleansing is an important part of the extraction process.
Since Wikidata attributes are not statically typed and Wikidata is crowdsourced, the quality of entries varies considerably. During the extraction process, a syntactic quality check is performed, and data entries which do not conform to the expected data types are omitted. For example, references from artworks to artists that are not proper Wikidata links are omitted. The same applies to inception dates that are not integers. Extracted data is stored as JSON files.

Semantic enrichment means adding value to the raw data. This includes computing custom ranks of artworks and all facets. The ranking criteria are described in Section 2.5. All ranks are normalized to values between 0 and 1 and evenly distributed, so that the median always gets the rank 0.5. This enables comparing the ranks of artworks, artists, movements, genres, motifs, etc. Semantic enrichment also includes adding data from other sources, e.g., Wikipedia, YouTube, Iconclass, etc.

Transformation means storing the arts data in the format required by the datastore (Elasticsearch). In this case, this is a JSON format which reflects the data model as depicted in Fig. 9. Associations between entities are represented as arrays of Wikidata IDs.

Finally, loading is the step of updating the Elasticsearch index with newly extracted data. This is done using the Elasticsearch Update API, which ensures continuous operation of the Web application.

<sup>10</sup> www.w3.org/html

<sup>11</sup> www.w3.org/Style/CSS

<sup>12</sup> www.typescriptlang.org

<sup>13</sup> angular.io

<sup>14</sup> www.elastic.co/de/products/elasticsearch

<sup>15</sup> www.python.org

<sup>16</sup> doc.wikimedia.org/pywikibot

===5 Discussion===
We discuss openArtBrowser by comparing the implementation with the goals set out in the introduction.

1. Learning with fun: So far, no systematic user test has been performed and evaluated, so there is no scientific proof yet that this goal has been reached. Instead, openArtBrowser has been tested in an ad hoc manner by various user and age groups.
We observed users having fun discovering unexpected aspects of artworks and following tags, particularly motifs that were previously unknown to them.

2. Open Data: All data displayed in openArtBrowser is open data from Wikidata and Wikimedia Commons.

3. Serendipity: There are various aspects of serendipity in openArtBrowser. The homepage displays tiles with changing artworks, artists, movements, motifs etc., including their images. They invite users to follow interesting topics out of curiosity, without actively searching. To keep surprising regular users as well, the homepage and the tile contents change on each visit. All pages contain a gallery of artworks which attracts attention. Those galleries are animated and slide from time to time (every 10 seconds). Additionally, artworks are shuffled on each use so that frequent users can still discover new aspects. The artwork page displays a gallery of related artworks where different relations are offered: motif, genre, movement, etc. Information about artworks is displayed as tags with hyperlinks. Clicking on those tags allows users to discover new aspects. Finally, the semantic autosuggest feature also fosters serendipity, as it offers various options for completing and refining the search terms. However, the fact that search results are ranked and only the top-ranked entities are presented may lead, even with shuffling, to a filter bubble. This means that highly ranked entities are displayed regularly, but lowly ranked entities seldom or never.

4. Usability: The application has been designed to be as simple as possible. Icons are used consistently in the semantic autosuggest, in tags, and in headlines of pages. The focus is on relevant information. For example, on the artwork page, detailed metadata is hidden when the page is opened in order to avoid information overload. The response time is less than 1 s for each page access, even when a lot of information is displayed.
Only when images are displayed in full-screen mode are high-resolution images loaded, which may take a little longer. User interaction follows common conventions, e.g., tags being clickable. Unusual interactions like comparing the metadata of two artworks in the related artworks section are explained with a hint. An about page explains the use of the Web application in text form.

5. Aesthetics: We consider the Web application to be aesthetic, and the first users of openArtBrowser confirm this impression.

6. Responsive Design: openArtBrowser is responsive and can be used on various devices, including desktop computers, tablet computers, and smartphones.

We therefore conclude that openArtBrowser meets the goals set out in Section 1.

The openArtBrowser implementation could also be used to implement a custom deployment with a custom selection of artists and artworks. For this, the GitHub project would have to be forked and a filter would have to be implemented in the semantic ETL process.

Furthermore, the openArtBrowser concept and implementation could be used to implement semantic browsers for other application domains, e.g., movies, events, science, literature, history, politics, etc. For this, the GitHub project would have to be forked and the data model, the semantic ETL process, and the Web application would have to be adapted.

===6 Conclusions and Future Work===
We have presented openArtBrowser, a Web application for visual art education that aims to fascinate users with paintings, drawings and sculptures. OpenArtBrowser is based solely on linked open data, and its code is open source.

OpenArtBrowser is actively being developed further. At the time of writing, the following features are being implemented:

1. Multi-language support: The first implementation of openArtBrowser was in English, for dialog controls as well as metadata. We are currently implementing support for additional languages, namely German, French, Spanish, and Italian.

2.
More data sources: Using semantic interlinking, openArtBrowser is enriched with additional data, e.g., Wikipedia abstracts. Care is being taken that this will not diminish the focus on relevant information and will not result in information overload.

3. Multimedia: YouTube videos about artworks, artists and artistic movements are integrated in openArtBrowser.

4. Analytics: User interactions are being logged in order to learn about the behavior of users and potentially improve the user experience.

Future work will elaborate on this. In particular, Wikidata’s identifier links to other knowledge graphs can be used to enrich data, e.g., WikiArt<sup>17</sup>, Europeana<sup>18</sup>, and Getty<sup>19</sup>. The filter bubble effect may be reduced, e.g., by regularly displaying some lowly ranked entities as well. Also, we could experiment with other ranking criteria like the Wikidata rank or usage statistics.

In addition to adding features to openArtBrowser, we intend to perform a thorough user evaluation and expect to gain insights for further improving the Web application.

Visit openartbrowser.org and discover the fascinating world of visual arts!

<sup>17</sup> www.wikiart.org

<sup>18</sup> www.europeana.eu

<sup>19</sup> www.getty.edu

===References===
1. Bernhard Humm, Timm Heuss: "Schlendern durch digitale Museen und Bibliotheken - Vom Umgang mit riesigen semantischen Daten" (in German). In: Börteçin Ege, Bernhard Humm, Anatol Reibold (Eds.): "Corporate Semantic Web". Springer-Verlag, 2015. ISBN 978-3-642-54885-7.

2. Timm Heuss, Bernhard Humm, Tilman Deuschel, Torsten Fröhlich, Thomas Herth, Oliver Mitesser: "Semantically Guided, Situation-Aware Literature Research". Workshop on User Interaction built on Library Linked Data (UILLD 2013), pre-conference to the 79th World Library and Information Conference, Singapore, 2013. In: H. G. Cervone, L. G. Svensson (Eds.): "Linked Data and User Interaction", pp. 66-84. Walter De Gruyter GmbH, Berlin / Boston, 2015. ISBN 978-3-11-031692-6.

3.
Tilman Deuschel, Timm Heuss, Bernhard Humm: "The Digital Online Museum". Proceedings of the 4th International Workshop on Semantic Digital Archives (SDA 2014). London, UK, September 2014.

4. Tilman Deuschel, Christian Greppmeier, Bernhard Humm, Wolfgang Stille: "Semantically Faceted Navigation with Topic Pies". Proceedings of the 10th International Conference on Semantic Systems (SEMANTiCS 2014), Leipzig, Germany. ACM Press, New York, USA, 2014. ISBN 978-1-4503-2927-9, DOI 10.1145/2660517.2660520.

5. Tilman Deuschel, Timm Heuss, Bernhard Humm, Torsten Fröhlich: "Finding without Searching - A Serendipity-based Approach for Digital Cultural Heritage". Proceedings International Conference on Digital Intelligence (DI 2014), Nantes, France, 2014.

6. Chantal Eschenfelder, Karsten Gresch, Torsten Fröhlich, Bernhard Humm, Thorsten Greiner, Peter Eierdanz, Frank Blumenberg: "The other way round: from semantic search to collaborative curation". Nordic Digital Excellence in Museums Conference (NODEM 2013), Stockholm, Sweden, Dec. 2013.

7. Ulrich Beez, Bernhard G. Humm, Paul Walsh: "Semantic AutoSuggest for Electronic Health Records". In: Hamid R. Arabnia, Leonidas Deligiannidis, Quoc-Nam Tran (Eds.): Proceedings of the 2015 International Conference on Computational Science and Computational Intelligence. Las Vegas, Nevada, USA, 7-9 December 2015. IEEE Conference Publishing Services, 2015. ISBN 978-1-4673-9795-7/15, DOI 10.1109/CSCI.2015.85.