
Injecting Conceptual Constraints into Data Fabrics

Paolo Ciaccia

Davide Martinenghi

Riccardo Torlone

Dipartimento di Elettronica, Informazione e Bioingegneria, Politecnico di Milano, Italy
Dipartimento di Informatica - Scienza e Ingegneria, Università di Bologna, Italy
Dipartimento di Ingegneria, Università Roma Tre, Italy

Unlike traditional sources managed by DBMSs, data lakes do not provide any guarantee about the quality of the data they store, which can severely limit their use for analysis purposes. The recent notion of data fabric, which introduces a semantic layer allowing uniform access to underlying data sources, makes it possible to tackle this problem by specifying conceptual constraints to which data sources must adhere to be considered meaningful. Along these lines, in this discussion paper, we exploit the data fabric approach by proposing a general methodology for data curation in data fabrics based on: (i) the specification of integrity constraints over a conceptual representation of the data lake and (ii) the automatic translation and enforcement of such constraints over the actual data. We discuss the advantages of this idea and the challenges behind its implementation.

1. Introduction

In traditional big data analysis, activities such as cleaning, transforming, and integrating source data are essential, but they usually make knowledge extraction a long and tedious process. For this reason, data-driven organizations have recently adopted an agile strategy that defers any data processing until the data are actually consumed. This is done by building and maintaining a repository, called a "data lake", for storing any kind of data in its native format. A dataset in the lake is usually just a collection of raw data, gathered either from internal applications (e.g., logs or user-generated data) or from external sources (e.g., open data), that is stored "as is", usually on a distributed storage system, without going through an ETL process.

Unfortunately, reducing the engineering effort upfront merely postpones the traditional issues of data pre-processing, since this approach does not eliminate the need for high-quality data and schema understanding. Therefore, to guarantee reliable results, a long process of data preparation (a.k.a. data wrangling) must be carried out over the portion of the data lake that is relevant for a business purpose before any meaningful analysis can be performed on it [1, 2, 3, 4]. This process typically consists of pipelines of operations such as source and feature selection, data enrichment, data transformation, data curation, and data integration. A number of state-of-the-art applications can support these activities, including (i) data and metadata catalogs, for understanding and selecting the appropriate datasets [5, 6, 7, 8]; (ii) full-text indexing tools, for providing keyword search and other advanced search capabilities [7, 9, 10]; (iii) data profilers, for collecting meta-information from datasets [1, 9, 11]; (iv) distributed data processing engines such as Spark [12]; and (v) tools and libraries for data manipulation and analysis, such as Pandas1 and Scikit-learn,2 in conjunction with data science notebooks, such as Jupyter3 and Zeppelin.4 Still, data preparation remains an involved, fragmented, and time-consuming process, which makes extracting valuable knowledge from the lake hard.

In this scenario, the recent data fabric approach comes to the rescue by proposing the construction and maintenance of a semantic representation of the underlying data for data discovery, understanding, and searching [13, 14, 15]. We argue that this representation can also be profitably exploited for evaluating and improving the quality of data. This is because a representation of the real-world concepts and relationships that the data capture (e.g., employees, customers, products, locations, sales, and so on) provides an ideal setting for identifying the constraints that hold in the application domain of reference (e.g., the fact that, for business purposes, all the products for sale must be classified in categories). If we are able to map and enforce such constraints on the underlying data, their quality naturally improves, making the subsequent analysis more effective and less prone to errors.

Building on this idea, in this paper5 we propose a principled approach to data curation in data lakes based on the identification and enforcement of conceptual constraints. The approach is based on the following main activities: (1) gathering metadata from the data lake (or from a portion of interest for a specific business goal) in the form of a conceptual schema, (2) analyzing the conceptual schema and specifying integrity constraints over it, (3) automatically translating the constraints defined at the conceptual level into constraints over the datasets in the data lake, and (4) enforcing the integrity constraints so obtained over the actual data. While there is a large body of work on extracting and collecting metadata from data sources [1, 9, 11] and on repairing data given a set of integrity constraints [17, 18, 19], corresponding to steps (1) and (4) above, to our knowledge the issue of exploiting conceptual representations for data lake curation has never been explored before.

The rest of the paper is devoted to the presentation of some initial steps towards this goal.

Specifically, in Section 2 we state the problem by recalling the typical data life-cycle in a data lake and by illustrating, in this framework, our proposal for data curation. Then, in Section 3 we introduce the basic notions (datasets, schemas, constraints, and mappings) underlying our approach. This is done by means of very general definitions, so as to make the approach independent of any specific data model and format. In Section 4 we provide some details of our solution through an example. Finally, in Section 5 we discuss related work, the main issues involved in the implementation of our proposal, and the work that needs to be done to tackle these issues.

2. Using Conceptual Constraints in Data Curation

Metadata plays a fundamental role in typical activities of the data analysis life-cycle, such as data ingestion, data integration, data preparation, and knowledge extraction. Its management involves building and maintaining a repository of information describing all the various kinds of data that are produced in the above stages of data processing [9].

1 https://pandas.pydata.org/
2 https://scikit-learn.org/
3 https://jupyter.org/
4 https://zeppelin.apache.org/
5 A preliminary version has appeared in [16].

In order to harmonize the data stored in the data lake with the other components of the data fabric, a conceptual representation, i.e., a conceptual schema, of the metadata describing the content of interest of the data lake is needed. This includes concepts (such as entities, relationships, and generalizations) that map to the actual components (such as attributes, documents, and labels) of datasets stored in the data lake.

The availability of a conceptual schema $S$ of a data lake $\mathit{DL}$ can provide a number of important benefits: (i) it gives analysts a general and system-independent view of the data available in $\mathit{DL}$, (ii) it provides an abstract view of the data lake content that can be used to formulate queries over $\mathit{DL}$, and (iii) it allows the specification of real-world constraints that, once enforced on $\mathit{DL}$, improve the overall quality of its content.

In this paper, we focus on point (iii) above which, to the best of our knowledge, has not been studied before, apart from our preliminary study [16]. As shown graphically in Figure 1, the methodology we propose consists of the following tasks.

1. A (portion of interest of a) data lake $\mathit{DL}$ is initially transformed into a "standardized" data lake $\mathit{DL}'$, obtained by adapting the source data to the format of the system chosen for storing a "curated" version of $\mathit{DL}$.
2. The skeleton $\hat{S}$ of a conceptual schema is built from $\mathit{DL}'$. Basically, $\hat{S}$ includes the main entities and relationships involved in $\mathit{DL}'$, as well as a mapping between the components of $\mathit{DL}'$ and the elements of $\hat{S}$. This task can be done manually and/or using available techniques and tools for semantic annotation or column-type discovery in data lakes [20, 21, 22].
3. $\hat{S}$ is refined, possibly incrementally, into an "evolved" schema $S$ by adding a collection of real-world constraints, for instance, by stating that an entity is a special case of another entity or that an entity can only participate in a single occurrence of a certain relationship. Typically, this step requires knowledge of the specific domain (e.g., that a department has a single manager).
4. The constraints represented by $S$ are mapped to constraints $C$ over the actual data stored in $\mathit{DL}'$. $C$ can be expressed in several ways, depending on the system used to store and manage $\mathit{DL}'$.
5. The constraints $C$ are checked on $\mathit{DL}'$ and, if any violation occurs, possibly enforced on $\mathit{DL}'$ by means of a repairing technique [23]. Again, this can be done in several ways, depending on the tools available for storing, querying, and manipulating data in the data lake [19, 24].

We observe that, in the process above, no previous work has specifically addressed point 4. In the rest of the paper, we focus on this challenging task by first introducing the relevant elements of the problem (Section 3) and then illustrating the main ideas for its solution through an example (Section 4).

3. Data and Metadata Management

Let us now fix some basic notions that we will refer to in the following. Our definitions are deliberately abstract so as to be as general as possible, without the need to commit to any specific data lake model and format.

Dataset. We consider that a dataset $N(A, I)$ has a name $N$ and is composed of a set of attributes $A$ and a set of data items $I$. Each data item in $I$ is a set of attribute-value pairs, with attributes taken from $A$. Figure 2 shows an example of datasets still in a "raw" format, reporting data about the finance and tech departments of a company. After curation, the resulting datasets also become part of the data lake.
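As a minimal sketch of this definition, a dataset can be modeled as a name, a set of attributes, and a collection of data items, each item being a mapping from attributes to values whose keys must come from the attribute set. The class name `Dataset` and the sample items (department names and manager codes) are hypothetical and are not taken from Figure 2 or Figure 3.

```python
from dataclasses import dataclass, field
from typing import Any, Dict, FrozenSet, List


@dataclass
class Dataset:
    """A dataset with a name, a set of attributes A, and data items I."""

    name: str
    attributes: FrozenSet[str]
    # Each data item is a set of attribute-value pairs, modeled as a dict;
    # items are kept in a list for simplicity.
    items: List[Dict[str, Any]] = field(default_factory=list)

    def __post_init__(self) -> None:
        # Attributes of every data item must be taken from A; an item may
        # omit some attributes, as is common for raw data in a lake.
        for item in self.items:
            unknown = set(item) - self.attributes
            if unknown:
                raise ValueError(f"{self.name}: attributes {sorted(unknown)} not in A")


# Hypothetical items (illustrative values only).
d_dept = Dataset(
    name="D_Dept",
    attributes=frozenset({"DeptCode", "DeptName", "MgrNo"}),
    items=[
        {"DeptCode": "D01", "DeptName": "Finance", "MgrNo": "E01"},
        {"DeptCode": "D02", "DeptName": "Tech"},  # MgrNo missing: still allowed
    ],
)
```

Modeling items as dictionaries rather than fixed-width rows keeps the sketch agnostic to any specific data lake format, in the spirit of the deliberately abstract definitions above.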

Data Lake. For our purposes, a data lake $\mathit{DL} = (D, \mathcal{M})$ can be modeled as a collection $D$ of datasets having distinct names, plus a set of metadata $\mathcal{M}$, including a (possibly empty) set of constraints on the datasets. We also refer to $D$ as the instance of $\mathit{DL}$. Figure 3 shows a collection of partially curated datasets in $D$ (D_Emp, D_Dept, and D_Act) that have been obtained from the raw datasets of Figure 2 by unnesting employees from departments and activities from employees. The metadata include, e.g., cross-dataset constraints, such as the fact that the DeptCodes appearing in D_Emp must also appear in D_Dept, as well as domain constraints, such as the fact that Level must be an integer (so employee E05 violates this).

[Figure 3: partially curated datasets D_Emp, D_Dept, and D_Act. Recoverable fragment: D_Emp columns Name (Homer, Marge, Bart, Lisa), Salary (100K, 150K, 80K, 50K), and DeptCode (D01, D02, D02, D01); D_Dept column DeptCode (D01, D02).]

Conceptual schema and constraints. We assume that the domain of interest for analysis purposes is represented by a conceptual schema $S$, expressed by means of a suitable language $\mathcal{L}$. Examples are Entity-Relationship (E-R) diagrams, RDF(S), UML class diagrams, and Description Logic languages, such as those underlying the OWL 2 standard and its profiles.6 Apart from specific differences, each of these languages allows for the definition of concepts (i.e., classes of objects, entities), relationships (a.k.a. roles) among them, and properties (of concepts and relationships). A conceptual schema $S$ includes conceptual constraints that characterize its elements. For instance, in the E-R formalism we can state that two entities $E_1$ and $E_2$ have a common generalizing entity $E$ (subset($E_1$, $E$) and subset($E_2$, $E$)) and that $E_1$ and $E_2$ are disjoint (disjoint($E_1$, $E_2$)).
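The two kinds of metadata constraints mentioned above, a cross-dataset inclusion (every DeptCode in D_Emp also appears in D_Dept) and a domain constraint (Level is an integer), can be checked with a few lines of code. The sketch below assumes a dictionary-based instance with hypothetical toy data (not the content of Figure 3); the helper names are our own, and null values are skipped, mirroring the usual foreign-key semantics.

```python
from typing import Any, Dict, List, Tuple

Item = Dict[str, Any]
Instance = Dict[str, List[Item]]


def make_instance(datasets: List[Tuple[str, List[Item]]]) -> Instance:
    """Collect datasets into an instance, rejecting duplicate dataset names."""
    instance: Instance = {}
    for name, items in datasets:
        if name in instance:
            raise ValueError(f"duplicate dataset name: {name}")
        instance[name] = items
    return instance


def inclusion_violations(inst: Instance, src: str, src_attr: str,
                         dst: str, dst_attr: str) -> List[Item]:
    """Items of `src` whose non-null `src_attr` value does not occur in `dst`.`dst_attr`."""
    targets = {item.get(dst_attr) for item in inst[dst]}
    return [item for item in inst[src]
            if item.get(src_attr) is not None and item[src_attr] not in targets]


def domain_violations(inst: Instance, name: str, attr: str, typ: type) -> List[Item]:
    """Items of `name` whose non-null `attr` value is not an instance of `typ`."""
    return [item for item in inst[name]
            if item.get(attr) is not None and not isinstance(item[attr], typ)]


# Hypothetical toy instance.
D = make_instance([
    ("D_Emp", [
        {"EmpNo": "E01", "DeptCode": "D01", "Level": 2},
        {"EmpNo": "E05", "DeptCode": "D02", "Level": "senior"},  # Level not an integer
        {"EmpNo": "E09", "DeptCode": "D03"},                       # unknown department
    ]),
    ("D_Dept", [{"DeptCode": "D01"}, {"DeptCode": "D02"}]),
])

print([i["EmpNo"] for i in inclusion_violations(D, "D_Emp", "DeptCode", "D_Dept", "DeptCode")])  # ['E09']
print([i["EmpNo"] for i in domain_violations(D, "D_Emp", "Level", int)])                         # ['E05']
```

Using a dictionary keyed by dataset name enforces the "distinct names" requirement structurally; `make_instance` only makes the failure explicit when raw inputs collide.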

Mapping. The connection between the conceptual schema $S$ and the data lake $\mathit{DL} = (D, \mathcal{M})$ is based on a mapping $\mu$, i.e., a set of assertions relating the elements in $S$ to the datasets in $D$. For instance, an entity Departments in $S$ could be mapped to the projection of dataset D_Dept on just the attributes DeptCode and DeptName, with the MgrNo attribute representing a relationship between Departments and Employees.
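The example mapping above can be sketched as a set of functions, one per conceptual element, each computing the element's extension from the instance. This is a minimal illustration under our own assumptions: the names `MAPPING`, `project`, and `apply_mapping`, the choice of set semantics, and the sample rows are hypothetical; only the Departments projection and the role of MgrNo come from the text.

```python
from typing import Any, Callable, Dict, List, Set, Tuple

Item = Dict[str, Any]
Instance = Dict[str, List[Item]]
Extension = Set[Tuple[Any, ...]]


def project(items: List[Item], attrs: Tuple[str, ...]) -> Extension:
    """Projection on `attrs`, with duplicates removed (set semantics)."""
    return {tuple(item.get(a) for a in attrs) for item in items}


# Hypothetical mapping: one assertion per conceptual element.
MAPPING: Dict[str, Callable[[Instance], Extension]] = {
    # Entity Departments: projection of D_Dept on DeptCode and DeptName.
    "Departments": lambda inst: project(inst["D_Dept"], ("DeptCode", "DeptName")),
    # Relationship Direct: MgrNo links a department to the employee directing it.
    "Direct": lambda inst: project(inst["D_Dept"], ("DeptCode", "MgrNo")),
}


def apply_mapping(mapping: Dict[str, Callable[[Instance], Extension]],
                  inst: Instance) -> Dict[str, Extension]:
    """Materialize the conceptual instance obtained by applying the mapping."""
    return {element: define(inst) for element, define in mapping.items()}


D = {"D_Dept": [
    {"DeptCode": "D01", "DeptName": "Finance", "MgrNo": "E01"},
    {"DeptCode": "D02", "DeptName": "Tech", "MgrNo": "E10"},
]}
conceptual = apply_mapping(MAPPING, D)
```

Materializing the whole conceptual instance this way is exactly what becomes too expensive on a real data lake, which motivates translating constraints instead of data.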

Problem statement. Our goal is to check whether an instance $D$ satisfies the conceptual constraints represented by schema $S$. To this end, we formally define constraint satisfaction as follows.

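Definition 1. An instance $D$ is legal with respect to a conceptual schema $S$ through a mapping $\mu$ if $\mu(D)$ yields a conceptual instance that satisfies all the constraints in $S$; we briefly denote this circumstance by $\mu(D) \models S$.

6 https://www.w3.org/TR/owl2-profiles/

[Figure 4: E-R schema of the running example. Recoverable fragment: entity Departments with attributes DeptCode (key) and DeptName, participating in relationship Direct with cardinality 1:1 and in relationship Work with cardinality 1:N.]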

Clearly, explicitly applying the mapping $\mu$ to generate a conceptual instance is impractical for performance reasons. Therefore, in this paper we propose a different solution, which consists in transforming the constraints in $S$ into corresponding constraints on the data lake. This leads to the following problem: given a data lake $\mathit{DL} = (D, \mathcal{M})$, a conceptual schema $S$, and a mapping $\mu$ between $S$ and $\mathit{DL}$, determine a set of constraints $C$ on $D$ such that $D$ satisfies $\mathcal{M} \cup C$ if and only if $D$ is legal with respect to $S$ through $\mu$, i.e.:

$$D \models \mathcal{M} \cup C \iff \mu(D) \models S$$
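To make the equivalence concrete, consider the conceptual constraint disjoint(Managers, Researchers) of the example in Section 4. Assume (this is our assumption, since the mapping statements are not reproduced here) that $\mu$ defines Managers and Researchers as the EmpNo values of the D_Emp items with a non-null Level and a non-null CV, respectively, and that EmpNo identifies the items of D_Emp. Then:

$$
\begin{aligned}
\mu(D) \models \mathrm{disjoint}(\mathrm{Managers}, \mathrm{Researchers})
&\iff \{ t.\mathrm{EmpNo} \mid t \in \mathrm{D\_Emp},\ \mathrm{NotNull}(t.\mathrm{Level}) \} \cap \{ t.\mathrm{EmpNo} \mid t \in \mathrm{D\_Emp},\ \mathrm{NotNull}(t.\mathrm{CV}) \} = \emptyset \\
&\iff D \models \forall t \in \mathrm{D\_Emp} : \neg(\mathrm{NotNull}(t.\mathrm{Level}) \wedge \mathrm{NotNull}(t.\mathrm{CV}))
\end{aligned}
$$

The second step relies on the uniqueness of EmpNo: without it, two distinct items with the same EmpNo, one with a Level and one with a CV, would violate the conceptual constraint while satisfying the dataset-level formula. This illustrates why the metadata $\mathcal{M}$ can be needed alongside $C$ on the left-hand side.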

Once the conceptual constraints on the data lake have been generated, they can be used to check whether $D$ is consistent and, if needed, to repair $D$.

Before proceeding, we remark that, unlike OBDA (Ontology-Based Data Access) approaches [25], we do not use $\mu$ to obtain results from $D$ given a query on $S$. Rather, $\mu$ is the key ingredient for defining and enforcing the conceptual constraints in $S$ on the data lake.

4. An Example

The E-R schema $S$ in Figure 4 describes a simplified scenario regarding the departments of a company. The schema includes structural information (such as the fact that Employees have a Name and a Salary) as well as constraints (such as the fact that Managers are also Employees or that each Department has at least one Employee). Notice that the schema deliberately does not include the NoHours attribute that characterizes each activity of a researcher (see dataset D_Act in Figure 3). This is to emphasize that $S$ only focuses on the part of the data lake that is of interest for the analysis, which here, we assume, does not include the NoHours attribute.

Besides basic constraints on attributes, such as non-nullability and domains of admitted values (which we omit in the following for brevity), the relevant constraints in $S$, described here informally as self-explanatory predicates, are:
• every employee is identified by EmpNo;
• unique(DeptCode,Departments): every department is identified by DeptCode;
• subset(Managers,Employees): managers are employees;
• subset(Researchers,Employees): researchers are employees;
• disjoint(Managers,Researchers): no manager is a researcher;
• card(Departments,Direct,1,1): every department has exactly one manager;
• card(Employees,Work,1,1): every employee works in exactly one department;
• card(Departments,Work,1,n): every department has at least one employee.

Now, consider the datasets in Figure 3, whose structure is reported below for the sake of clarity:
D_Emp(EmpNo,Name,Salary,DeptCode,Level,CV,PID,PName,Budget),
D_Dept(DeptCode,DeptName,MgrNo),
D_Act(ResNo,Activity,NoHours).
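The subset, disjoint, and card predicates above can be checked directly on a conceptual instance, which is what Definition 1 asks for. The sketch below is a hypothetical illustration: entity extensions are sets of object identifiers, relationships are sets of pairs (we assume Direct pairs are (department, manager) and Work pairs are (employee, department)), and uniqueness constraints are left out because objects are identified by set membership.

```python
from typing import Dict, List, Optional, Set, Tuple

Entities = Dict[str, Set[str]]                   # entity -> object ids
Relationships = Dict[str, Set[Tuple[str, str]]]  # relationship -> pairs of ids


def subset(ents: Entities, e1: str, e2: str) -> bool:
    return ents[e1] <= ents[e2]


def disjoint(ents: Entities, e1: str, e2: str) -> bool:
    return not (ents[e1] & ents[e2])


def card(ents: Entities, rels: Relationships, entity: str, rel: str,
         pos: int, lo: int, hi: Optional[int] = None) -> bool:
    """Every object of `entity` occurs at position `pos` of `rel` between lo and hi times."""
    for obj in ents[entity]:
        n = sum(1 for pair in rels[rel] if pair[pos] == obj)
        if n < lo or (hi is not None and n > hi):
            return False
    return True


def violated(ents: Entities, rels: Relationships) -> List[str]:
    # Assumed layout: Direct = (department, manager), Work = (employee, department).
    checks = {
        "subset(Managers,Employees)": subset(ents, "Managers", "Employees"),
        "subset(Researchers,Employees)": subset(ents, "Researchers", "Employees"),
        "disjoint(Managers,Researchers)": disjoint(ents, "Managers", "Researchers"),
        "card(Departments,Direct,1,1)": card(ents, rels, "Departments", "Direct", 0, 1, 1),
        "card(Employees,Work,1,1)": card(ents, rels, "Employees", "Work", 0, 1, 1),
        "card(Departments,Work,1,n)": card(ents, rels, "Departments", "Work", 1, 1),
    }
    return sorted(name for name, ok in checks.items() if not ok)


# Hypothetical conceptual instance.
ents = {"Employees": {"E01", "E07", "E10"}, "Managers": {"E01", "E10"},
        "Researchers": {"E07"}, "Departments": {"D01", "D02"}}
rels = {"Direct": {("D01", "E01"), ("D02", "E10")},
        "Work": {("E01", "D01"), ("E07", "D01"), ("E10", "D02")}}
print(violated(ents, rels))                                    # []
print(violated(dict(ents, Researchers={"E07", "E10"}), rels))  # ['disjoint(Managers,Researchers)']
```

This is the "materialize, then check" route that the approach is designed to avoid on large data lakes; it is useful mainly as a reference semantics against which dataset-level constraints can be tested.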

Then, we can define the mapping $\mu$ by means of the following statements, one for each entity and relationship in $S$:7

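The constraints in $C$ corresponding to this mapping include, among others, the following ones:
• Uniqueness of DeptCode:
$c_1: \forall t_1, t_2 \in \mathrm{D\_Dept} : t_1.\mathrm{DeptCode} = t_2.\mathrm{DeptCode} \Rightarrow t_1 = t_2$
• Disjointness of managers and researchers:
$c_2: \forall t_1 \in \mathrm{D\_Emp} : \neg(\mathrm{NotNull}(t_1.\mathrm{Level}) \wedge \mathrm{NotNull}(t_1.\mathrm{CV}))$
• Departments are directed by managers:
$c_3: \forall t_1 \in \mathrm{D\_Dept}\ \exists t_2 \in \mathrm{D\_Emp} : t_1.\mathrm{MgrNo} = t_2.\mathrm{EmpNo} \wedge \mathrm{NotNull}(t_2.\mathrm{Level})$
• Each department has at least one employee:
$c_4: \forall t_1 \in \mathrm{D\_Dept}\ \exists t_2 \in \mathrm{D\_Emp} : t_1.\mathrm{DeptCode} = t_2.\mathrm{DeptCode}$
• Each employee has activities only within a project:
$c_5: \forall t_1 \in \mathrm{D\_Act}\ \exists t_2 \in \mathrm{D\_Emp} : t_1.\mathrm{ResNo} = t_2.\mathrm{EmpNo} \wedge \mathrm{NotNull}(t_2.\mathrm{PID})$

7 The underscore symbol indicates (anonymous) variables not relevant to the statement. The adopted notation is therefore positional, as in, e.g., Datalog.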
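A sketch of how the translation step (point 4 of the methodology) might be automated for two of the predicates above. Since the mapping statements are not reproduced here, the entity-to-dataset correspondences in `ENTITY_MAP` are our own assumption, inferred from constraint c2 (managers are the D_Emp items with a non-null Level, researchers those with a non-null CV). Each function returns an SQL query listing violating objects; only entities mapped onto the same dataset and key are handled.

```python
from typing import Dict, Optional, Tuple

# Hypothetical correspondences: entity -> (dataset, key attribute, selection condition).
# A condition of None means that every item of the dataset denotes an object of the entity.
ENTITY_MAP: Dict[str, Tuple[str, str, Optional[str]]] = {
    "Employees": ("D_Emp", "EmpNo", None),
    "Managers": ("D_Emp", "EmpNo", "Level IS NOT NULL"),
    "Researchers": ("D_Emp", "EmpNo", "CV IS NOT NULL"),
}


def _same_source(e1: str, e2: str) -> Tuple[str, str, Optional[str], Optional[str]]:
    """Return (dataset, key, cond1, cond2); only same-dataset entities are handled here."""
    ds1, key1, cond1 = ENTITY_MAP[e1]
    ds2, key2, cond2 = ENTITY_MAP[e2]
    if (ds1, key1) != (ds2, key2):
        raise NotImplementedError("entities over different datasets need a join")
    return ds1, key1, cond1, cond2


def translate_disjoint(e1: str, e2: str) -> str:
    """Violation query for disjoint(e1, e2): items selected by both conditions."""
    ds, key, cond1, cond2 = _same_source(e1, e2)
    conds = [c for c in (cond1, cond2) if c is not None]
    where = f" WHERE {' AND '.join(conds)}" if conds else ""
    return f"SELECT {key} FROM {ds}{where}"


def translate_subset(e1: str, e2: str) -> Optional[str]:
    """Violation query for subset(e1, e2), or None if it holds by construction."""
    ds, key, cond1, cond2 = _same_source(e1, e2)
    if cond2 is None:
        return None  # every item of the dataset belongs to e2, so e1 is included in it
    conds = ([cond1] if cond1 is not None else []) + [f"NOT ({cond2})"]
    return f"SELECT {key} FROM {ds} WHERE {' AND '.join(conds)}"


print(translate_disjoint("Managers", "Researchers"))
print(translate_subset("Managers", "Employees"))
```

Under these assumptions, disjoint(Managers,Researchers) yields the query form of c2, while subset(Managers,Employees) produces no dataset-level constraint at all because the mapping already guarantees it; this relies on EmpNo being a key of D_Emp, which would have to be checked separately.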

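Once the constraints in $C$ have been generated, they can easily be converted into queries that detect possible violations. For instance, constraint $c_3$ corresponds to the following SQL query, which returns every department whose MgrNo does not identify an employee with a non-null Level (including the case in which MgrNo matches no employee at all):

```sql
SELECT D.DeptCode, D.MgrNo
FROM D_Dept D
WHERE NOT EXISTS (
  SELECT *
  FROM D_Emp E
  WHERE E.EmpNo = D.MgrNo
    AND E.Level IS NOT NULL
);
```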

Consider now the datasets in Figure 3. It is apparent that $D$ violates the following constraints in $C$:
• Employee E07 has both attributes Level and CV not null, thus violating constraint $c_2$;
• Department D02 is managed by an employee (E10) who is not a manager, contradicting constraint $c_3$, as the above SQL query would reveal;
• Constraint $c_5$ is also violated, since employee E12 appears in the dataset D_Act although she does not participate in any project.
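The violations above can be reproduced end to end with an in-memory SQLite database. The rows below are hypothetical, built only to exhibit the three violations just described (they are not the actual content of Figure 3), and the violation queries are our own translations of c1 to c5; c3 uses NOT EXISTS so that a MgrNo matching no employee at all is also reported.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE D_Emp (EmpNo TEXT, Name TEXT, Salary TEXT, DeptCode TEXT,
                    Level INTEGER, CV TEXT, PID TEXT, PName TEXT, Budget TEXT);
CREATE TABLE D_Dept (DeptCode TEXT, DeptName TEXT, MgrNo TEXT);
CREATE TABLE D_Act (ResNo TEXT, Activity TEXT, NoHours INTEGER);
""")

# Hypothetical rows (None is stored as NULL).
conn.executemany(
    "INSERT INTO D_Emp (EmpNo, DeptCode, Level, CV, PID) VALUES (?, ?, ?, ?, ?)",
    [("E01", "D01", 3, None, None),         # manager
     ("E07", "D01", 2, "cv07", "P1"),       # both Level and CV: violates c2
     ("E10", "D02", None, "cv10", "P1"),    # researcher used as MgrNo of D02: violates c3
     ("E12", "D02", None, "cv12", None)],   # researcher in no project: violates c5
)
conn.executemany("INSERT INTO D_Dept VALUES (?, ?, ?)",
                 [("D01", "Finance", "E01"), ("D02", "Tech", "E10")])
conn.executemany("INSERT INTO D_Act VALUES (?, ?, ?)",
                 [("E07", "Analysis", 10), ("E10", "Coding", 20), ("E12", "Testing", 5)])

# One violation query per constraint: an empty result means the constraint holds.
QUERIES = {
    "c1": "SELECT DeptCode FROM D_Dept GROUP BY DeptCode HAVING COUNT(*) > 1",
    "c2": "SELECT EmpNo FROM D_Emp WHERE Level IS NOT NULL AND CV IS NOT NULL",
    "c3": """SELECT D.DeptCode, D.MgrNo FROM D_Dept D WHERE NOT EXISTS (
               SELECT * FROM D_Emp E WHERE E.EmpNo = D.MgrNo AND E.Level IS NOT NULL)""",
    "c4": """SELECT D.DeptCode FROM D_Dept D WHERE NOT EXISTS (
               SELECT * FROM D_Emp E WHERE E.DeptCode = D.DeptCode)""",
    "c5": """SELECT A.ResNo FROM D_Act A WHERE NOT EXISTS (
               SELECT * FROM D_Emp E WHERE E.EmpNo = A.ResNo AND E.PID IS NOT NULL)""",
}
violations = {name: conn.execute(q).fetchall() for name, q in QUERIES.items()}
print(violations)
# {'c1': [], 'c2': [('E07',)], 'c3': [('D02', 'E10')], 'c4': [], 'c5': [('E12',)]}
```

Because each query returns the offending items rather than a Boolean, its output can be handed directly to a repair step.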

Once the above violations have been detected, the datasets can be repaired using one of the available methods (see, e.g., [19] and [24]).
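As the simplest possible illustration of the repair step, the sketch below applies a deletion-based repair of c5: it removes from D_Act every activity whose researcher is not an employee participating in a project. This strategy is our own choice for illustration, not the method of [19] or [24], which select repairs more carefully; the rows are hypothetical.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE D_Emp (EmpNo TEXT, PID TEXT);
CREATE TABLE D_Act (ResNo TEXT, Activity TEXT);
INSERT INTO D_Emp VALUES ('E07', 'P1'), ('E10', 'P1'), ('E12', NULL);
INSERT INTO D_Act VALUES ('E07', 'Analysis'), ('E10', 'Coding'), ('E12', 'Testing');
""")

# Deletion-based repair of c5: drop the D_Act items that have no matching
# employee with a non-null PID.
deleted = conn.execute("""
DELETE FROM D_Act
WHERE NOT EXISTS (
  SELECT * FROM D_Emp E
  WHERE E.EmpNo = D_Act.ResNo AND E.PID IS NOT NULL)
""").rowcount
conn.commit()

remaining = conn.execute("SELECT ResNo FROM D_Act ORDER BY ResNo").fetchall()
print(deleted, remaining)  # 1 [('E07',), ('E10',)]
```

Deletion always restores consistency for this kind of constraint but may discard useful information; an alternative repair would assign E12 to a project, which requires domain knowledge that a generic tool does not have.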

5. Discussion and Conclusions

In this paper we have put forward the idea of generating constraints on the datasets of a data lake by exploiting a high-level, conceptual representation, in order to improve the quality of data and, consequently, that of subsequent analysis.

Our approach can be regarded as complementary to those that aim to curate data by directly specifying constraints through ad-hoc languages/tools. For instance, CLAMS [24] adopts the RDF data model for representing data in the curated layer, and defines conditional denial constraints over views of the data lake defined using SPARQL queries. Although this is a powerful approach, able to exploit the expressivity of SPARQL, it leaves the full burden of specifying constraints (and queries) to the designer/analyst. Furthermore, there is no guarantee that the set of constraints is consistent, i.e., non-contradictory. The Deequ system [26, 27] is an open-source library aimed at supporting the automatic verification of data quality. However, the constraints available in the library apply to a single dataset, so inter-dataset constraints cannot be specified.

A major challenge of our approach is to demonstrate that the propagation of conceptual constraints, i.e., the generation of $C$, can be fully automated. Although in the past decades a large body of work has investigated how to automatically translate E-R schemas into relational tables (see, e.g., [28]), much less is known for other conceptual models and/or data models such as RDF. We currently view (automatic) constraint propagation as a two-step process: (1) first, the conceptual schema $S$ is canonically transformed into a schema $S_T$ in the target data model of the curated layer; (2) then, $S_T$ is mapped to the actual schema $S_D$ of the datasets in $D$. Besides the obvious advantage of splitting the complexity of the problem into two well-defined sub-problems, this approach can exploit in step (2) all that is known about the equivalence of schemas expressed in the same formalism ($S_T$ and $S_D$ in our case).

References

[1] Deng, R. C. Fernandez, Abedjan, Wang, Stonebraker, A. K. Elmagarmid, I. F. Ilyas, Madden, Ouzzani, Tang, The data civilizer system, in: CIDR, 2017.

[2] Heudecker, A. White, The data lake fallacy: All water and little substance, Gartner Report G264950 (2014).

[3] Terrizzano, P. M. Schwarz, Roth, J. E. Colino, Data wrangling: The challenging journey from the wild to the lake, in: CIDR, 2015.

[4] Ciaccia, Martinenghi, Torlone, Foundations of context-aware preference propagation, J. ACM 67 (2020) 4:1-4:43. URL: https://doi.org/10.1145/3375713. doi:10.1145/3375713.

[5] CKAN: The open source data portal software, http://ckan.org/ (accessed November 2017).

[6] A. P. Bhardwaj, Deshpande, A. J. Elmore, D. R. Karger, Madden, A. G. Parameswaran, Subramanyam, E. Wu, Zhang, Collaborative data analytics with DataHub, PVLDB 8 (2015) 1916-1927.

[7] A. Y. Halevy, Korn, N. F. Noy, Olston, Polyzotis, Roy, S. E. Whang, Goods: Organizing Google's datasets, in: SIGMOD, 2016.

[8] J. M. Hellerstein, Sreekanti, J. E. Gonzalez, Dalton, Dey, Nag, Ramachandran, Arora, Bhattacharyya, Das, Donsky, G. Fierro, She, Steinbach, Subramanian, E. Sun, Ground: A data context service, in: CIDR, 2017.

[9] Hai, Geisler, Quix, Constance: An intelligent data lake system, in: F. Özcan, G. Koutrika, S. Madden (Eds.), Proceedings of the 2016 International Conference on Management of Data, SIGMOD Conference 2016, San Francisco, CA, USA, June 26 - July 01, 2016, ACM, 2016, pp. 2097-2100. URL: https://doi.org/10.1145/2882903.2899389. doi:10.1145/2882903.2899389.

[10] Ciaccia, Martinenghi, Torlone, Preference queries over taxonomic domains, Proc. VLDB Endow. 14 (2021) 1859-1871. URL: http://www.vldb.org/pvldb/vol14/p1859-martinenghi.pdf. doi:10.14778/3467861.3467874.

[11] Papenbrock, Bergmann, Finke, Zwiener, Naumann, Data profiling with Metanome, PVLDB 8 (2015).

[12] Zaharia, R. S. Xin, Wendell, T. Das, M. Armbrust, A. Dave, X. Meng, J. Rosen, S. Venkataraman, M. J. Franklin, A. Ghodsi, J. Gonzalez, S. Shenker, I. Stoica, Apache Spark: a unified engine for big data processing, Commun. ACM 59 (2016).

[13] Nargesian, E. Zhu, R. J. Miller, K. Q. Pu, P. C. Arocena, Data lake management: Challenges and opportunities, Proc. VLDB Endow. 12 (2019) 1986-1989. URL: https://doi.org/10.14778/3352063.3352116. doi:10.14778/3352063.3352116.

[14] A. Y. Halevy, Korn, N. F. Noy, Olston, Polyzotis, Roy, S. E. Whang, Managing Google's data lake: an overview of the Goods system, IEEE Data Eng. Bull. 39 (2016) 5-14.

[15] Data fabric architecture is key to modernizing data management and integration, 2021. URL: https://www.gartner.com/smarterwithgartner/data-fabric-architecture-is-key-to-modernizing-data-management-and-integration.

[16] Ciaccia, Martinenghi, Torlone, Conceptual constraints for data quality in data lakes, in: Proceedings of the 1st Italian Conference on Big Data and Data Science (ITADATA 2022), 2022, pp. 111-122. URL: https://ceur-ws.org/Vol-3340/paper34.pdf.

[17] Yakout, A. K. Elmagarmid, Neville, Ouzzani, I. F. Ilyas, Guided data repair, Proc. VLDB Endow. 4 (2011) 279-289. URL: https://doi.org/10.14778/1952376.1952378. doi:10.14778/1952376.1952378.

[18] Chiang, R. J. Miller, A unified model for data and constraint repair, in: S. Abiteboul, Böhm, Koch, K. Tan (Eds.), Proceedings of the 27th International Conference on Data Engineering, ICDE 2011, April 11-16, 2011, Hannover, Germany, IEEE Computer Society, 2011, pp. 446-457. URL: https://doi.org/10.1109/ICDE.2011.5767833. doi:10.1109/ICDE.2011.5767833.

[19] Geerts, G. Mecca, Papotti, Santoro, Cleaning data with Llunatic, VLDB J. 29 (2020) 867-892. URL: https://doi.org/10.1007/s00778-019-00586-5. doi:10.1007/s00778-019-00586-5.

[20] Hulsebos, Hu, Bakker, Zgraggen, Satyanarayan, T. Kraska, Ç. Demiralp, Hidalgo, Sherlock: A deep learning approach to semantic data type detection, in: Proceedings of the 25th ACM SIGKDD, KDD '19, Association for Computing Machinery, New York, NY, USA, 2019, pp. 1500-1508. URL: https://doi.org/10.1145/3292500.3330993. doi:10.1145/3292500.3330993.

[21] Ota, Müller, Freire, Srivastava, Data-driven domain discovery for structured datasets, Proc. VLDB Endow. 13 (2020) 953-967. URL: https://doi.org/10.14778/3384345.3384346. doi:10.14778/3384345.3384346.

[22] Zhang, Hulsebos, Y. Suhara, Ç. Demiralp, Li, W.-C. Tan, Sato: Contextual semantic type detection in tables, Proc. VLDB Endow. 13 (2020) 1835-1848. URL: https://doi.org/10.14778/3407790.3407793. doi:10.14778/3407790.3407793.

[23] Chu, I. F. Ilyas, Krishnan, Wang, Data cleaning: Overview and emerging challenges, in: F. Özcan, G. Koutrika, S. Madden (Eds.), Proceedings of the 2016 International Conference on Management of Data, SIGMOD Conference 2016, San Francisco, CA, USA, June 26 - July 01, 2016, ACM, 2016, pp. 2201-2206. URL: https://doi.org/10.1145/2882903.2912574. doi:10.1145/2882903.2912574.

[24] M. H. Farid, Roatis, I. F. Ilyas, Hofmann, X. Chu, CLAMS: bringing quality to data lakes, in: F. Özcan, G. Koutrika, S. Madden (Eds.), Proceedings of the 2016 International Conference on Management of Data, SIGMOD Conference 2016, San Francisco, CA, USA, June 26 - July 01, 2016, ACM, 2016, pp. 2089-2092. URL: https://doi.org/10.1145/2882903.2899391. doi:10.1145/2882903.2899391.

[25] Xiao, Calvanese, Kontchakov, Lembo, Poggi, Rosati, Zakharyaschev, Ontology-based data access: A survey, in: J. Lang (Ed.), Proceedings of the Twenty-Seventh International Joint Conference on Artificial Intelligence, IJCAI 2018, July 13-19, 2018, Stockholm, Sweden, ijcai.org, 2018, pp. 5511-5519. URL: https://doi.org/10.24963/ijcai.2018/777. doi:10.24963/ijcai.2018/777.

[26] Schelter, Lange, Schmidt, Celikel, Bießmann, Grafberger, Automating large-scale data quality verification, Proc. VLDB Endow. 11 (2018) 1781-1794. URL: http://www.vldb.org/pvldb/vol11/p1781-schelter.pdf. doi:10.14778/3229863.3229867.

[27] Schelter, Bießmann, Lange, Rukat, Schmidt, Seufert, Brunelle, Taptunov, Unit testing data with Deequ, in: P. A. Boncz, S. Manegold, A. Ailamaki, A. Deshpande, T. Kraska (Eds.), Proceedings of the 2019 International Conference on Management of Data, SIGMOD Conference 2019, Amsterdam, The Netherlands, June 30 - July 5, 2019, ACM, 2019, pp. 1993-1996. URL: https://doi.org/10.1145/3299869.3320210. doi:10.1145/3299869.3320210.

[28] V. M. Markowitz, A. Shoshani, Representing extended entity-relationship structures in relational databases: A modular approach, ACM Trans. Database Syst. 17 (1992) 423-464. URL: https://doi.org/10.1145/132271.132273. doi:10.1145/132271.132273.