<?xml version="1.0" encoding="UTF-8"?>
<TEI xml:space="preserve" xmlns="http://www.tei-c.org/ns/1.0" 
xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance" 
xsi:schemaLocation="http://www.tei-c.org/ns/1.0 https://raw.githubusercontent.com/kermitt2/grobid/master/grobid-home/schemas/xsd/Grobid.xsd"
 xmlns:xlink="http://www.w3.org/1999/xlink">
	<teiHeader xml:lang="en">
		<fileDesc>
			<titleStmt>
				<title level="a" type="main">Exploring the Properties of the Context and Lattice of the Integral Analytical Model</title>
			</titleStmt>
			<publicationStmt>
				<publisher/>
				<availability status="unknown"><licence/></availability>
			</publicationStmt>
			<sourceDesc>
				<biblStruct>
					<analytic>
						<author role="corresp">
							<persName><forename type="first">Anna</forename><surname>Korobko</surname></persName>
							<email>lynx@icm.krasn.ru</email>
							<affiliation key="aff0">
								<orgName type="department" key="dep1">Institute of Computational Modeling</orgName>
								<orgName type="department" key="dep2">Siberian Branch</orgName>
								<orgName type="institution">Russian Academy of Sciences</orgName>
								<address>
									<addrLine>50/44 Akademgorodok</addrLine>
									<postCode>660036</postCode>
									<settlement>Krasnoyarsk</settlement>
									<country key="RU">Russia</country>
								</address>
							</affiliation>
						</author>
						<title level="a" type="main">Exploring the Properties of the Context and Lattice of the Integral Analytical Model</title>
					</analytic>
					<monogr>
						<imprint>
							<date/>
						</imprint>
					</monogr>
					<idno type="MD5">30E1AFBD2C53C9F986BC090DD468A28E</idno>
				</biblStruct>
			</sourceDesc>
		</fileDesc>
		<encodingDesc>
			<appInfo>
				<application version="0.7.2" ident="GROBID" when="2023-03-23T19:56+0000">
					<desc>GROBID - A machine learning software for extracting information from scholarly documents</desc>
					<ref target="https://github.com/kermitt2/grobid"/>
				</application>
			</appInfo>
		</encodingDesc>
		<profileDesc>
			<textClass>
				<keywords>
					<term>Context Properties</term>
					<term>OLAP</term>
					<term>FCA</term>
					<term>Exploratory OLAP</term>
					<term>Heterogeneous Data</term>
					<term>Big Data</term>
				</keywords>
			</textClass>
			<abstract>
<div xmlns="http://www.tei-c.org/ns/1.0"><p>The diversity, multidimensionality, and volume of available information require an original approach to the operational analytical processing of heterogeneous data. The integral analytical model federates heterogeneous data without physically moving them, supports interactive visual exploration of a large model, and executes analytical queries on distributed data sources simultaneously. A compact representation and native management of big data are achieved by presenting the model as a formal context and building a concept lattice for it in accordance with the FCA method. The theory of integral analytical modeling (IAM) relies on the premise that the model context has special properties that ensure fast construction of the lattice and its compactness. The goal of the article is to compare the properties of the IAM context with those of contexts of other origins, and to evaluate and compare the lattice generation time and the properties of the resulting lattices.</p></div>
			</abstract>
		</profileDesc>
	</teiHeader>
	<text xml:lang="en">
		<body>
<div xmlns="http://www.tei-c.org/ns/1.0"><head n="1">Introduction</head><p>Open information resources provide a unique opportunity to make effective management decisions based on a broader factual base <ref type="bibr" target="#b0">[1]</ref>. In response to this demand, the data analysis software market is developing actively. The capabilities of domestic (Yandex.DataLens, Polymatica, Visiology, etc.) and foreign (Tableau, Qlik, MS Power BI, etc.) data analysis systems range from methods of mathematical statistics to analytical platforms with built-in data mining methods. When choosing a data analysis system, companies first consider price and functionality. Second, they look for clear design, a user-friendly interface, and rapid data access. The speed of adding new data to the analysis process and lower requirements for users are also becoming important.</p><p>Information technologies are developing dynamically and provide new forms of data presentation and new methods of processing data, as the wide choice of analytical systems and the constant development of new software confirm. The diversity, multidimensionality, and volume of available information require original approaches to the operational analytical processing of heterogeneous data. The world scientific community formulates various aspects of this problem as separate tasks: Exploratory On-Line Analytical Processing (ExOLAP) <ref type="bibr" target="#b1">[2]</ref>, Self-service Business Intelligence (Self-service BI) <ref type="bibr" target="#b2">[3]</ref>, and Big Data <ref type="bibr" target="#b3">[4]</ref>.</p><p>The integral analytical model (IAM) federates heterogeneous data without physically moving them, supports interactive exploration of a large model, and executes analytical queries on multiple distributed data sources simultaneously. 
The theory of building the IAM is based on online analytical processing (OLAP) technology and the method of formal concept analysis (FCA). The main requirement of OLAP technology is that data be presented in a multidimensional view: categorical data with a finite domain of values are treated as dimensions, and numerical data are called measures. The multidimensional representation has a solid theoretical foundation, is widely used in modern data analysis systems, and underlies many popular tools for visualizing analysis results. The IAM construction method consists of the virtual combination (integration) of heterogeneous data based on the theory of multidimensional modeling. The structure of each combined source is described in terms of a multidimensional representation and integrated into a general integral model. The FCA method has proved to be an elegant solution to the problem of exploring and managing a large integral analytical model. Adapting the method to the terms of multidimensional data representation makes it possible to present the integral analytical model as an algebraic lattice with OLAP cubes at its vertices.</p><p>The proposed approach has been developed significantly over the last 10 years. Original methods have been developed for constructing multidimensional models of relational sources and of XML document databases <ref type="bibr" target="#b4">[5]</ref>; a method for combining the models of heterogeneous sources has been proposed <ref type="bibr" target="#b5">[6]</ref>; a method for constructing the IAM as a lattice of OLAP cubes has been formally described; a method to support the formation of a user's analytical query to integral models has been proposed; and models have been built for several subject areas (prevention and elimination of emergency consequences, effectiveness of scientific activity, and support for municipal procurement placement). 
However, one key question has remained outside the scope of these studies: does representing the integral model as a lattice satisfy the efficiency requirements for the analytical processing of a combined set of heterogeneous data?</p></div>
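<div xmlns="http://www.tei-c.org/ns/1.0"><p>To make the lattice representation more concrete, the following minimal Python sketch (not the authors' code) shows the two FCA derivation operators on a toy context. It assumes, as in the problem statement of this paper, that context rows are measures and columns are dimensions; the measure and dimension names (amount, price, quantity, budget, customer, date, product, region) are hypothetical. A formal concept is a pair (extent, intent) in which the extent is exactly the set of measures compatible with every dimension of the intent, and the intent is exactly the set of dimensions shared by every measure of the extent; one way to read each concept is as a candidate OLAP cube. The brute-force enumeration closes every subset of dimensions and is feasible only for very small contexts.</p><p>
```python
from itertools import combinations

# Hypothetical toy context: rows are measures, columns are dimensions, and
# each measure lists the dimensions it is analytically compatible with.
CONTEXT = {
    "amount": {"customer", "date", "product"},
    "price": {"date", "product"},
    "quantity": {"customer", "date", "product"},
    "budget": {"date", "region"},
}
DIMENSIONS = set().union(*CONTEXT.values())


def common_dimensions(measures):
    """Derivation operator A -> A': dimensions shared by every measure in A."""
    result = set(DIMENSIONS)
    for m in measures:
        result = result.intersection(CONTEXT[m])
    return result


def common_measures(dimensions):
    """Derivation operator B -> B': measures compatible with every dimension in B."""
    return {m for m, dims in CONTEXT.items() if set(dimensions).issubset(dims)}


def all_concepts():
    """Close every subset of dimensions; feasible only for tiny contexts."""
    concepts = set()
    dims = sorted(DIMENSIONS)
    for k in range(len(dims) + 1):
        for subset in combinations(dims, k):
            extent = common_measures(subset)    # B'
            intent = common_dimensions(extent)  # B''
            concepts.add((frozenset(extent), frozenset(intent)))
    return concepts


for extent, intent in sorted(all_concepts(), key=lambda c: (-len(c[0]), sorted(c[1]))):
    print(sorted(extent), sorted(intent))
```
</p><p>For this toy context the sketch finds five concepts, from the top concept (all four measures, intent {date}) down to the bottom concept with an empty extent and all four dimensions.</p></div>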
<div xmlns="http://www.tei-c.org/ns/1.0"><head n="2">Problem Statement</head><p>The result of the analytical integration of heterogeneous sources is a binary biadjacency matrix. Matrix rows correspond to measures, columns to dimensions, and the value of the cell at the intersection of a row and a column indicates whether the two elements are analytically compatible. Interpreting this matrix as a formal context, we can construct a concept lattice for it. This paper studies how fast the IAM can be built from its context and the properties of the model as a lattice of OLAP cubes. The study was conducted as a computational experiment.</p><p>The object of the study is the IAM for municipal procurement placement. The model is built with the analytical integration software module and combines two dissimilar sources: a relational one (the Regional system of forming demands) and an XML document base (the Unified information system in the field of procurement). The model context ("IAM_context") contains 1,442 rows and 263 columns. The fill density of the matrix is 2.65% (10,046 non-zero values).</p><p>The main scientific hypothesis is that the IAM context has special properties that ensure fast construction of the lattice and its compactness. If the hypothesis holds, the IAM is suitable for supporting the online analytical processing of big heterogeneous data and the rapid integration of new sources. The aim of the article is to compare the parameters of contexts of different origin and to evaluate and compare the lattice generation time and the lattice size for these contexts.</p><p>Context and algebraic lattice construction are at the core of many modern decision support methods <ref type="bibr" target="#b6">[7]</ref>: information retrieval, classification, recommendation, generation of association rules, etc. Researchers use the FCA method to study the structural features of texts, user preferences, ontological concepts, and purchases. 
A classic example of the capabilities of association rules is market basket analysis. We use the Instacart Market Basket Analysis dataset as an example of a real domain for constructing a context and comparing it with the IAM context. The dataset contains information on orders in the Instacart grocery delivery service. The data were downloaded from Kaggle.com, a public platform for data analysis competitions (https://www.kaggle.com/psparks/instacartmarket-basket-analysis). The dataset consists of six files; only one of them, order_products__prior.csv, is needed to form a binary context. This file maps order identifiers to product codes. For the experiment, the data are loaded as a Pandas DataFrame and converted into a binary context ("IMBA_context").</p><p>The context rows correspond to orders and the columns correspond to purchased products. The resulting context is larger than "IAM_context", so we restricted the original dataset and considered two contexts: "IMBA_context_1", which has the same number of non-zero elements as the IAM context, and "IMBA_context_2", which is close to it in size. "IMBA_context_1" has 980 rows and 4,521 columns with a fill density of 0.23%; "IMBA_context_2" has 238 rows and 1,596 columns with a fill density of 0.59%. In these contexts the number of columns exceeds the number of rows, whereas "IAM_context" has the opposite proportions. For completeness of the experiment, we additionally considered the transposed contexts "IMBA_T_context_1" and "IMBA_T_context_2". The control context ("RND_context") was generated with a random number generator and has the same size and fill density as "IAM_context".</p></div>
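<div xmlns="http://www.tei-c.org/ns/1.0"><p>The construction of the binary contexts described above can be sketched in pandas as follows. This is a hedged illustration, not the authors' pipeline: the nine order/product pairs are made up, the column names order_id and product_id are assumed to match order_products__prior.csv, and random_context is a hypothetical helper that places exactly the required number of ones uniformly at random, which is only one way to obtain a control context with the same size and fill density as "IAM_context".</p><p>
```python
import numpy as np
import pandas as pd

# Hypothetical sample of order/product pairs. The real file is assumed to
# provide the same two columns, e.g.
# pd.read_csv("order_products__prior.csv", usecols=["order_id", "product_id"]).
pairs = pd.DataFrame({
    "order_id": [1, 1, 1, 2, 2, 3, 3, 3, 3],
    "product_id": [10, 11, 12, 10, 13, 11, 12, 13, 14],
})

# Binary context: rows are orders, columns are products, 1 = product in order.
context = pd.crosstab(pairs["order_id"], pairs["product_id"]).clip(upper=1)

# Transposed context: products become rows and orders become columns.
context_t = context.T


def fill_density(ctx: pd.DataFrame) -> float:
    """Share of non-zero cells in a binary context."""
    return int(ctx.to_numpy().sum()) / ctx.size


def random_context(n_rows: int, n_cols: int, n_ones: int, seed: int = 0) -> np.ndarray:
    """Hypothetical control context: exactly n_ones cells set to 1, placed
    uniformly at random (one possible reading of the RND_context setup)."""
    rng = np.random.default_rng(seed)
    flat = np.zeros(n_rows * n_cols, dtype=np.uint8)
    flat[rng.choice(flat.size, size=n_ones, replace=False)] = 1
    return flat.reshape(n_rows, n_cols)


print(context)
print(f"density: {fill_density(context):.2%}, transposed shape: {context_t.shape}")

# Same shape and number of non-zero cells as the IAM context in this paper.
rnd = random_context(1442, 263, 10046)
print(rnd.shape, int(rnd.sum()), f"{rnd.mean():.2%}")
```
</p><p>Here pd.crosstab counts the occurrences of each (order, product) pair and clip(upper=1) turns the counts into a 0/1 matrix; transposing with .T gives the counterpart of the IMBA_T contexts. On the toy data the context has 3 rows and 5 columns with a fill density of 60%.</p></div>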
<div xmlns="http://www.tei-c.org/ns/1.0"><head n="3">Experimental Study</head><p>The computational experiment was implemented in the JupyterLab environment in Python 3.7 using the following libraries: pandas, numpy, matplotlib, plotly, networkx, and time.</p><p>To calculate the lattice concepts, we used our own implementation of the In-Close <ref type="bibr" target="#b7">[8]</ref> algorithm, optimized with Python's built-in data structures. The project page is https://github.com/khroom/FCA_LAB. Concepts for all contexts were generated by a single function that also measures the computation time (Fig. <ref type="figure">1</ref>).</p><p>Fig. <ref type="figure">1</ref>. Python code of the In-Close algorithm.</p><p>We compared the following aspects: the properties of the contexts, the speed of lattice generation from the context, the number of concepts (vertices) of the lattice, and the properties of the concepts (extent and intent). The comparison results are shown in Table <ref type="table" target="#tab_0">1</ref>. The results show that the algorithm used is sensitive to the shape of the context, i.e., the ratio of the number of rows to the number of columns. The concept generation time for the transposed contexts "IMBA_T_context_1" and "IMBA_T_context_2" is significantly lower than for the original contexts. Because transposing a context only swaps the roles of extents and intents and leaves the number of concepts unchanged, this observation can be used to speed up the computation. Figure <ref type="figure" target="#fig_0">2</ref> shows a fragment of a scatter plot of the extents and intents of the concepts. We interpret the speed of concept generation and the number of concepts as indicators of the presence or absence of a special structure in the context. The IAM context outperforms the other contexts on both parameters. The most significant differences are between the IAM context and the randomly filled context. 
The context "RND_context" is filled uniformly with the given density, which is reflected in a generation time of 1 min 2 s and in 10,633 concepts. A large number of lattice concepts indicates an unstructured context.</p><p>The scatter plot shows that for "RND_context", when the extent is 1, the intent ranges from 21 to 54, and when the intent is 1, the extent lies in the interval (2, 15). The extent and intent of the remaining concepts do not exceed 8. The artificial context is not a domain model and does not reflect relationships between real entities. In contrast, the IAM context lattice is built in 1.36 s, has 205 concepts, and contains concepts that cover up to 30% of the rows and columns. Fast generation, concepts with large extents and intents, and a compact lattice suggest that the context, i.e., the integral analytical model of heterogeneous sources, is strongly structured.</p><p>We now turn to contexts built for another real domain, a grocery delivery service. We consider the transposed versions of these contexts, since transposition significantly affects the speed of concept generation. "IMBA_T_context_1" and "IAM_context" have the same number of non-zero elements but differ in size: "IMBA_T_context_1" is about 11 times larger and about 11 times sparser than "IAM_context". The concept generation time for "IMBA_T_context_1" is 2 min, and its lattice includes 6,722 concepts.</p><p>The context "IMBA_T_context_1" is semi-structured. It has fewer concepts than "RND_context", but because of its large size the generation time is significantly longer. The large difference between the maximum extent and the maximum intent of the concepts indicates a specific structure of "IMBA_T_context_1": purchases are spread over a wide range of products, and the relationships between products are weak. 
Because of the nature of the subject area, researchers often study product groups to identify related products. The context "IMBA_T_context_2" is comparable in size to "IAM_context". It is based on "IMBA_T_context_1" but is smaller and denser. The concept generation time for "IMBA_T_context_2" is 2.74 s, and its lattice includes 642 concepts.</p><p>The context "IMBA_T_context_2" is also semi-structured. With a similar size, it has about 3 times as many concepts as the IAM context. The difference between the maximum extent and the maximum intent of the concepts is not as large as in the larger context, which indicates that the data are not homogeneous. Figure <ref type="figure" target="#fig_0">2</ref> shows that for this context many concepts have an extent equal to 1 and larger intents; these correspond to receipts that combine up to 34 products. There are few popular products and even fewer recurring product combinations. Faster concept generation than for the random context suggests that this context is structured.</p><p>The results of the comparative analysis of the generation parameters and context properties confirm that the IAM context has a special structure. These special properties make the lattice fast to build and compact. For contexts with the same size, number of non-zero elements, and density, the measured parameters differ greatly, and these differences strongly affect the efficiency of manipulating the concept lattice. This means that the compactness and construction speed of the lattice are determined by internal properties of the context, namely the structural relationships between the entities of the modeled subject area.</p><p>During the study, significant structural features of the real IAM context were identified. 
The functional dependencies between the attributes of the original storage schemas are reflected in the hierarchical dependencies between the dimensions of the multidimensional model. In the IAM context, they take the form of "mutual existence constraints" in accordance with the modern theory of the a priori formation of the system of measured properties <ref type="bibr" target="#b4">(5,</ref><ref type="bibr" target="#b5">6)</ref> in the FCA methodology. The existence constraints between the IAM dimensions significantly reduce the time needed to compute the concepts and reduce their number compared with the control model. In addition, analyzing the properties of the concepts makes it possible to identify bounds on the sizes of concept extents and intents that follow from the natural limits on the number of analytical links in real databases.</p></div>
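<div xmlns="http://www.tei-c.org/ns/1.0"><p>Since Fig. 1 survives only as a caption, the sketch below does not reproduce the authors' FCA_LAB code. As a hedged illustration, it implements plain Close-by-One (CbO) enumeration, the family of algorithms that In-Close [8] belongs to, and times it with time.perf_counter. In-Close adds an incremental closure computation and further optimizations that are deliberately omitted here, and the 40 × 12 demo matrix is a hypothetical random context. Each concept is generated exactly once: a closure that would add an attribute with an index smaller than the current one fails the canonicity test, because that concept is reached in another branch of the search.</p><p>
```python
import time

import numpy as np


def concepts_cbo(matrix):
    """Enumerate all formal concepts of a 0/1 matrix with Close-by-One (CbO).

    Rows are objects and columns are attributes. Returns a list of
    (extent, intent) pairs of frozensets of row and column indices.
    Recursion depth grows with the number of columns, so very wide
    contexts may need sys.setrecursionlimit or an iterative rewrite.
    """
    data = np.asarray(matrix)
    rows = [frozenset(np.flatnonzero(r).tolist()) for r in data]
    n_cols = data.shape[1]
    all_cols = frozenset(range(n_cols))

    def intent_of(extent):
        # Attributes shared by every object in the extent (all columns if empty).
        common = all_cols
        for g in extent:
            common = common.intersection(rows[g])
        return common

    found = []

    def extend(extent, intent, start):
        found.append((extent, intent))
        for j in range(start, n_cols):
            if j in intent:
                continue
            new_extent = frozenset(g for g in extent if j in rows[g])
            new_intent = intent_of(new_extent)
            # Canonicity test: closing must not add an attribute before j;
            # otherwise the same concept is produced in another branch.
            if all(a >= j for a in new_intent.difference(intent)):
                extend(new_extent, new_intent, j + 1)

    top_extent = frozenset(range(data.shape[0]))
    extend(top_extent, intent_of(top_extent), 0)
    return found


def timed(matrix):
    """Return (number of concepts, elapsed seconds) for one run."""
    start = time.perf_counter()
    count = len(concepts_cbo(matrix))
    return count, time.perf_counter() - start


# Hypothetical 40 x 12 random context with roughly 30% of cells set to 1.
rng = np.random.default_rng(1)
demo = (rng.random((40, 12)) > 0.7).astype(int)
n, sec = timed(demo)
n_t, sec_t = timed(demo.T)
print(f"{n} concepts in {sec:.3f} s; transposed: {n_t} concepts in {sec_t:.3f} s")
```
</p><p>Running the function on a context and on its transpose returns the same number of concepts, consistent with the identical counts in Table 1 for each IMBA context and its transposed version, while the running times can differ.</p></div>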
<div xmlns="http://www.tei-c.org/ns/1.0"><head n="4">Conclusion</head><p>Online processing of big data requires high performance under conditions of large volume and heterogeneity of information. The results of the computational experiment show that representing the integral analytical model of heterogeneous data as a lattice is suitable for solving modern problems of real-time big data analysis. Further development of the proposed approach involves systematizing the previously obtained results and describing the full cycle of creating and using the integral analytical model. Improving the theoretical basis of the approach involves making the formation of the IAM more intelligent, both in building the multidimensional model and in taking into account the variability of analytical relationships.</p></div><figure xmlns="http://www.tei-c.org/ns/1.0" xml:id="fig_0"><head>Fig. 2.</head><label>2</label><figDesc>Fig. 2. Scatter plot of the concept intents and extents.</figDesc><graphic coords="5,172.08,250.56,252.00,216.48" type="bitmap" /></figure>
<figure xmlns="http://www.tei-c.org/ns/1.0"><head></head><label></label><figDesc></figDesc><graphic coords="4,127.68,270.24,339.84,204.96" type="bitmap" /></figure>
<figure xmlns="http://www.tei-c.org/ns/1.0" type="table" xml:id="tab_0"><head>Table 1.</head><label>1</label><figDesc>The comparison results.</figDesc><table><row><cell></cell><cell>IAM_context</cell><cell>IMBA_context_1</cell><cell>IMBA_T_context_1</cell><cell>IMBA_context_2</cell><cell>IMBA_T_context_2</cell><cell>RND_context</cell></row><row><cell>Size (rows × columns)</cell><cell>1,442 × 263</cell><cell>980 × 4,521</cell><cell>4,521 × 980</cell><cell>238 × 1,596</cell><cell>1,596 × 238</cell><cell>1,442 × 263</cell></row><row><cell>Fill density</cell><cell>2.65%</cell><cell>0.23%</cell><cell>0.23%</cell><cell>0.59%</cell><cell>0.59%</cell><cell>2.64%</cell></row><row><cell>Number of non-zero items</cell><cell>10,046</cell><cell>10,046</cell><cell>10,046</cell><cell>2,255</cell><cell>2,255</cell><cell>10,046</cell></row><row><cell>Concept generation time</cell><cell>1.36 s</cell><cell>9 min 45 s</cell><cell>2 min</cell><cell>25.1 s</cell><cell>2.74 s</cell><cell>1 min 2 s</cell></row><row><cell>Number of concepts</cell><cell>205</cell><cell>6,722</cell><cell>6,722</cell><cell>642</cell><cell>642</cell><cell>10,633</cell></row><row><cell>Maximal extent of concepts (share of rows)</cell><cell>394 (27.3%)</cell><cell>158 (16.1%)</cell><cell>46 (1%)</cell><cell>28 (11.8%)</cell><cell>34 (2.1%)</cell><cell>54 (3.7%)</cell></row></table></figure>
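<div xmlns="http://www.tei-c.org/ns/1.0"><p>The derived figures in Table 1 and in the text can be checked with simple arithmetic. The short Python sketch below takes the sizes, non-zero counts, and maximal extents from Table 1 and computes each fill density (non-zero cells divided by rows times columns) and each maximal extent as a share of the context rows, which reproduces the percentages in parentheses. It also computes how much larger and sparser "IMBA_T_context_1" is than "IAM_context" (about 11.7 times on both counts, which the text rounds to 11). With 10,046 non-zero cells, the computed density of "RND_context" also rounds to 2.65%, so the 2.64% listed in the table is presumably a rounding difference.</p><p>
```python
# Values copied from Table 1: (rows, columns, non-zero cells, maximal extent).
TABLE_1 = {
    "IAM_context": (1442, 263, 10046, 394),
    "IMBA_context_1": (980, 4521, 10046, 158),
    "IMBA_T_context_1": (4521, 980, 10046, 46),
    "IMBA_context_2": (238, 1596, 2255, 28),
    "IMBA_T_context_2": (1596, 238, 2255, 34),
    "RND_context": (1442, 263, 10046, 54),
}


def derived(rows, cols, nonzero, max_extent):
    """Fill density and maximal extent as a share of the context rows."""
    return nonzero / (rows * cols), max_extent / rows


for name, (rows, cols, nonzero, max_extent) in TABLE_1.items():
    density, extent_share = derived(rows, cols, nonzero, max_extent)
    print(f"{name:18} density={density:.2%} max_extent_share={extent_share:.1%}")

# Size and sparsity ratios of IMBA_T_context_1 relative to IAM_context.
iam, imba = TABLE_1["IAM_context"], TABLE_1["IMBA_T_context_1"]
size_ratio = (imba[0] * imba[1]) / (iam[0] * iam[1])
density_ratio = derived(*iam)[0] / derived(*imba)[0]
print(f"{size_ratio:.1f}x larger, {density_ratio:.1f}x sparser")
```
</p></div>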
		</body>
		<back>
			<div type="references">

				<listBibl>

<biblStruct xml:id="b0">
	<analytic>
		<title level="a" type="main">When There&apos;s No Such Thing as Too Much Information</title>
		<author>
			<persName><forename type="first">Steve</forename><surname>Lohr</surname></persName>
		</author>
	</analytic>
	<monogr>
		<title level="j">New York Times</title>
		<imprint>
			<date type="published" when="2011-04-23">April 23, 2011</date>
		</imprint>
	</monogr>
</biblStruct>

<biblStruct xml:id="b1">
	<analytic>
		<title level="a" type="main">Using semantic web technologies for exploratory OLAP: a survey</title>
		<author>
			<persName><forename type="first">A</forename><surname>Abelló</surname></persName>
		</author>
		<author>
			<persName><forename type="first">O</forename><surname>Romero</surname></persName>
		</author>
		<author>
			<persName><forename type="first">T</forename><forename type="middle">B</forename><surname>Pedersen</surname></persName>
		</author>
		<author>
			<persName><forename type="first">R</forename><surname>Berlanga</surname></persName>
		</author>
		<author>
			<persName><forename type="first">V</forename><surname>Nebot</surname></persName>
		</author>
		<author>
			<persName><forename type="first">M</forename><forename type="middle">J</forename><surname>Aramburu</surname></persName>
		</author>
		<author>
			<persName><forename type="first">A</forename><surname>Simitsis</surname></persName>
		</author>
		<idno type="DOI">10.1109/TKDE.2014.2330822</idno>
		<ptr target="https://doi.org/10.1109/TKDE.2014.2330822" />
	</analytic>
	<monogr>
		<title level="j">IEEE transactions on knowledge and data engineering</title>
		<imprint>
			<biblScope unit="volume">27</biblScope>
			<biblScope unit="issue">2</biblScope>
			<biblScope unit="page" from="571" to="588" />
			<date type="published" when="2014">2014</date>
		</imprint>
	</monogr>
</biblStruct>

<biblStruct xml:id="b2">
	<analytic>
		<title level="a" type="main">Self-Service Business Intelligence</title>
		<author>
			<persName><forename type="first">P</forename><surname>Alpar</surname></persName>
		</author>
		<author>
			<persName><forename type="first">M</forename><surname>Schulz</surname></persName>
		</author>
		<idno type="DOI">10.1007/s12599-016-0424-6</idno>
	</analytic>
	<monogr>
		<title level="j">Bus. Inf. Syst. Eng</title>
		<imprint>
			<biblScope unit="volume">58</biblScope>
			<biblScope unit="issue">2</biblScope>
			<biblScope unit="page" from="151" to="155" />
			<date type="published" when="2016">2016</date>
		</imprint>
	</monogr>
</biblStruct>

<biblStruct xml:id="b3">
	<analytic>
		<title level="a" type="main">Critical analysis of Big Data challenges and analytical methods</title>
		<author>
			<persName><forename type="first">U</forename><surname>Sivarajah</surname></persName>
		</author>
		<author>
			<persName><forename type="first">M</forename><forename type="middle">M</forename><surname>Kamal</surname></persName>
		</author>
		<author>
			<persName><forename type="first">Z</forename><surname>Irani</surname></persName>
		</author>
		<author>
			<persName><forename type="first">V</forename><surname>Weerakkody</surname></persName>
		</author>
		<idno type="DOI">10.1016/j.jbusres.2016.08.001</idno>
		<ptr target="https://doi.org/10.1016/j.jbusres.2016.08.001" />
	</analytic>
	<monogr>
		<title level="j">Journal of Business Research</title>
		<imprint>
			<biblScope unit="volume">70</biblScope>
			<biblScope unit="page" from="263" to="286" />
			<date type="published" when="2017">2017</date>
		</imprint>
	</monogr>
</biblStruct>

<biblStruct xml:id="b4">
	<analytic>
		<title level="a" type="main">Multidimensional Design from XML Sources for the Integral Analytical Model</title>
		<author>
			<persName><forename type="first">A</forename><surname>Korobko</surname></persName>
		</author>
		<author>
			<persName><forename type="first">A</forename><surname>Korobko</surname></persName>
		</author>
		<idno type="DOI">10.12783/dtcse/aiie2017/18203</idno>
	</analytic>
	<monogr>
		<title level="j">DEStech Trans. Comput. Sci. Eng. AIIE</title>
		<imprint>
			<date type="published" when="2017">2017</date>
		</imprint>
	</monogr>
</biblStruct>

<biblStruct xml:id="b5">
	<analytic>
		<title level="a" type="main">Matching disparate dimensions for analytical integration of heterogeneous data sources</title>
		<author>
			<persName><forename type="first">A</forename><surname>Korobko</surname></persName>
		</author>
		<author>
			<persName><forename type="first">A</forename><surname>Korobko</surname></persName>
		</author>
		<idno type="DOI">10.1145/3297662.3365809</idno>
		<ptr target="https://doi.org/10.1145/3297662.3365809" />
	</analytic>
	<monogr>
		<title level="m">Proceedings of the 11th International Conference on Management of Digital EcoSystems (MEDES &apos;19)</title>
				<meeting>the 11th International Conference on Management of Digital EcoSystems (MEDES &apos;19)</meeting>
		<imprint>
			<date type="published" when="2019">2019</date>
			<biblScope unit="page" from="66" to="72" />
		</imprint>
	</monogr>
</biblStruct>

<biblStruct xml:id="b6">
	<analytic>
		<title level="a" type="main">Application of formal conceptual analysis for intelligent decision support</title>
		<author>
			<persName><forename type="first">K</forename><forename type="middle">I</forename><surname>Pahomova</surname></persName>
		</author>
		<author>
			<persName><forename type="first">A</forename><forename type="middle">V</forename><surname>Korobko</surname></persName>
		</author>
	</analytic>
	<monogr>
		<title level="j">Artificial Intelligence and Decision Making</title>
		<imprint>
			<biblScope unit="volume">4</biblScope>
			<biblScope unit="page" from="37" to="46" />
			<date type="published" when="2019">2019</date>
		</imprint>
	</monogr>
	<note>in Russian</note>
</biblStruct>

<biblStruct xml:id="b7">
	<analytic>
		<title level="a" type="main">In-close, a fast algorithm for computing formal concepts</title>
		<author>
			<persName><forename type="first">S</forename><surname>Andrews</surname></persName>
		</author>
	</analytic>
	<monogr>
		<title level="m">International Conference on Conceptual Structures</title>
				<meeting><address><addrLine>Moscow</addrLine></address></meeting>
		<imprint>
			<date type="published" when="2009">2009</date>
		</imprint>
	</monogr>
</biblStruct>

				</listBibl>
			</div>
		</back>
	</text>
</TEI>
