<?xml version="1.0" encoding="UTF-8"?>
<TEI xml:space="preserve" xmlns="http://www.tei-c.org/ns/1.0" 
xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance" 
xsi:schemaLocation="http://www.tei-c.org/ns/1.0 https://raw.githubusercontent.com/kermitt2/grobid/master/grobid-home/schemas/xsd/Grobid.xsd"
 xmlns:xlink="http://www.w3.org/1999/xlink">
	<teiHeader xml:lang="en">
		<fileDesc>
			<titleStmt>
				<title level="a" type="main">Introduction to a Data-driven Analysis Tool of Molecular Dynamics Self-Assembled Lipid Bilayer Trajectories</title>
			</titleStmt>
			<publicationStmt>
				<publisher/>
				<availability status="unknown"><licence/></availability>
			</publicationStmt>
			<sourceDesc>
				<biblStruct>
					<analytic>
						<author>
							<persName><forename type="first">Stelios</forename><surname>Karozis</surname></persName>
							<affiliation key="aff0">
								<orgName type="department">Institute of Nuclear &amp; Radiological Sciences and Technology, Energy &amp; Safety</orgName>
								<orgName type="institution">NCSR &quot;Demokritos&quot;</orgName>
								<address>
									<country key="GR">Greece</country>
								</address>
							</affiliation>
						</author>
						<author>
							<persName><forename type="first">Michael</forename><surname>Kainourgiakis</surname></persName>
							<affiliation key="aff1">
								<orgName type="department">Institute of Nuclear &amp; Radiological Sciences and Technology, Energy &amp; Safety</orgName>
								<orgName type="institution">NCSR &quot;Demokritos&quot;</orgName>
								<address>
									<country key="GR">Greece</country>
								</address>
							</affiliation>
						</author>
						<title level="a" type="main">Introduction to a Data-driven Analysis Tool of Molecular Dynamics Self-Assembled Lipid Bilayer Trajectories</title>
					</analytic>
					<monogr>
						<imprint>
							<date/>
						</imprint>
					</monogr>
					<idno type="MD5">7F57A4FBA5A7EABB8FB372A7E2F6EBB4</idno>
				</biblStruct>
			</sourceDesc>
		</fileDesc>
		<encodingDesc>
			<appInfo>
				<application version="0.7.2" ident="GROBID" when="2023-03-25T00:01+0000">
					<desc>GROBID - A machine learning software for extracting information from scholarly documents</desc>
					<ref target="https://github.com/kermitt2/grobid"/>
				</application>
			</appInfo>
		</encodingDesc>
		<profileDesc>
			<textClass>
				<keywords>
					<term>lipid bilayer</term>
					<term>Molecular Dynamics</term>
					<term>Machine learning</term>
				</keywords>
			</textClass>
			<abstract>
<div xmlns="http://www.tei-c.org/ns/1.0"><p>The in-silico studies reported so far on the structure and transport properties of lipid bilayers are generally based on assumptions and approaches that simplify the real system and problem. Nevertheless, the structure and organization of lipid bilayers strongly affect transport coefficients. This is an important observation: simulations are meaningful only when they address realistic structures that mimic the actual lipid phase system as closely as possible.</p><p>In the current study, a computational tool is presented that takes the results of Molecular Dynamics (MD) simulations of spontaneously self-assembled lipid bilayers with different orientations and shapes, analyzes the resulting trajectories, and creates a Machine Learning (ML) ready dataset that can be used with a range of ML algorithms, depending on the case. The tool is in the alpha stage, where tests are performed, with a public release planned under a free and open-source license.</p></div>
			</abstract>
		</profileDesc>
	</teiHeader>
	<text xml:lang="en">
		<body>
<div xmlns="http://www.tei-c.org/ns/1.0"><head n="1">INTRODUCTION</head><p>As molecular simulations (MS) continue to evolve into a powerful computational tool for studying complex biomolecular systems, and as computational power grows exponentially, the systems under study are becoming more complex. As a result, a large number of configurations can be produced with greater ease, which reduces the uncertainty of the calculated thermodynamic properties. The main tools derive from statistical mechanics; hence, the larger the sample, the more accurate the calculation.</p><p>On the other hand, the large volume of MS results creates a data processing problem in terms of software and hardware capabilities. The hardware problem can be overcome with modern solutions, such as distributed data processing systems, or with new software implementations that are more efficient on limited hardware infrastructure. Most MS packages incorporate their own post-processing tools or suggest compatible open-source software that is sufficient for most cases. MDTraj <ref type="bibr" target="#b0">[1]</ref> is used in a range of cases or as a basis for other processing software such as TTClust <ref type="bibr" target="#b1">[2]</ref>, which partitions thousands of frames into a limited number of most dissimilar conformations. Other tools, like TRAVIS ("Trajectory Analyzer and Visualizer") <ref type="bibr" target="#b2">[3]</ref> and pyPcazip <ref type="bibr" target="#b3">[4]</ref>, are autonomous and were developed for a specific case, thus lacking generic applicability.</p><p>The aforementioned packages, alongside the built-in tools of Molecular Dynamics (MD) and Monte Carlo (MC) simulation software, are well established and tested, but they do not solve the problem of processing the big data produced by MS. Machine Learning (ML) algorithms are data analytics tools that do not rely on an equation or a predefined model. The goal is to deduce ("learn") the model from the data. 
ML may be useful not only for managing and analyzing the big data of MS but also as a new way to study systems and discover patterns that may lead to insights about the case under investigation. ML has already been used in MS in many different ways, from post-processing to the preparation of input parameters and the reduction of error in the simulation itself <ref type="bibr" target="#b4">[5]</ref><ref type="bibr" target="#b5">[6]</ref><ref type="bibr" target="#b6">[7]</ref><ref type="bibr" target="#b7">[8]</ref><ref type="bibr" target="#b8">[9]</ref>.</p><p>In the current paper, we introduce a computational tool for analyzing randomly oriented lipid bilayers derived from MD trajectories and creating a dataset ready to be used in ML algorithms. The initial data consist of spontaneously self-assembled lipid bilayer structures obtained from MD simulations.</p></div>
<div xmlns="http://www.tei-c.org/ns/1.0"><head n="2">CAPABILITIES AND IMPLEMENTATION</head><p>The workflow under discussion consists of three distinct steps: (1) the analysis of the MD trajectories, (2) the creation of the ML ready dataset and (3) the use of the dataset in ML algorithms (see Figure <ref type="figure" target="#fig_0">1</ref>).</p><p>The tool is written in the Python 3 programming language and provides a dynamic input interface that can accommodate the requirements of each use case. The user has to describe the atom groups and the primary analysis for each group. Moreover, the input interface enables the results of the primary analyses to be combined in order to calculate secondary properties of the system. These inputs need to be written in Python dictionary format.</p><p>In order to address the problem of differently oriented and shaped lipid bilayers, which result from self-assembly (see Section 3), the tool performs a domain decomposition of the final configuration and identifies the atoms that belong to the user-defined groups. Each combination of group and domain becomes a sub-system that is analyzed as a separate MD system. As such, each MD simulation may create more than one sub-system and, hence, more than one instance in the final dataset. By breaking the system into small domains, where assumptions such as no curvature and no intersection points can be applied, each conformation is treated as an ideal bilayer structure, and a series of MD analysis tools can be used. 
The resulting dataset can be used as input to ML algorithms, which enable the identification of patterns and provide insights into large and complex bilayer structures.</p><p>The tool can efficiently load trajectory and/or topology data in the format used by the GROMACS <ref type="bibr" target="#b9">[10]</ref> MD simulation package and can use many of the post-processing tools that GROMACS provides, alongside customized calculations (primary or secondary), in order to calculate a series of properties for each sub-system. The structural characteristics calculated for each sub-system are the peaks of the density profile, the tilt of the ordered part of the lipid chain, the peaks of the radial distribution function of pairs of lipid groups and the order parameter of the lipid chains.</p></div>
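<div xmlns="http://www.tei-c.org/ns/1.0"><p>The domain decomposition described in this section can be illustrated with a minimal sketch. It assumes an orthorhombic periodic box split into a regular grid; the paper does not state how the tool defines its domains. The dictionary keys, the group and atom names, and the function name decompose are hypothetical and are not taken from the tool itself.</p>
<p>
```python
from collections import defaultdict

# Hypothetical input: group names mapped to atom names and a primary analysis.
# The key names below are illustrative assumptions, not the tool's real schema.
GROUPS = {
    "HEAD": {"atoms": ["N", "P"], "analysis": "density"},
    "TAIL": {"atoms": ["C1", "C2"], "analysis": "order"},
}


def decompose(atoms, box, n_cells):
    """Assign atoms to (group, domain) sub-systems on a regular 3D grid.

    atoms:   list of (atom_name, (x, y, z)) taken from the final configuration
    box:     (Lx, Ly, Lz) edge lengths of an orthorhombic periodic box
    n_cells: number of grid cells along each axis
    """
    # Reverse lookup from atom name to group name
    name_to_group = {
        name: group for group, spec in GROUPS.items() for name in spec["atoms"]
    }
    subsystems = defaultdict(list)
    for index, (name, position) in enumerate(atoms):
        group = name_to_group.get(name)
        if group is None:
            continue  # atom is not part of any user-defined group
        # Wrap each coordinate into the box, then find its cell index
        cell = tuple(
            min(int((coord % length) / length * n_cells), n_cells - 1)
            for coord, length in zip(position, box)
        )
        subsystems[(group, cell)].append(index)
    return dict(subsystems)
```
</p>
<p>Each key of the returned dictionary identifies one sub-system, that is, one candidate instance of the final dataset, and the value lists the indices of the atoms that belong to it.</p></div>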
<div xmlns="http://www.tei-c.org/ns/1.0"><head n="3">CASE STUDY</head><p>The orientation of the lipid bilayer can be studied through MD calculations. However, such treatments are based on the a-priori placement of the lipid molecules in appropriate positions, in order to form a periodic system with appropriately oriented hydrophilic groups and hydrophobic chains. Although this setup saves a substantial amount of simulation time, it represents only a simplified and ideal approximation of the structure at equilibrium and does not ensure that its structural and dynamical properties correspond to the real/natural lipid phase of the system under study.</p><p>Other approaches <ref type="bibr" target="#b10">[11]</ref> recreate the structure of the lipid bilayer using MD with random initial configurations of the molecules. This treatment aims to study the dynamics of the system as it moves towards equilibrium and the spontaneous self-assembly of the single lipids into a bilayer, as well as to simulate more realistic conformations of minimum energy. Thus, any approximation based on the a-priori placement of the lipids is eliminated. Because the randomness of the initial configuration affects the resulting structure, a sufficient sample of self-assembled systems needs to be produced (on the order of 10²). All of the resulting systems are far from ideal in terms of shape and orientation, and their properties are correlated with the local composition, shape, orientation, bilayer thickness, etc. The available tools for analyzing MD trajectories lack the functionality to process randomly oriented and shaped bilayer structures. 
The tool presented in the current paper attempts to address that problem by decomposing the conformation resulting from each simulation into small sub-domains, calculating structural properties of each sub-domain, such as the density profile, order parameter, radial distribution function and tilt of the lipid chains, and producing an ML ready dataset to which data-driven techniques, such as classification or clustering, can be applied. The ML techniques will provide a fast, efficient and unbiased way to group the different sub-domains and will be used to try to identify and extract the physical meaning of each resulting group via its properties. That information will lead to a recommendation of a series of distinct and well-defined bilayer structures that exist simultaneously in the macroscopic system. The recommended conformations can be reconstructed and taken into account in future studies of the system.</p></div>
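<div xmlns="http://www.tei-c.org/ns/1.0"><p>As an illustration of two of the per-sub-domain properties listed above, the sketch below computes a chain order parameter and a mean chain tilt. It uses one common definition, the second-rank order parameter, i.e. the average of (3 cos²θ - 1)/2 over the chains, where θ is the angle between a chain vector and the local bilayer normal. The paper does not state which definition the tool uses, and the function name and inputs here are assumptions; in particular, the local normal is taken as given.</p>
<p>
```python
import math


def chain_order_and_tilt(chain_vectors, normal):
    """Return (order parameter, mean tilt in degrees) for one sub-domain.

    chain_vectors: list of (x, y, z) vectors along the lipid chains
    normal:        (x, y, z) local bilayer normal of the sub-domain (assumed known)
    """
    n_len = math.sqrt(sum(c * c for c in normal))
    order_terms = []
    tilts = []
    for vec in chain_vectors:
        v_len = math.sqrt(sum(c * c for c in vec))
        cos_theta = sum(a * b for a, b in zip(vec, normal)) / (v_len * n_len)
        # Clamp against rounding so acos stays in its domain
        cos_theta = max(-1.0, min(1.0, cos_theta))
        order_terms.append(0.5 * (3.0 * cos_theta ** 2 - 1.0))
        # The sign of a chain vector is arbitrary, so fold the angle into [0, 90]
        tilts.append(math.degrees(math.acos(abs(cos_theta))))
    count = len(chain_vectors)
    return sum(order_terms) / count, sum(tilts) / count
```
</p>
<p>With this definition, chains perfectly aligned with the normal give an order parameter of 1 and a tilt of 0 degrees, while chains lying in the bilayer plane give -0.5 and 90 degrees. The pair of values could serve as two columns of a dataset row for the sub-domain.</p></div>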
<div xmlns="http://www.tei-c.org/ns/1.0"><head n="4">DISCUSSION</head><p>The capabilities of the tool serve as a bridge, connecting MD data with structural properties and ML algorithms for general data science audiences. The derived measurements constitute a domain dataset intended to feed ML algorithms in order to (i) explore patterns that may emerge by applying unsupervised learning algorithms or (ii) build a model that predicts a property of interest. Moreover, the calculated properties can be used as supplementary data for a larger dataset. The tool stands out for its novel approach of examining the system as a series of sub-systems, thus overcoming the problems and limitations of analyzing complex lipid bilayer structures.</p><p>The tool is currently an alpha version, on which tests and debugging are performed. Future work is planned to include the outcome and results of a case study, alongside the first public release of the code under a free and open-source license. The tool is hosted at: https://mssg.ipta.demokritos.gr/gitlab/skarozis/toobba</p></div><figure xmlns="http://www.tei-c.org/ns/1.0" xml:id="fig_0"><head>Figure 1 :</head><label>1</label><figDesc>Figure 1: Illustration of the workflow process of the presented tool.</figDesc><graphic coords="2,53.80,83.68,240.25,209.07" type="bitmap" /></figure>
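<div xmlns="http://www.tei-c.org/ns/1.0"><p>The unsupervised route mentioned in the discussion, grouping sub-systems by their structural properties, can be sketched with a plain k-means clustering. The paper does not name a specific algorithm, so k-means is only one possible choice. The function name and the feature rows in the usage note are hypothetical, and seeding the centroids with the first k rows is a simplification that keeps the sketch deterministic.</p>
<p>
```python
import math


def kmeans(rows, k, n_iter=20):
    """Plain Lloyd's k-means on a list of equal-length feature tuples.

    Centroids are seeded with the first k rows to keep the sketch deterministic.
    """
    centroids = [tuple(row) for row in rows[:k]]
    labels = [0] * len(rows)
    for _ in range(n_iter):
        # Assignment step: nearest centroid by Euclidean distance
        labels = [
            min(range(k), key=lambda j: math.dist(row, centroids[j]))
            for row in rows
        ]
        # Update step: move each centroid to the mean of its members
        for j in range(k):
            members = [row for row, lab in zip(rows, labels) if lab == j]
            if members:  # keep the old centroid if a cluster is empty
                centroids[j] = tuple(sum(col) / len(members) for col in zip(*members))
    return labels, centroids
```
</p>
<p>For example, calling kmeans with rows of the form (order parameter, tilt in degrees), one row per sub-system, returns one label per row. In practice the features would likely need to be standardized first, since their scales differ, and a library implementation with a more robust initialization would be preferable.</p></div>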
		</body>
		<back>

			<div type="acknowledgement">
<div xmlns="http://www.tei-c.org/ns/1.0"><head>ACKNOWLEDGMENTS</head><p>This research is co-financed by Greece and the European Union (European Social Fund - ESF) through the Operational Programme «Human Resources Development, Education and Lifelong Learning» in the context of the project "Reinforcement of Postdoctoral Researchers - 2nd Cycle" (MIS-5033021), implemented by the State Scholarships Foundation (IKY).</p></div>
			</div>

			<div type="references">

				<listBibl>

<biblStruct xml:id="b0">
	<analytic>
		<title level="a" type="main">MDTraj: A Modern Open Library for the Analysis of Molecular Dynamics Trajectories</title>
		<author>
			<persName><forename type="first">R</forename><forename type="middle">T</forename><surname>McGibbon</surname></persName>
		</author>
		<author>
			<persName><forename type="first">K</forename><forename type="middle">A</forename><surname>Beauchamp</surname></persName>
		</author>
		<author>
			<persName><forename type="first">M</forename><forename type="middle">P</forename><surname>Harrigan</surname></persName>
		</author>
		<author>
			<persName><forename type="first">C</forename><surname>Klein</surname></persName>
		</author>
		<author>
			<persName><forename type="first">J</forename><forename type="middle">M</forename><surname>Swails</surname></persName>
		</author>
		<author>
			<persName><forename type="first">C</forename><forename type="middle">X</forename><surname>Hernández</surname></persName>
		</author>
		<author>
			<persName><forename type="first">C</forename><forename type="middle">R</forename><surname>Schwantes</surname></persName>
		</author>
		<author>
			<persName><forename type="first">L.-P</forename><surname>Wang</surname></persName>
		</author>
		<author>
			<persName><forename type="first">T</forename><forename type="middle">J</forename><surname>Lane</surname></persName>
		</author>
		<author>
			<persName><forename type="first">V</forename><forename type="middle">S</forename><surname>Pande</surname></persName>
		</author>
	</analytic>
	<monogr>
		<title level="j">Biophysical Journal</title>
		<imprint>
			<biblScope unit="volume">109</biblScope>
			<biblScope unit="page" from="1528" to="1532" />
			<date type="published" when="2015-10">oct 2015</date>
		</imprint>
	</monogr>
</biblStruct>

<biblStruct xml:id="b1">
	<analytic>
		<title level="a" type="main">TTClust: A Versatile Molecular Simulation Trajectory Clustering Program with Graphical Summaries</title>
		<author>
			<persName><forename type="first">T</forename><surname>Tubiana</surname></persName>
		</author>
		<author>
			<persName><forename type="first">J.-C</forename><surname>Carvaillo</surname></persName>
		</author>
		<author>
			<persName><forename type="first">Y</forename><surname>Boulard</surname></persName>
		</author>
		<author>
			<persName><forename type="first">S</forename><surname>Bressanelli</surname></persName>
		</author>
	</analytic>
	<monogr>
		<title level="j">Journal of Chemical Information and Modeling</title>
		<imprint>
			<biblScope unit="volume">58</biblScope>
			<biblScope unit="page" from="2178" to="2182" />
			<date type="published" when="2018-11">nov 2018</date>
		</imprint>
	</monogr>
</biblStruct>

<biblStruct xml:id="b2">
	<analytic>
		<title level="a" type="main">TRAVIS-A free analyzer for trajectories from molecular simulation</title>
		<author>
			<persName><forename type="first">M</forename><surname>Brehm</surname></persName>
		</author>
		<author>
			<persName><forename type="first">M</forename><surname>Thomas</surname></persName>
		</author>
		<author>
			<persName><forename type="first">S</forename><surname>Gehrke</surname></persName>
		</author>
		<author>
			<persName><forename type="first">B</forename><surname>Kirchner</surname></persName>
		</author>
	</analytic>
	<monogr>
		<title level="j">The Journal of Chemical Physics</title>
		<imprint>
			<biblScope unit="volume">152</biblScope>
			<biblScope unit="page">164105</biblScope>
			<date type="published" when="2020-04">apr 2020</date>
		</imprint>
	</monogr>
</biblStruct>

<biblStruct xml:id="b3">
	<analytic>
		<title level="a" type="main">pyPcazip: A PCA-based toolkit for compression and analysis of molecular simulation data</title>
		<author>
			<persName><forename type="first">A</forename><surname>Shkurti</surname></persName>
		</author>
		<author>
			<persName><forename type="first">R</forename><surname>Goni</surname></persName>
		</author>
		<author>
			<persName><forename type="first">P</forename><surname>Andrio</surname></persName>
		</author>
		<author>
			<persName><forename type="first">E</forename><surname>Breitmoser</surname></persName>
		</author>
		<author>
			<persName><forename type="first">I</forename><surname>Bethune</surname></persName>
		</author>
		<author>
			<persName><forename type="first">M</forename><surname>Orozco</surname></persName>
		</author>
		<author>
			<persName><forename type="first">C</forename><forename type="middle">A</forename><surname>Laughton</surname></persName>
		</author>
	</analytic>
	<monogr>
		<title level="j">SoftwareX</title>
		<imprint>
			<biblScope unit="volume">5</biblScope>
			<biblScope unit="page" from="44" to="50" />
			<date type="published" when="2016">2016</date>
		</imprint>
	</monogr>
</biblStruct>

<biblStruct xml:id="b4">
	<analytic>
		<title level="a" type="main">Representing molecular and materials data for unsupervised machine learning</title>
		<author>
			<persName><forename type="first">E</forename><surname>Swann</surname></persName>
		</author>
		<author>
			<persName><forename type="first">B</forename><surname>Sun</surname></persName>
		</author>
		<author>
			<persName><forename type="first">D</forename><forename type="middle">M</forename><surname>Cleland</surname></persName>
		</author>
		<author>
			<persName><forename type="first">A</forename><forename type="middle">S</forename><surname>Barnard</surname></persName>
		</author>
	</analytic>
	<monogr>
		<title level="j">Molecular Simulation</title>
		<imprint>
			<biblScope unit="volume">44</biblScope>
			<biblScope unit="page" from="905" to="920" />
			<date type="published" when="2018-07">jul 2018</date>
		</imprint>
	</monogr>
</biblStruct>

<biblStruct xml:id="b5">
	<analytic>
		<title level="a" type="main">Empirical Classification of Trajectory Data: An Opportunity for the Use of Machine Learning in Molecular Dynamics</title>
		<author>
			<persName><forename type="first">B</forename><forename type="middle">K</forename><surname>Carpenter</surname></persName>
		</author>
		<author>
			<persName><forename type="first">G</forename><forename type="middle">S</forename><surname>Ezra</surname></persName>
		</author>
		<author>
			<persName><forename type="first">S</forename><forename type="middle">C</forename><surname>Farantos</surname></persName>
		</author>
		<author>
			<persName><forename type="first">Z</forename><forename type="middle">C</forename><surname>Kramer</surname></persName>
		</author>
		<author>
			<persName><forename type="first">S</forename><surname>Wiggins</surname></persName>
		</author>
	</analytic>
	<monogr>
		<title level="j">The Journal of Physical Chemistry B</title>
		<imprint>
			<biblScope unit="volume">122</biblScope>
			<biblScope unit="page" from="3230" to="3241" />
			<date type="published" when="2018-04">apr 2018</date>
		</imprint>
	</monogr>
</biblStruct>

<biblStruct xml:id="b6">
	<analytic>
		<title level="a" type="main">Advances of machine learning in molecular modeling and simulation</title>
		<author>
			<persName><forename type="first">M</forename><surname>Haghighatlari</surname></persName>
		</author>
		<author>
			<persName><forename type="first">J</forename><surname>Hachmann</surname></persName>
		</author>
	</analytic>
	<monogr>
		<title level="j">Current Opinion in Chemical Engineering</title>
		<imprint>
			<biblScope unit="volume">23</biblScope>
			<biblScope unit="page" from="51" to="57" />
			<date type="published" when="2019">2019</date>
		</imprint>
	</monogr>
</biblStruct>

<biblStruct xml:id="b7">
	<analytic>
		<title level="a" type="main">Machine learning for collective variable discovery and enhanced sampling in biomolecular simulation</title>
		<author>
			<persName><forename type="first">H</forename><surname>Sidky</surname></persName>
		</author>
		<author>
			<persName><forename type="first">W</forename><surname>Chen</surname></persName>
		</author>
		<author>
			<persName><forename type="first">A</forename><forename type="middle">L</forename><surname>Ferguson</surname></persName>
		</author>
	</analytic>
	<monogr>
		<title level="j">Molecular Physics</title>
		<imprint>
			<biblScope unit="volume">118</biblScope>
			<biblScope unit="page">e1737742</biblScope>
			<date type="published" when="2020-03">mar 2020</date>
		</imprint>
	</monogr>
</biblStruct>

<biblStruct xml:id="b8">
	<analytic>
		<title level="a" type="main">Machine Learning for Molecular Simulation</title>
		<author>
			<persName><forename type="first">F</forename><surname>Noé</surname></persName>
		</author>
		<author>
			<persName><forename type="first">A</forename><surname>Tkatchenko</surname></persName>
		</author>
		<author>
			<persName><forename type="first">K.-R</forename><surname>Müller</surname></persName>
		</author>
		<author>
			<persName><forename type="first">C</forename><surname>Clementi</surname></persName>
		</author>
	</analytic>
	<monogr>
		<title level="j">Annual Review of Physical Chemistry</title>
		<imprint>
			<biblScope unit="volume">71</biblScope>
			<biblScope unit="page" from="361" to="390" />
			<date type="published" when="2020-04">apr 2020</date>
		</imprint>
	</monogr>
</biblStruct>

<biblStruct xml:id="b9">
	<analytic>
		<title level="a" type="main">GROMACS: High performance molecular simulations through multi-level parallelism from laptops to supercomputers</title>
		<author>
			<persName><forename type="first">M</forename><forename type="middle">J</forename><surname>Abraham</surname></persName>
		</author>
		<author>
			<persName><forename type="first">T</forename><surname>Murtola</surname></persName>
		</author>
		<author>
			<persName><forename type="first">R</forename><surname>Schulz</surname></persName>
		</author>
		<author>
			<persName><forename type="first">S</forename><surname>Páll</surname></persName>
		</author>
		<author>
			<persName><forename type="first">J</forename><forename type="middle">C</forename><surname>Smith</surname></persName>
		</author>
		<author>
			<persName><forename type="first">B</forename><surname>Hess</surname></persName>
		</author>
		<author>
			<persName><forename type="first">E</forename><surname>Lindahl</surname></persName>
		</author>
	</analytic>
	<monogr>
		<title level="j">SoftwareX</title>
		<imprint>
			<biblScope unit="volume">1</biblScope>
			<biblScope unit="issue">2</biblScope>
			<biblScope unit="page" from="19" to="25" />
			<date type="published" when="2015-09">sep 2015</date>
		</imprint>
	</monogr>
</biblStruct>

<biblStruct xml:id="b10">
	<analytic>
		<title level="a" type="main">Molecular simulations of self-assembled ceramide bilayers: comparison of structural and barrier properties</title>
		<author>
			<persName><forename type="first">S</forename><forename type="middle">N</forename><surname>Karozis</surname></persName>
		</author>
		<author>
			<persName><forename type="first">E</forename><forename type="middle">I</forename><surname>Mavroudakis</surname></persName>
		</author>
		<author>
			<persName><forename type="first">G</forename><forename type="middle">C</forename><surname>Charalambopoulou</surname></persName>
		</author>
		<author>
			<persName><forename type="first">M</forename><forename type="middle">E</forename><surname>Kainourgiakis</surname></persName>
		</author>
	</analytic>
	<monogr>
		<title level="j">Molecular Simulation</title>
		<imprint>
			<biblScope unit="volume">46</biblScope>
			<biblScope unit="page" from="323" to="331" />
			<date type="published" when="2020-03">mar 2020</date>
		</imprint>
	</monogr>
</biblStruct>

				</listBibl>
			</div>
		</back>
	</text>
</TEI>
