<?xml version="1.0" encoding="UTF-8"?>
<TEI xml:space="preserve" xmlns="http://www.tei-c.org/ns/1.0" 
xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance" 
xsi:schemaLocation="http://www.tei-c.org/ns/1.0 https://raw.githubusercontent.com/kermitt2/grobid/master/grobid-home/schemas/xsd/Grobid.xsd"
 xmlns:xlink="http://www.w3.org/1999/xlink">
	<teiHeader xml:lang="en">
		<fileDesc>
			<titleStmt>
				<title level="a" type="main">The Application of Sequential Pattern Mining Techniques on MIMIC-IV</title>
			</titleStmt>
			<publicationStmt>
				<publisher/>
				<availability status="unknown"><licence/></availability>
			</publicationStmt>
			<sourceDesc>
				<biblStruct>
					<analytic>
						<author>
							<persName><forename type="first">Cecilia</forename><surname>Mariciuc</surname></persName>
							<email>cecilia.mariciuc27@gmail.com</email>
							<affiliation key="aff0">
								<orgName type="institution">Alexandru Ioan Cuza University of Iaşi</orgName>
								<address>
									<addrLine>Bulevardul Carol I, Nr.11</addrLine>
									<postCode>700506</postCode>
									<settlement>Iaşi</settlement>
									<country key="RO">Romania</country>
								</address>
							</affiliation>
						</author>
						<author>
							<persName><forename type="first">Mădălina</forename><surname>Răschip</surname></persName>
							<email>madalina.raschip@uaic.ro</email>
							<affiliation key="aff0">
								<orgName type="institution">Alexandru Ioan Cuza University of Iaşi</orgName>
								<address>
									<addrLine>Bulevardul Carol I, Nr.11</addrLine>
									<postCode>700506</postCode>
									<settlement>Iaşi</settlement>
									<country key="RO">Romania</country>
								</address>
							</affiliation>
						</author>
						<title level="a" type="main">The Application of Sequential Pattern Mining Techniques on MIMIC-IV</title>
					</analytic>
					<monogr>
						<imprint>
							<date/>
						</imprint>
					</monogr>
					<idno type="MD5">56E10EF46D5800F790056F9C002925B6</idno>
				</biblStruct>
			</sourceDesc>
		</fileDesc>
		<encodingDesc>
			<appInfo>
				<application version="0.7.2" ident="GROBID" when="2023-03-25T01:38+0000">
					<desc>GROBID - A machine learning software for extracting information from scholarly documents</desc>
					<ref target="https://github.com/kermitt2/grobid"/>
				</application>
			</appInfo>
		</encodingDesc>
		<profileDesc>
			<textClass>
				<keywords>
					<term>sequential pattern mining</term>
					<term>next prescribed drug</term>
					<term>MIMIC-IV</term>
				</keywords>
			</textClass>
			<abstract>
<div xmlns="http://www.tei-c.org/ns/1.0"><p>The paper studies the application of sequential pattern mining techniques to medical data from MIMIC-IV, a large healthcare dataset. Sequences of prescribed drugs to a large number of patients are analyzed in order to find out if there are patterns or temporal relationships which are general or specific to a particular disease. The PrefixSpan and Spade algorithms were applied to mine sequential patterns on all sequences or on a subset of them. The extracted patterns could be used to suggest the next prescribed drug. The experimental results show that the predictions obtained have a good accuracy for some diagnoses.</p></div>
			</abstract>
		</profileDesc>
	</teiHeader>
	<text xml:lang="en">
		<body>
<div xmlns="http://www.tei-c.org/ns/1.0"><head n="1.">Introduction</head><p>The correct use of a drug depends on several conditions. Each drug has characteristics such as indications, possible risk factors and contraindications, for example its use together with other drugs or the existence of certain medical conditions. The improper use of drugs and self-medication can be dangerous <ref type="bibr" target="#b0">[1]</ref>.</p><p>The advancement of technology has made it possible to digitally collect and store patient data for subsequent use. The analysis of this large amount of data could bring new knowledge to the medical field <ref type="bibr" target="#b1">[2]</ref>. Medications prescribed by specialists can be used to identify the optimal treatment. The order of the prescriptions could provide important information. Frequent subsequences or predictions of the next drug can help a doctor make a quick decision when there are too many medication options. They can be used to make automatic recommendations in routine cases, or to verify the correctness of unusual orders.</p><p>Sequential pattern mining can be a solution to this problem because it can identify patterns of ordered events <ref type="bibr" target="#b2">[3]</ref>. Surveys of the approaches proposed for sequential pattern mining are given in <ref type="bibr" target="#b3">[4]</ref>, <ref type="bibr" target="#b12">[13]</ref>. Sequential pattern mining has been applied in different areas of research, including the medical domain, for example to identify temporal relationships between drug prescriptions and medical events or between prescriptions of different drugs <ref type="bibr" target="#b4">[5]</ref>, or to identify whether a person is susceptible to a future illness <ref type="bibr" target="#b5">[6]</ref>.</p><p>In this paper, we used sequential pattern mining to predict the next medication for a patient. Other existing studies in the literature are based on machine-learning methods. 
In <ref type="bibr" target="#b7">[8]</ref>, the prescription data is transformed into a stochastic time series for prediction. Various machine-learning approaches were used and analyzed in order to predict prescription patterns. A different approach is presented in <ref type="bibr" target="#b8">[9]</ref>. The authors used neural networks and word2vec representations to predict the medication order prescribed during hospitalization, which could be used to assist pharmacists. Good results were obtained for obstetrics and gynecology patients and newborn babies. The paper <ref type="bibr" target="#b9">[10]</ref> predicts prescriptions for the next period of time based on the disease status, laboratory results and the previous treatment of the patient through a framework of machine learning. The authors used three Long Short-Term Memory models. The experiments were performed on data from the MIMIC-III ICU and other data from hospitals in China. The results obtained reveal the effectiveness of the methods. Another study <ref type="bibr" target="#b10">[11]</ref> uses probabilistic topic modelling to predict clinical order patterns.</p><p>A similar study to ours is presented in <ref type="bibr" target="#b6">[7]</ref>. The authors describe an approach based on sequential pattern mining to identify the next prescribed medication for patients with diabetes. The CSPADE algorithm is used to mine sequential patterns at the drug class and generic drug level. The dataset used in our research is different from the one considered in <ref type="bibr" target="#b6">[7]</ref>. We used a larger real-world dataset, MIMIC-IV, on which sequential pattern mining has not been applied before. The preprocessing step of identification of drugs and the construction of sequences are specific to this dataset. Two mining algorithms, PrefixSpan and SPADE, were considered. 
Although the predictions are made in a similar way by constructing some rules from the frequent patterns, the analysis of the mining algorithms on the MIMIC-IV dataset and the evaluation of the results on several diagnoses such as "heart attack" are two other elements that distinguish the current paper from the existing works.</p><p>The paper is organized as follows. A formal description of the problem of mining sequential patterns and the algorithms used to solve the problem is given in Section 2. In Section 3 we present the dataset used and in Section 4 the experimental settings and results. We conclude with a summary and future improvements in Section 5.</p></div>
<div xmlns="http://www.tei-c.org/ns/1.0"><head n="2.">Sequential Pattern Mining</head><p>The problem that sequential pattern mining tries to solve can be described as follows: knowing that many events occur in time, can we learn more about the data by analysing the ordered sequences encountered? <ref type="bibr" target="#b12">[13]</ref> In the following we formally describe the problem. Let $I = \{i_1, i_2, \ldots, i_n\}$ be a set of elements, also called an alphabet. An event $(i_{x_1}, i_{x_2}, \ldots, i_{x_k})$, $1 \le x_j \le n$, $\forall j \in \{1, \ldots, k\}$, is a nonempty subset of $I$, i.e., an unordered collection of elements. A sequence $\langle e_1, e_2, \ldots, e_q \rangle$ is an ordered collection of events. A sequence that contains $k$ elements is known as a $k$-sequence. A sequence</p><formula xml:id="formula_0">$s_e = \langle e_1, e_2, \ldots, e_n \rangle$ is a subsequence of the sequence $s_f = \langle f_1, f_2, \ldots, f_m \rangle$ if there exist integers $1 \le i_1 &lt; i_2 &lt; \cdots &lt; i_n \le m$ such that $e_1 \subseteq f_{i_1}, e_2 \subseteq f_{i_2}, \ldots, e_n \subseteq f_{i_n}$.</formula><p>A sequence database is a set of sequences that have associated identifiers. The support of a sequence $s$ in a sequence database, denoted $sup(s)$, is the number of sequences in the database that contain $s$, i.e., for which $s$ is a subsequence. Given a value for the minimum support, denoted $\mathit{minsup}$, a sequence is considered frequent in a database if its support is at least equal to $\mathit{minsup}$. Sequential pattern mining aims to find these frequent sequences.</p></div>
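<div xmlns="http://www.tei-c.org/ns/1.0"><p>The definitions above can be illustrated with a minimal sketch. It is not taken from the paper: the function names, the toy database and the use of an absolute (count-based) minimum support are assumptions made for illustration. Events are modelled as sets of items and a sequence as a list of events.</p><p><code lang="python"><![CDATA[
```python
from typing import FrozenSet, List

Event = FrozenSet[int]
Sequence = List[Event]


def is_subsequence(s_e: Sequence, s_f: Sequence) -> bool:
    """Return True if each event of s_e is a subset of a distinct event
    of s_f and the matched events of s_f appear in increasing order."""
    pos = 0
    for event in s_e:
        # Greedily take the earliest remaining event of s_f containing `event`.
        while pos < len(s_f) and not event <= s_f[pos]:
            pos += 1
        if pos == len(s_f):
            return False
        pos += 1
    return True


def support(s: Sequence, database: List[Sequence]) -> int:
    """Number of database sequences that contain s as a subsequence."""
    return sum(1 for sequence in database if is_subsequence(s, sequence))


def is_frequent(s: Sequence, database: List[Sequence], minsup: int) -> bool:
    """Assumption: minsup is an absolute count, not a fraction."""
    return support(s, database) >= minsup


# Toy database with hypothetical item ids.
database = [
    [frozenset({1}), frozenset({2, 3}), frozenset({4})],
    [frozenset({1, 2}), frozenset({3})],
    [frozenset({2}), frozenset({1}), frozenset({3, 4})],
]
pattern = [frozenset({1}), frozenset({3})]
print(support(pattern, database))
```
]]></code></p><p>In this toy database the pattern "item 1, later followed by item 3" is contained in all three sequences, so it is frequent for any absolute minimum support up to 3.</p></div>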
<div xmlns="http://www.tei-c.org/ns/1.0"><head n="2.1.">SPADE</head><p>SPADE (Sequential PAttern Discovery using Equivalence classes) <ref type="bibr" target="#b13">[14]</ref> is an Apriori-based algorithm: it makes use of the Apriori property, which states that any subsequence of a frequent sequence is also a frequent sequence. SPADE works with data organized in a vertical format, obtained by transforming the initial sequence database into a table of events in which each row links an event with the corresponding sequence identifier (SID) and its position in the sequence (EID).</p><p>At each step k, the algorithm searches for k-sequences that may be frequent by generating id-lists. The first step is to find the frequent 1-sequences. Support is calculated for each element of the alphabet from the entries of the vertical-format table that contain it; those entries form its id-list. Subsequently, only the items that reach the minimum support are frequent and are considered when searching for frequent 2-sequences. In the general case, candidate k-sequences are found by joining the id-lists of any two frequent (k-1)-sequences, matching entries that have the same SID and correctly ordered positions (EIDs). The algorithm stops when no more frequent sequences are found or no more candidate sequences can be constructed.</p></div>
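<div xmlns="http://www.tei-c.org/ns/1.0"><p>The vertical format and the id-list join can be sketched as follows. This is a simplified illustration under stated assumptions, not the SPADE implementation used in the experiments: it covers only the temporal join (an item occurring in a later event than the prefix), omits same-event extensions and equivalence classes, and all function names and data are hypothetical.</p><p><code lang="python"><![CDATA[
```python
from collections import defaultdict


def to_vertical(database):
    """Map each item to its id-list: (SID, EID) pairs in scan order."""
    idlists = defaultdict(list)
    for sid, sequence in enumerate(database):
        for eid, event in enumerate(sequence):
            for item in sorted(event):
                idlists[item].append((sid, eid))
    return dict(idlists)


def support(idlist):
    """Support is the number of distinct sequences (SIDs) in an id-list."""
    return len({sid for sid, _ in idlist})


def temporal_join(prefix_idlist, item_idlist):
    """Id-list of 'prefix, later followed by item'.

    An entry of the item is kept when the same sequence contains an
    occurrence of the prefix at a strictly smaller position (EID).
    """
    earliest = {}
    for sid, eid in prefix_idlist:
        if sid not in earliest or eid < earliest[sid]:
            earliest[sid] = eid
    joined = []
    for sid, eid in item_idlist:
        if sid in earliest and eid > earliest[sid]:
            joined.append((sid, eid))
    return joined


# Toy database with hypothetical item ids; each event is a set of items.
database = [[{1}, {2}, {3}], [{1}, {3}], [{2}, {1}]]
idlists = to_vertical(database)
one_then_three = temporal_join(idlists[1], idlists[3])
one_then_two = temporal_join(idlists[1], idlists[2])
print(support(one_then_three), support(one_then_two))
```
]]></code></p><p>Because support is read directly from the joined id-list, the original database does not need to be rescanned for each candidate. The cost is the memory needed to hold the id-lists.</p></div>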
<div xmlns="http://www.tei-c.org/ns/1.0"><head n="2.2.">PrefixSpan</head><p>PrefixSpan (Prefix-Projected Sequential Patterns Mining) <ref type="bibr" target="#b14">[15]</ref> is a pattern-growth algorithm: it does not generate candidate sequences, but instead partitions the data set into projections, which are explored separately to extend the already known frequent sequences.</p><p>The PrefixSpan algorithm includes the following steps:</p><p>1. Find the frequent 1-sequences in the dataset, which will later be concatenated to the current frequent sequence (the current frequent prefix) to form new frequent sequences. Initially, the current frequent sequence is an empty sequence, $s = \langle \_ \rangle$.</p><p>2. Partition the search space according to the sequences found in the previous step. For each new frequent sequence obtained, a projection is created, considering that sequence as a prefix.</p><p>3. For each projection, look for the elements with support at least equal to $\mathit{minsup}$, which will be used to extend the previous frequent sequences.</p><p>These steps are repeated recursively, the algorithm following a divide-and-conquer (divide et impera) strategy.</p></div>
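<div xmlns="http://www.tei-c.org/ns/1.0"><p>The three steps above can be sketched in a few lines. The sketch is a simplification and not the SPMF implementation: it assumes that every event contains a single item, that the minimum support is an absolute count, and that a projection is stored as (sequence index, start position) pairs. All names are hypothetical.</p><p><code lang="python"><![CDATA[
```python
from collections import defaultdict


def prefixspan(database, minsup, max_len=None):
    """Return a dict mapping each frequent pattern (tuple) to its support.

    database: list of sequences, each a list of single items.
    minsup:   absolute minimum support (number of sequences).
    max_len:  optional maximum pattern length.
    """
    results = {}

    def mine(prefix, projected):
        if max_len is not None and len(prefix) >= max_len:
            return
        # Steps 1 and 3: count the items that occur in the projected suffixes.
        occurrences = defaultdict(list)
        for sid, start in projected:
            seen = set()
            sequence = database[sid]
            for pos in range(start, len(sequence)):
                item = sequence[pos]
                if item not in seen:
                    seen.add(item)
                    # The new suffix starts after the first occurrence.
                    occurrences[item].append((sid, pos + 1))
        # Step 2: one projection per frequent extension, mined recursively.
        for item in sorted(occurrences):
            new_projected = occurrences[item]
            if len(new_projected) >= minsup:
                pattern = prefix + (item,)
                results[pattern] = len(new_projected)
                mine(pattern, new_projected)

    mine((), [(sid, 0) for sid in range(len(database))])
    return results


# Toy database with hypothetical item ids.
database = [[1, 2, 3], [1, 3], [2, 1, 3], [1, 2]]
patterns = prefixspan(database, minsup=2)
for pattern, sup in sorted(patterns.items()):
    print(pattern, sup)
```
]]></code></p><p>No candidate sequences are generated here: each recursive call only looks at the suffixes that follow the current prefix, which is why the projections shrink as the prefix grows.</p></div>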
<div xmlns="http://www.tei-c.org/ns/1.0"><head n="3.">MIMIC-IV dataset</head><p>MIMIC (Medical Information Mart for Intensive Care) is a publicly accessible relational database that documents the hospitalizations of patients at Beth Israel Deaconess Medical Center (BIDMC) in Boston, MA, USA. MIMIC-IV <ref type="bibr" target="#b11">[12]</ref> is the latest version of the MIMIC database and represents an improvement over MIMIC-III, with a modular structure and more recent patient data, from 2008 to 2019. MIMIC-IV contains five modules that reflect the origin of the data: core, hosp, icu, ed and cxr. We used the hosp module, which provides information from the electronic medical records, including laboratory tests, medications, and diagnoses. From this module, the following tables were used: prescriptions, diagnoses_icd and d_icd_diagnoses. The prescriptions table contains information about the prescribed medications. The drug type field has three possible values: MAIN, BASE, or ADDITIVE. The diagnoses_icd table records the diagnoses for which a patient was billed. Each diagnosis has an associated seq_num, which represents the importance of the diagnosis: the lower the seq_num, the more significant the diagnosis. The official name of a diagnosis can be identified using the d_icd_diagnoses table.</p></div>
<div xmlns="http://www.tei-c.org/ns/1.0"><head>Figure 1: Distribution of the number of drugs per hospitalization</head><p>The prescriptions table contains 17008053 records, i.e., drugs that were individually prescribed. In most prescriptions, the drug type was in the MAIN category. Prescriptions were made for 232064 patients, with 452115 hospitalizations. A distribution of the number of drugs per hospitalization is available in Figure <ref type="figure">1</ref>. In most cases, this number falls in the range [0,400], although there are also much higher values (a maximum of 2156).</p><p>There are 5280351 diagnoses in the associated table diagnoses_icd, established for 255106 patients who had 521111 hospitalizations. A patient may have several hospitalizations, and for each hospitalization, several diagnoses. The distribution of the number of diagnoses per hospitalization is given in Figure <ref type="figure" target="#fig_0">2</ref>. The d_icd_diagnoses table contains 109775 lines, or possible diagnoses. Table <ref type="table" target="#tab_0">1</ref> shows the ranking of the most common diagnoses. </p></div>
<div xmlns="http://www.tei-c.org/ns/1.0"><head n="4.">Experimental results</head><p>This section describes the steps followed in generating predictions using Sequential Pattern Mining algorithms on the MIMIC-IV dataset, as shown in Figure <ref type="figure" target="#fig_1">3</ref>. The steps are the following: finding the list of distinct drugs, filtering hospitalizations by diagnoses, building sequences of drugs, running sequential pattern algorithms and building rules. The cases where the predictions are relevant and the parameters that influence their accuracy are analyzed.</p></div>
<div xmlns="http://www.tei-c.org/ns/1.0"><head n="4.1.">Preprocessing</head><p>The same drug may appear in prescriptions in several forms: various abbreviations ('hepa', 'hepar', 'hepari', 'heparin'), inconsistent capitalization ('acetaZOLAMIDE', 'Acetazolamide', 'AcetaZOLamide'), extra or missing spaces and special characters ('Dextromethorphan-', 'Dextromethorphan'), or additional words such as 'pain', 'bulk' or 'extended release' ('vancomycin', 'vancomycin (bulk)'). Another, more complex problem is that medicines may appear under completely different names, i.e., under the generic name or under the brand name. A solution to all these inconsistencies is to use the gsn field, which contains one or more 6-digit Generic Sequence Number (GSN) codes. A GSN identifies a product based on its formula, dose, method of administration and concentration, and can be used to group generally equivalent products, which may differ only in the manufacturer. To reduce the number of equivalent elements, we used the GSN codes to create a list of drugs, each with a unique id. Since a drug, or several equivalent drugs, can be associated with several GSN codes, groups of GSN codes are established so that one group contains all codes that have been mentioned together, directly or indirectly. Two drugs are considered equivalent if at least one GSN code of each (not necessarily the same code) is found in the same group of GSN codes. Thus, starting from a list of 16970 (drug, gsn) pairs, we obtained a list of 3398 drugs with unique ids after preprocessing.</p></div>
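<div xmlns="http://www.tei-c.org/ns/1.0"><p>One way to build such groups is a union-find (disjoint-set) structure over the GSN codes. The sketch below is an assumption about how the grouping could be implemented, not the authors' code: it treats codes as "mentioned together" when they appear in the same gsn field, assumes the codes in a field are separated by whitespace, and uses invented GSN values and hypothetical function names.</p><p><code lang="python"><![CDATA[
```python
def find(parent, code):
    """Return the representative of the group containing `code`."""
    while parent[code] != code:
        parent[code] = parent[parent[code]]  # path halving
        code = parent[code]
    return code


def assign_drug_ids(pairs):
    """Map each drug name to the id of its GSN group.

    pairs: list of (drug, gsn_field) tuples, where gsn_field holds one or
    more whitespace-separated codes (an assumption about the format).
    """
    parent = {}
    for _, gsn_field in pairs:
        codes = gsn_field.split()
        for code in codes:
            parent.setdefault(code, code)
        # Codes listed in the same field end up in the same group.
        for code in codes[1:]:
            root_a = find(parent, codes[0])
            root_b = find(parent, code)
            if root_a != root_b:
                parent[root_b] = root_a

    group_ids = {}
    drug_ids = {}
    for drug, gsn_field in pairs:
        codes = gsn_field.split()
        if not codes:
            continue  # no GSN code available for this drug name
        root = find(parent, codes[0])
        drug_ids[drug] = group_ids.setdefault(root, len(group_ids))
    return drug_ids


# Drug names are taken from the text; the GSN codes are invented.
pairs = [
    ("heparin", "006549"),
    ("hepa", "006549 006550"),
    ("hepari", "006550"),
    ("vancomycin", "009331"),
    ("vancomycin (bulk)", "009331"),
]
drug_ids = assign_drug_ids(pairs)
print(drug_ids)
```
]]></code></p><p>Here 'heparin' and 'hepari' share no code directly, but both are linked through the field of 'hepa', so all three names receive the same id. This corresponds to codes being mentioned together indirectly.</p></div>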
<div xmlns="http://www.tei-c.org/ns/1.0"><head n="4.2.">The construction of sequences</head><p>A sequence is an ordered list of events of the form $\langle e_1, e_2, \ldots, e_n \rangle$, where each event is a nonempty subset of the alphabet $I$. In our case, the alphabet is the set of drug ids $I = \{0, 1, 2, \ldots, 3397\}$. A sequence corresponds to a hospitalization and is represented by the list of ids of the drugs prescribed, grouped and sorted by time. For example, the sequence $\langle (2624), (2624), (2769, 539, 1100) \rangle$ specifies that during a hospitalization the drug with id 2624 was prescribed first, then the same drug again, followed by a group of three drugs.</p><p>We considered two cases for the generation of sequences: the sequences are built for all hospitalizations, or only for a subset of hospitalizations. In the first case, the list of distinct hospitalizations that have at least one prescription can be easily found by querying the prescriptions table. For each element of the list, the events of the corresponding sequence are considered. Given that there are 452115 distinct hospitalizations, the number of generated sequences is high, which limits the applicability of the mining algorithms. Consequently, for the second case, we filtered the hospitalizations by one or more diagnoses. Given a set of keywords, we search for hospitalizations whose diagnoses contain all the keywords. For example, for the words 'heart' and 'pneumonia', a hospitalization with the diagnoses 'Pneumonia due to adenovirus', 'Aneurysm of heart' and 'Other and unspecified hyperlipidemia' will be selected. 
In addition to this filtering, when constructing sequences, only prescriptions with a drug type equal to MAIN will be considered.</p><p>This filtering is meant to facilitate the use of fewer resources (time and memory) by algorithms and to obtain better results, because the selection of hospitalizations by diagnoses can increase the chance of finding more common patterns.</p></div>
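<div xmlns="http://www.tei-c.org/ns/1.0"><p>The construction described in this section can be sketched as follows. The sketch works on plain tuples instead of database tables, and several details are assumptions: the field order (hospitalization id, prescription time, drug id, drug type), the case-insensitive keyword matching, the grouping of drugs that share the same timestamp into one event, and the application of the MAIN filter in both cases. Function names and sample values are hypothetical.</p><p><code lang="python"><![CDATA[
```python
from collections import defaultdict


def matches_keywords(titles, keywords):
    """True if every keyword occurs in at least one diagnosis title."""
    lowered = [title.lower() for title in titles]
    return all(any(k.lower() in title for title in lowered) for k in keywords)


def build_sequences(prescriptions, diagnoses, keywords=()):
    """Build one sequence of events per hospitalization.

    prescriptions: iterable of (hadm_id, start_time, drug_id, drug_type)
    diagnoses:     dict mapping hadm_id to a list of diagnosis titles
    keywords:      optional filter; an empty tuple keeps all hospitalizations
    """
    by_stay = defaultdict(lambda: defaultdict(set))
    for hadm_id, start_time, drug_id, drug_type in prescriptions:
        if drug_type != "MAIN":
            continue  # assumption: only MAIN prescriptions are kept
        titles = diagnoses.get(hadm_id, [])
        if keywords and not matches_keywords(titles, keywords):
            continue
        by_stay[hadm_id][start_time].add(drug_id)
    # Events are ordered by time; drugs inside an event are unordered,
    # so they are sorted only to obtain a canonical representation.
    return {
        hadm_id: [tuple(sorted(events[t])) for t in sorted(events)]
        for hadm_id, events in by_stay.items()
    }


# Invented sample rows; ISO-formatted timestamps sort chronologically.
prescriptions = [
    (100, "2150-01-01 08:00", 2624, "MAIN"),
    (100, "2150-01-01 12:00", 2624, "MAIN"),
    (100, "2150-01-02 09:00", 2769, "MAIN"),
    (100, "2150-01-02 09:00", 539, "MAIN"),
    (100, "2150-01-02 09:00", 1100, "MAIN"),
    (100, "2150-01-02 09:00", 7, "BASE"),
    (200, "2150-03-05 10:00", 539, "MAIN"),
]
diagnoses = {
    100: ["Pneumonia due to adenovirus", "Aneurysm of heart",
          "Other and unspecified hyperlipidemia"],
    200: ["Esophageal reflux"],
}
all_sequences = build_sequences(prescriptions, diagnoses)
filtered = build_sequences(prescriptions, diagnoses, ("heart", "pneumonia"))
print(filtered)
```
]]></code></p><p>With the keywords 'heart' and 'pneumonia', only hospitalization 100 is kept, and its sequence has the same shape as the example given in the text: the same drug twice, then a group of three drugs.</p></div>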
<div xmlns="http://www.tei-c.org/ns/1.0"><head n="4.3.">Sequential pattern mining</head><p>The frequent sequences of prescribed drugs were extracted using two sequential pattern mining algorithms, SPADE and PrefixSpan, available in the open-source Java library SPMF <ref type="bibr" target="#b15">[16]</ref>. We ran the algorithms on a Windows 10 Pro machine with an Intel(R) Core(TM) i7-8550U CPU @ 1.80GHz processor and 8 GiB of memory. SPADE cannot be applied to the entire dataset due to the additional memory the algorithm requires to transform the sequence database into a vertical format. SPADE is suitable for a subset of hospitalizations. The results of SPADE are given in Table <ref type="table" target="#tab_1">2</ref>. We considered six use cases, i.e., hospitalizations that had the following diagnoses: Heart failure, Born in hospital, Acute kidney failure, Need for prophylactic vaccination and inoculation against viral hepatitis, Circumcision and Encounter for immunization. We selected the hospitalizations for a diagnosis based on some terms. The chosen terms are contained in or represent the names of the most common diagnoses. We consider diagnoses with seq_num ≤ 5 because of the higher chance that they are the main reason for the hospitalization. For example, there are 73 different diagnoses that contain the term 'heart failure' and for which this diagnosis is important (seq_num ≤ 5). One of the most common diagnoses is Congestive heart failure, unspecified, according to Table <ref type="table" target="#tab_0">1</ref>. The number of resulting sequences is 52086, with an average of 21.04 events. In total, 80983 frequent sequences were found by applying the algorithm to these sequences. The selected values for the minimum support are specified in Table <ref type="table" target="#tab_1">2</ref>. The value of $\mathit{minsup}$ is chosen empirically to keep the running time within practical limits. 
A lower number of events means that fewer medications are prescribed for those hospitalizations, which allows the choice of a lower minimum support.</p><p>PrefixSpan. The parameters of the algorithm are the minimum support and, optionally, the maximum length of the sequences.</p><p>We tested the algorithm on all hospitalizations, using a minimum support of 0.025 (10891 sequences) and a maximum sequence length of 20. The number of frequent sequences found is 9771. Some of the most frequently used drugs are: lactated ringers (Ringer's Lactate Solution), hydralazine (Hydralazine), tylenol (Paracetamol), 0.9% sodium chloride, potassium chloride, heparin flush, etc.</p><p>A small decrease in the minimum support can significantly increase the number of frequent sequences and thus the execution time. For example, for a minimum support of 0.02 (8713 sequences), the number of frequent sequences increases to 20968.</p><p>Next, we repeated the tests made with SPADE, using the PrefixSpan algorithm instead. The results are given in Table <ref type="table" target="#tab_2">3</ref>. PrefixSpan finds more frequent sequences and uses less memory than SPADE. However, SPADE is significantly faster on long sequences, whereas PrefixSpan is faster on short sequences.</p><p>We next analyzed the frequent sequences that resulted from the application of the algorithms. We considered two cases: a high minimum support (for example, hospitalizations with a heart failure diagnosis) and a low minimum support (for example, hospitalizations with the diagnosis Need for prophylactic vaccination and inoculation against viral hepatitis). Some frequent sequences found for hospitalizations with the diagnosis Need for prophylactic vaccination and inoculation against viral hepatitis are given in Table <ref type="table">4</ref>.</p></div>
<div xmlns="http://www.tei-c.org/ns/1.0"><head>Table 4 Sequential patterns</head><p>We analyzed the sequential patterns in order to identify the most commonly used medications. For heart failure diagnoses, the most common drugs among the frequent sequences are: tylenol, senna laxative, aspirin, docusate sodium, dextrose, furosemide, metoprolol tartrate, glucagon, etc. Most of these drugs are also common in all hospitalizations, making it difficult to say whether they are specific to these types of hospitalizations or not. Consequently, we manually searched for drugs known to be common in the treatment of heart failure<ref type="foot" target="#foot_1">2</ref>. Some of the results are given next:</p><p>• from the class of Angiotensin-Converting Enzyme (ACE) Inhibitors: lisinopril is often found, being contained in over 400 sequences; captopril is found in 6 sequences, with support in the range [1000, 2200]</p><p>• from the class of Beta Blockers: carvedilol appears in over 100 sequences; metoprolol is one of the most frequently found drugs</p><p>• from the class of Vasodilators: hydralazine is found in many different forms; nitroglycerin is found in over 500 sequences</p><p>• from the class of Diuretics: furosemide is one of the most common drugs; torsemide is found in over 500 sequences; metolazone is found only individually</p><p>For patients diagnosed with Need for prophylactic vaccination and inoculation against viral hepatitis, the drugs prescribed are less varied, most of them being hepatitis b immune globulin (bayhep b), hepatitis b vaccine, vitamin k, gentamicin, erythromycin ophthalmic, tylenol, heparin and triple dye. In addition to the vaccine itself (the diagnosis indicates the need for hepatitis vaccination), we find commonly used drugs and drugs specific to newborns, because the hepatitis B vaccine is administered to them immediately after birth.</p></div>
<div xmlns="http://www.tei-c.org/ns/1.0"><head n="4.4.">Making predictions using frequent sequences</head><p>As previously mentioned, the frequent sequences can be used to identify the drugs used for different diagnoses. When the number of sequences is very large, however, inspecting them manually becomes impractical and time-consuming. Frequent drug sequences can reveal which drugs or combinations of drugs are more likely to be recommended when the previous prescriptions are known. We predict the drugs most likely to be prescribed and compare the result with the real values to determine the accuracy of the predictions.</p><p>Rules construction. To describe the links between the drugs from frequent sequences, we generate rules of the form (antecedent, consequent, support) with the following meanings:</p><p>• For a sequence $s = \langle e_1, e_2, \ldots, e_n \rangle$, the antecedent contains the first $(n-1)$ events $\langle e_1, e_2, \ldots, e_{n-1} \rangle$, and the consequent is the last event $e_n$. Only sequences containing at least two elements are considered.</p></div>
<div xmlns="http://www.tei-c.org/ns/1.0"><p>• The support of the rule corresponds to the support of the sequence.</p><p>Some examples of rules generated from frequent sequences for the diagnosis Need for prophylactic vaccination and inoculation against viral hepatitis are given in Table <ref type="table">5</ref>.</p><p>[…] contained in the predicted events is found in the event set aside, then we will consider the prediction correct. Accuracy is computed as the percentage of correct predictions out of the total predictions made.</p></div>
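<div xmlns="http://www.tei-c.org/ns/1.0"><p>A minimal sketch of rule construction and evaluation is given below. Part of the procedure is missing from the text above, so the following are assumptions rather than the authors' method: a rule applies when its antecedent is a subsequence of the patient's history, matching rules are ranked by support, the last event of each test sequence is set aside, and accuracy is computed only over the sequences for which a prediction exists. The support values are invented and all names are hypothetical.</p><p><code lang="python"><![CDATA[
```python
def build_rules(frequent_sequences):
    """Turn frequent sequences into (antecedent, consequent, support) rules.

    frequent_sequences: dict mapping a tuple of events to its support,
    where each event is a tuple of drug names.
    """
    rules = []
    for sequence, support in frequent_sequences.items():
        if len(sequence) >= 2:  # a rule needs an antecedent and a consequent
            rules.append((sequence[:-1], sequence[-1], support))
    return rules


def contains(history, antecedent):
    """True if the antecedent is a subsequence of the history."""
    pos = 0
    for event in antecedent:
        while pos < len(history) and not set(event) <= set(history[pos]):
            pos += 1
        if pos == len(history):
            return False
        pos += 1
    return True


def predict(history, rules, top_k=1):
    """Consequents of the top_k matching rules, ranked by support."""
    matching = [rule for rule in rules if contains(history, rule[0])]
    matching.sort(key=lambda rule: rule[2], reverse=True)
    return [consequent for _, consequent, _ in matching[:top_k]]


def accuracy(sequences, rules, top_k=1):
    """Share of predictions that hit a drug of the event set aside."""
    correct = total = 0
    for sequence in sequences:
        if len(sequence) < 2:
            continue
        history, held_out = sequence[:-1], sequence[-1]
        predicted = predict(history, rules, top_k)
        if not predicted:
            continue  # no matching rule, so no prediction is made
        total += 1
        if any(set(event) & set(held_out) for event in predicted):
            correct += 1
    return correct / total if total else 0.0


# Drug names come from the text; the supports are invented.
frequent = {
    (("erythromycin ophthalmic",),): 50,
    (("erythromycin ophthalmic",), ("hepatitis b vaccine",)): 30,
    (("triple dye",), ("hepatitis b vaccine",)): 20,
    (("phytonadione",), ("gentamicin",)): 10,
}
rules = build_rules(frequent)
patients = [
    [("erythromycin ophthalmic",), ("hepatitis b vaccine",)],
    [("phytonadione",), ("acetaminophen",)],
    [("lidocaine",), ("acetaminophen",)],
]
print(accuracy(patients, rules))
```
]]></code></p><p>In this example the first patient is predicted correctly, the second incorrectly, and the third receives no prediction because no rule matches. Whether sequences without a prediction count towards the total changes the reported accuracy, so this choice should be stated together with the result.</p></div>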
<div xmlns="http://www.tei-c.org/ns/1.0"><head>Prediction results</head><p>The prediction results are given in Table <ref type="table" target="#tab_3">6</ref>. For the heart failure diagnosis, for example, at least one correct prediction was obtained for 5718 sequences, corresponding to an accuracy of 25.83%, while for 4704 sequences no prediction could be found. If only the sequences for which predictions were found are taken into account, the accuracy is 32.79%.</p><p>For certain diagnoses, such as Need for prophylactic vaccination and inoculation against viral hepatitis and Circumcision, the accuracy is high, while for others, such as Heart failure, it is low. Statistically, the number of prescriptions increases with age <ref type="bibr" target="#b16">[17]</ref>. Intuitively, a diagnosis that contains the term 'born in hospital' refers to newborns, in which case certain standard medicines are required. The number of allowed drugs is lower (many drugs have age restrictions). In this case, it is easier to identify which drugs are more likely to be prescribed. The diagnoses Need for prophylactic vaccination and inoculation against viral hepatitis and Encounter for immunization indicate that a person needs the administration of a vaccine. The person is not necessarily ill, so the number of drugs is not expected to be high. In contrast, diagnoses that contain 'heart failure' indicate a serious, complex condition that is often found in the population over the age of 65.</p><p>To clarify the possible factors that affect the accuracy of predictions, we analysed other measures, detailed in Table <ref type="table" target="#tab_4">7</ref>. The second column contains the total number of different drugs encountered in the sequences. The next column contains the average number of drugs per sequence. The last column contains the average difference between the date of the last prescription and the date of the first prescription. 
According to Table <ref type="table" target="#tab_4">7</ref>, when there is a wider range of drugs to choose from, the accuracy tends to decrease. The average number of drugs per sequence influences the sequential pattern mining algorithms: it is usually necessary to choose a larger support so as not to use too much memory, which also influences the accuracy. Another parameter that could influence the results is the length of the period in which prescriptions were made. This may indicate complex diagnoses or, conversely, less severe cases.</p><p>The choice of the minimum support can influence the accuracy of the predictions, and indirectly the runtime and the memory usage. Table <ref type="table" target="#tab_5">8</ref> illustrates how the support influences the accuracy. The last column is the time needed to compute the predictions. As the minimum support decreases, the accuracy increases slightly, and the runtime also increases. Lowering the support is therefore useful only up to the limit at which the execution time remains reasonable.</p></div>
<div xmlns="http://www.tei-c.org/ns/1.0"><head n="5.">Conclusions</head><p>Sequential pattern mining is an effective technique for predicting medications based on a patient's past prescription history. This paper studies in particular the application of two algorithms, SPADE and PrefixSpan, as a means to find frequent sequences that reveal temporal relationships between medications. The resulting frequent sequences are general or specific to one or more diseases and are used to construct rules. Predictions are made by finding matches of a patient's medication history in the list of rules. According to the experimental results, there are situations in which the predictions reach a satisfactory accuracy. Such a solution is especially useful for routine cases, for instance immunizations or the treatment of newborns. For more complex diagnoses, however, additional study is needed to improve the results.</p><p>Possible improvements include the use of supplementary patient information, such as laboratory results and age, and of supplementary medication details, such as the dose and the method of administration.</p></div><figure xmlns="http://www.tei-c.org/ns/1.0" xml:id="fig_0"><head>Figure 2 :</head><label>2</label><figDesc>Figure 2: Distribution of the number of diagnoses per hospitalization</figDesc><graphic coords="4,182.00,415.38,248.00,179.69" type="bitmap" /></figure>
<figure xmlns="http://www.tei-c.org/ns/1.0" xml:id="fig_1"><head>Figure 3 :</head><label>3</label><figDesc>Figure 3: Pipeline for the strategy used</figDesc><graphic coords="5,72.00,568.97,468.00,124.95" type="bitmap" /></figure>
<figure xmlns="http://www.tei-c.org/ns/1.0" xml:id="fig_2"><head>'</head><label></label><figDesc>erythromycin ophthalmic' 27346 'erythromycin ophthalmic' 'phytonadione (vitamin k1)' → 'erythromycin ophthalmic' 'hepatitis b vaccine' 'phytonadione (vitamin k1)' 'phytonadione (vitamin k1)' 27347 'hepatitis b vaccine' 7876 'phytonadione (vitamin k1)' → 'gentamicin' 1324 'phytonadione (vitamin k1)' → 'acetaminophen' 2356 'lidocaine' 'acetaminophen' → 'hepatitis b vaccine' 189 'triple dye' → 'hepatitis b vaccine' 2525 'triple dye' 'erythromycin ophthalmic' 'hepatitis b immune globulin' 37 'erythromycin ophthalmic' 'phytonadione (vitamin k1)'→'phytonadione (vitamink1)' 7428</figDesc></figure>
<figure xmlns="http://www.tei-c.org/ns/1.0" type="table" xml:id="tab_0"><head>Table 1</head><label>1</label><figDesc>Ranking of the most common diagnoses</figDesc><table><row><cell>No.</cell><cell>Diagnosis</cell><cell>No. of occurrences</cell></row><row><cell>1</cell><cell>Unspecified essential hypertension</cell><cell>104080</cell></row><row><cell>2</cell><cell>Other and unspecified hyperlipidemia</cell><cell></cell></row><row><cell>3</cell><cell>Essential (primary) hypertension</cell><cell></cell></row><row><cell>4</cell><cell>Hyperlipidemia, unspecified</cell><cell></cell></row><row><cell>5</cell><cell>Esophageal reflux</cell><cell></cell></row><row><cell>6</cell><cell>Diabetes mellitus without mention of complication, type II or unspecified type, not stated as uncontrolled</cell><cell></cell></row><row><cell>7</cell><cell>Personal history of nicotine dependence</cell><cell></cell></row><row><cell>8</cell><cell>Atrial fibrillation</cell><cell></cell></row><row><cell>9</cell><cell>Depressive disorder, not elsewhere classified</cell><cell></cell></row><row><cell>10</cell><cell>Congestive heart failure, unspecified</cell><cell></cell></row><row><cell>11</cell><cell>Coronary atherosclerosis of native coronary artery</cell><cell></cell></row><row><cell>12</cell><cell>Gastro-esophageal reflux disease without esophagitis</cell><cell></cell></row><row><cell>13</cell><cell>Need for prophylactic vaccination and inoculation against viral hepatitis</cell><cell></cell></row><row><cell>14</cell><cell>Personal history of tobacco use</cell><cell></cell></row><row><cell>15</cell><cell>Major depressive disorder, single episode, unspecified</cell><cell></cell></row><row><cell>16</cell><cell>Acute kidney failure, unspecified</cell><cell></cell></row><row><cell>17</cell><cell>Unspecified acquired hypothyroidism</cell><cell></cell></row><row><cell>18</cell><cell>Encounter for immunization</cell><cell></cell></row><row><cell>19</cell><cell>Atherosclerotic heart disease of native coronary artery without angina pectoris</cell><cell></cell></row><row><cell>20</cell><cell>Tobacco use disorder</cell><cell></cell></row></table></figure>
<figure xmlns="http://www.tei-c.org/ns/1.0" type="table" xml:id="tab_1"><head>Table 2</head><label>2</label><figDesc>The results of SPADE</figDesc><table><row><cell>Diagnosis</cell><cell>Diagnoses</cell><cell>𝑚𝑖𝑛𝑠𝑢𝑝</cell><cell>Sequences (hospitalizations)</cell><cell>Avg no. of events</cell><cell>Frequent sequences</cell><cell>Time (s)</cell><cell>Memory (MB)</cell></row><row><cell>Heart failure</cell><cell>73</cell><cell>0.025</cell><cell>52086</cell><cell>21.04</cell><cell>80983</cell><cell>86.94</cell><cell>1556.81</cell></row><row><cell>Born in hospital</cell><cell>21</cell><cell>0.008</cell><cell>37113</cell><cell>2.77</cell><cell>44020</cell><cell>41.21</cell><cell>407.844</cell></row><row><cell>Acute kidney failure</cell><cell>15</cell><cell>0.025</cell><cell>48255</cell><cell>24.92</cell><cell>68860</cell><cell>162.21</cell><cell>906.58</cell></row><row><cell>Need for prophylactic vaccination and inoculation against viral hepatitis</cell><cell>1</cell><cell>0.0001</cell><cell>29177</cell><cell>1.44</cell><cell>96546</cell><cell>44.98</cell><cell>329.158</cell></row><row><cell>Circumcision</cell><cell>2</cell><cell>0.0001</cell><cell>13269</cell><cell>2.05</cell><cell>262264</cell><cell>45.61</cell><cell>329.158</cell></row><row><cell>Encounter for immunization</cell><cell>1</cell><cell>0.0011</cell><cell>20149</cell><cell>1.88</cell><cell>73801</cell><cell>51.04</cell><cell>415.74</cell></row></table></figure>
<figure xmlns="http://www.tei-c.org/ns/1.0" type="table" xml:id="tab_2"><head>Table 3</head><label>3</label><figDesc>The results of PrefixSpan</figDesc><table><row><cell>Diagnosis</cell><cell>𝑚𝑖𝑛𝑠𝑢𝑝</cell><cell>Frequent sequences</cell><cell>Time (s)</cell><cell>Memory (MB)</cell></row><row><cell>Heart failure</cell><cell>0.025</cell><cell>101517</cell><cell>319</cell><cell>662.53</cell></row><row><cell>Born in hospital</cell><cell>0.008</cell><cell>44601</cell><cell>29.59</cell><cell>241.61</cell></row><row><cell>Acute kidney failure</cell><cell>0.025</cell><cell>86110</cell><cell>726.23</cell><cell>555.51</cell></row><row><cell>Need for prophylactic vaccination and inoculation against viral hepatitis</cell><cell>0.0001</cell><cell>99975</cell><cell>0.77</cell><cell>100.57</cell></row><row><cell>Circumcision</cell><cell>0.0001</cell><cell>268516</cell><cell>1.68</cell><cell>95.39</cell></row><row><cell>Encounter for immunization</cell><cell>0.0011</cell><cell>93614</cell><cell>6.4</cell><cell>114.12</cell></row></table></figure>
<figure xmlns="http://www.tei-c.org/ns/1.0" type="table" xml:id="tab_3"><head>Table 6</head><label>6</label><figDesc>Prediction results</figDesc><table><row><cell>Diagnosis</cell><cell>Algorithm</cell><cell>Support</cell><cell>𝑝ₘₐₓ</cell><cell>Sequences</cell><cell>Segments</cell><cell>Accuracy</cell><cell>Runtime (sec)</cell></row><row><cell>Heart failure</cell><cell>SPADE</cell><cell>0.025</cell><cell>12</cell><cell>25%</cell><cell>22139</cell><cell>25.83%</cell><cell>2914.35</cell></row><row><cell>Born in hospital</cell><cell>SPADE</cell><cell>0.008</cell><cell>8</cell><cell>100%</cell><cell>18165</cell><cell>65.15%</cell><cell>858.18</cell></row><row><cell>Need for prophylactic vaccination and inoculation against viral hepatitis</cell><cell>PrefixSpan</cell><cell>0.0001</cell><cell>13</cell><cell>100%</cell><cell>9962</cell><cell>80.14%</cell><cell>106.96</cell></row><row><cell>Circumcision</cell><cell>PrefixSpan</cell><cell>0.0001</cell><cell>13</cell><cell>100%</cell><cell>10854</cell><cell>89.32%</cell><cell>37.07</cell></row><row><cell>Encounter for immunization</cell><cell>PrefixSpan</cell><cell>0.0011</cell><cell>7</cell><cell>50%</cell><cell>3965</cell><cell>55.88%</cell><cell>266.22</cell></row></table></figure>
<figure xmlns="http://www.tei-c.org/ns/1.0" type="table" xml:id="tab_4"><head>Table 7</head><label>7</label><figDesc>Measures that can influence the accuracy of predictions</figDesc><table><row><cell>Diagnosis</cell><cell>Total drugs</cell><cell>Drugs per sequence</cell><cell>Time diff (days)</cell></row><row><cell>Heart failure</cell><cell>2155</cell><cell>45.82</cell><cell>6</cell></row><row><cell>Born in hospital</cell><cell>305</cell><cell>4.92</cell><cell>3</cell></row><row><cell>Need for prophylactic vaccination and inoculation against viral hepatitis</cell><cell>200</cell><cell>2.96</cell><cell>1</cell></row><row><cell>Circumcision</cell><cell>112</cell><cell>3.76</cell><cell>1</cell></row><row><cell>Encounter for immunization</cell><cell>864</cell><cell>3.84</cell><cell>1</cell></row></table></figure>
<figure xmlns="http://www.tei-c.org/ns/1.0" type="table" xml:id="tab_5"><head>Table 8</head><label>8</label><figDesc>The influence of support on the accuracy of predictions</figDesc><table><row><cell>Diagnosis</cell><cell>Support</cell><cell>Frequent sequences</cell><cell>Accuracy</cell><cell>Time (sec)</cell></row><row><cell>Circumcision</cell><cell>0.01</cell><cell>246</cell><cell>85.54%</cell><cell>0.39</cell></row><row><cell></cell><cell>0.001</cell><cell>2505</cell><cell>88.50%</cell><cell>1.38</cell></row><row><cell></cell><cell>0.0001</cell><cell>268516</cell><cell>89.32%</cell><cell>37.07</cell></row><row><cell>Need for prophylactic vaccination and inoculation against viral hepatitis</cell><cell>0.01</cell><cell>132</cell><cell>71.36%</cell><cell>0.12</cell></row><row><cell></cell><cell>0.001</cell><cell>1413</cell><cell>77.13%</cell><cell>0.45</cell></row><row><cell></cell><cell>0.0001</cell><cell>99975</cell><cell>80.14%</cell><cell>6.97</cell></row></table></figure>
			<note xmlns="http://www.tei-c.org/ns/1.0" place="foot" xml:id="foot_0">'triple dye' 'erythromycin ophthalmic' 'hepatitis b immune globulin'</note>
			<note xmlns="http://www.tei-c.org/ns/1.0" place="foot" n="2" xml:id="foot_1">https://www.nhs.uk/conditions/heart-failure/treatment/, https://www.heart.org/en/health-topics/heart-failure/treatment-options-for-heart-failure/medications-used-to-treat-heart-failure</note>
		</body>
		<back>
			<div type="annex">
<div xmlns="http://www.tei-c.org/ns/1.0"><head>Predictions using rules</head><p>Before making any predictions, the list of rules is sorted on two levels: first in descending order of the number of events in the antecedent, then in descending order of support. To narrow the search space, we also build a threshold dictionary: for each antecedent length that occurs in the sorted list, it stores the index of the first rule with that length. For example, the threshold dictionary 𝑡ℎ𝑟𝑒𝑠ℎ𝑜𝑙𝑑𝑠 = {8: 0, 7: 97, 6: 1178, 5: 6174, 4: 17020, 3: 29901, 2: 39094, 1: 43006, 0: 43908} shows that the rules' antecedents have eight distinct lengths; the entry for key 0 holds the total number of rules and marks the end of the list. The rules whose antecedent contains 𝑥 events, 1 ≤ 𝑥 ≤ 8, occupy the positions from 𝑡ℎ𝑟𝑒𝑠ℎ𝑜𝑙𝑑𝑠[𝑥] up to 𝑡ℎ𝑟𝑒𝑠ℎ𝑜𝑙𝑑𝑠[𝑥 − 1] − 1.</p><p>Given the sequence of medications prescribed to a patient during a hospitalization, 𝑠 = 〈𝑒₁, 𝑒₂, …, 𝑒ₙ〉, and the sorted list of rules, the predictions are made as follows:</p><p>1. If 𝑛 ≥ 1, iterate through the rules whose antecedent has as many events as 𝑠, using the threshold dictionary to locate them.</p><p>2. For each rule, check whether its antecedent matches the sequence 𝑠. If it does, add the event in the consequent to a list.</p><p>3. If five matches have been found, the search ends. Otherwise, remove the first event from 𝑠 and repeat the previous steps. Removing the first event from 𝑠 means that the search continues on the patient's more recent history.</p><p>At the end, the list of at most five events contains the drugs predicted for the patient whose history is the sequence 𝑠.</p><p>To test the accuracy of the predictions, we used the hospitalizations in which the frequent sequences were found and which, implicitly, were used to generate the rules and the threshold dictionary. 
Denote by 𝑝ₘₐₓ the largest key in the threshold dictionary, that is, the maximum antecedent length among the current rules. The sequence of each hospitalization is divided into segments of length 𝑝ₘₐₓ. If the sequence does not divide evenly, the last segment is kept only if its length is at least two. The last event is removed from each segment and held out to verify the correctness of the predictions. Predictions are made based on these segments, and if at least one of the drugs</p></div>			</div>
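			<div type="annex">
<div xmlns="http://www.tei-c.org/ns/1.0"><p>The rule ordering, the threshold dictionary, the prediction loop, and the split into segments described in the annex above can be sketched as follows. This is a minimal illustration in Python, not the authors' implementation: the names Rule, sort_rules, build_thresholds, predict, and split_segments are hypothetical, each event is assumed to be a single hashable value, a match is assumed to mean that the antecedent equals the current history exactly, 𝑝ₘₐₓ is assumed to be at least two, and the toy rules and their supports are invented for the example.</p>
<p>
```python
from typing import NamedTuple


class Rule(NamedTuple):
    """Hypothetical rule container: antecedent events, one predicted event, support."""
    antecedent: tuple
    consequent: str
    support: float


def sort_rules(rules):
    """Sort by antecedent length (descending), then by support (descending)."""
    return sorted(rules, key=lambda r: (-len(r.antecedent), -r.support))


def build_thresholds(sorted_rules):
    """Map each antecedent length to the index of its first rule.

    Key 0 holds the total number of rules, so the rules of length x occupy
    sorted_rules[thresholds[x]:thresholds[x - 1]].
    """
    p_max = max((len(r.antecedent) for r in sorted_rules), default=0)
    thresholds = {}
    index = 0
    for length in range(p_max, -1, -1):
        # Skip every rule whose antecedent is longer than the current length.
        while index != len(sorted_rules) and len(sorted_rules[index].antecedent) > length:
            index += 1
        thresholds[length] = index
    return thresholds


def predict(history, sorted_rules, thresholds, max_predictions=5):
    """Collect consequents of matching rules, shortening the history from the front."""
    history = tuple(history)
    predictions = []
    while history:
        start = thresholds.get(len(history))  # None if no rule is that long
        if start is not None:
            end = thresholds[len(history) - 1]
            for rule in sorted_rules[start:end]:
                if rule.antecedent == history:  # assumed notion of "match"
                    predictions.append(rule.consequent)
                    if len(predictions) == max_predictions:
                        return predictions
        history = history[1:]  # keep only the more recent events
    return predictions


def split_segments(sequence, p_max):
    """Cut a sequence into segments of length p_max; drop a tail shorter than two."""
    segments = [tuple(sequence[i:i + p_max]) for i in range(0, len(sequence), p_max)]
    return [segment for segment in segments if len(segment) >= 2]


# Toy rules with invented supports, for illustration only.
rules = [
    Rule(("a", "b"), "c", 0.4),
    Rule(("b",), "d", 0.6),
    Rule(("b",), "c", 0.3),
    Rule(("a", "b"), "e", 0.2),
]
ordered = sort_rules(rules)
thresholds = build_thresholds(ordered)
print(thresholds)                                    # {2: 0, 1: 2, 0: 4}
print(predict(["x", "a", "b"], ordered, thresholds))  # ['c', 'e', 'd', 'c']

# Each segment is split into a history and a held-out last event.
for segment in split_segments(list("abcdefgh"), 3):
    held_out_history, held_out_event = segment[:-1], segment[-1]
    print(held_out_history, held_out_event)
```
</p>
<p>In this sketch an antecedent length with no rules maps to an empty slice, so the lookup at key 𝑥 − 1 is always defined. Repeated consequents are kept as they are found; whether they should be collapsed is left open here, since the description above does not say.</p></div>
			</div>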
			<div type="references">

				<listBibl>

<biblStruct xml:id="b0">
	<analytic>
		<title level="a" type="main">Self-medication with over-the-counter and prescribed drugs causing adverse-drug-reaction-related hospital admissions: results of a prospective, long-term multi-centre study</title>
		<author>
			<persName><forename type="first">S</forename><surname>Schmiedl</surname></persName>
		</author>
		<author>
			<persName><forename type="first">M</forename><surname>Rottenkolber</surname></persName>
		</author>
		<author>
			<persName><forename type="first">J</forename><surname>Hasford</surname></persName>
		</author>
		<author>
			<persName><forename type="first">D</forename><surname>Rottenkolber</surname></persName>
		</author>
		<author>
			<persName><forename type="first">K</forename><surname>Farker</surname></persName>
		</author>
		<author>
			<persName><forename type="first">B</forename><surname>Drewelow</surname></persName>
		</author>
		<author>
			<persName><forename type="first">P</forename><surname>Thürmann</surname></persName>
		</author>
		<idno type="DOI">10.1007/s40264-014-0141-3</idno>
	</analytic>
	<monogr>
		<title level="j">Drug Safety</title>
		<imprint>
			<biblScope unit="volume">37</biblScope>
			<biblScope unit="issue">4</biblScope>
			<biblScope unit="page" from="225" to="235" />
			<date type="published" when="2014">2014</date>
		</imprint>
	</monogr>
</biblStruct>

<biblStruct xml:id="b1">
	<analytic>
		<title level="a" type="main">Medical data mining: knowledge discovery in a clinical data warehouse</title>
		<author>
			<persName><forename type="first">J</forename><forename type="middle">C</forename><surname>Prather</surname></persName>
		</author>
		<author>
			<persName><forename type="first">D</forename><forename type="middle">F</forename><surname>Lobach</surname></persName>
		</author>
		<author>
			<persName><forename type="first">L</forename><forename type="middle">K</forename><surname>Goodwin</surname></persName>
		</author>
		<author>
			<persName><forename type="first">J</forename><forename type="middle">W</forename><surname>Hales</surname></persName>
		</author>
		<author>
			<persName><forename type="first">M</forename><forename type="middle">L</forename><surname>Hage</surname></persName>
		</author>
		<author>
			<persName><forename type="first">W</forename><forename type="middle">E</forename><surname>Hammond</surname></persName>
		</author>
	</analytic>
	<monogr>
		<title level="m">AMIA Fall Symposium</title>
				<imprint>
			<date type="published" when="1997">1997</date>
			<biblScope unit="page" from="101" to="105" />
		</imprint>
	</monogr>
</biblStruct>

<biblStruct xml:id="b2">
	<analytic>
		<title level="a" type="main">Mining sequential patterns</title>
		<author>
			<persName><forename type="first">R</forename><surname>Agrawal</surname></persName>
		</author>
		<author>
			<persName><forename type="first">R</forename><surname>Srikant</surname></persName>
		</author>
	</analytic>
	<monogr>
		<title level="m">Proceedings of the Eleventh International Conference on Data Engineering</title>
				<meeting>the Eleventh International Conference on Data Engineering</meeting>
		<imprint>
			<date type="published" when="1995">1995</date>
			<biblScope unit="page" from="3" to="14" />
		</imprint>
	</monogr>
</biblStruct>

<biblStruct xml:id="b3">
	<analytic>
		<title level="a" type="main">A survey of sequential pattern mining</title>
		<author>
			<persName><forename type="first">P</forename><surname>Fournier-Viger</surname></persName>
		</author>
		<author>
			<persName><forename type="first">J</forename><forename type="middle">C W</forename><surname>Lin</surname></persName>
		</author>
		<author>
			<persName><forename type="first">R</forename><forename type="middle">U</forename><surname>Kiran</surname></persName>
		</author>
		<author>
			<persName><forename type="first">Y</forename><forename type="middle">S</forename><surname>Koh</surname></persName>
		</author>
		<author>
			<persName><forename type="first">R</forename><surname>Thomas</surname></persName>
		</author>
	</analytic>
	<monogr>
		<title level="j">Data Science and Pattern Recognition</title>
		<imprint>
			<biblScope unit="volume">1</biblScope>
			<biblScope unit="issue">1</biblScope>
			<biblScope unit="page" from="54" to="77" />
			<date type="published" when="2017">2017</date>
		</imprint>
	</monogr>
</biblStruct>

<biblStruct xml:id="b4">
	<analytic>
		<title level="a" type="main">Temporal pattern discovery for trends and transient effects: its application to patient records</title>
		<author>
			<persName><forename type="first">G</forename><forename type="middle">N</forename><surname>Norén</surname></persName>
		</author>
		<author>
			<persName><forename type="first">A</forename><surname>Bate</surname></persName>
		</author>
		<author>
			<persName><forename type="first">J</forename><surname>Hopstadius</surname></persName>
		</author>
		<author>
			<persName><forename type="first">K</forename><surname>Star</surname></persName>
		</author>
		<author>
			<persName><forename type="first">I</forename><forename type="middle">R</forename><surname>Edwards</surname></persName>
		</author>
		<idno type="DOI">10.1145/1401890.1402005</idno>
	</analytic>
	<monogr>
		<title level="m">Proceedings of the 14th ACM SIGKDD international conference on Knowledge discovery and data mining</title>
				<meeting>the 14th ACM SIGKDD international conference on Knowledge discovery and data mining</meeting>
		<imprint>
			<date type="published" when="2008">2008</date>
			<biblScope unit="page" from="963" to="971" />
		</imprint>
	</monogr>
</biblStruct>

<biblStruct xml:id="b5">
	<analytic>
		<title level="a" type="main">Discovering sequential patterns in a UK general practice database</title>
		<author>
			<persName><forename type="first">J</forename><surname>Reps</surname></persName>
		</author>
		<author>
			<persName><forename type="first">J</forename><forename type="middle">M</forename><surname>Garibaldi</surname></persName>
		</author>
		<author>
			<persName><forename type="first">U</forename><surname>Aickelin</surname></persName>
		</author>
		<author>
			<persName><forename type="first">D</forename><surname>Soria</surname></persName>
		</author>
		<author>
			<persName><forename type="first">J</forename><forename type="middle">E</forename><surname>Gibson</surname></persName>
		</author>
		<author>
			<persName><forename type="first">R</forename><forename type="middle">B</forename><surname>Hubbard</surname></persName>
		</author>
		<idno type="DOI">10.1109/BHI.2012.6211748</idno>
	</analytic>
	<monogr>
		<title level="m">Proceedings of 2012 IEEE-EMBS International Conference on Biomedical and Health Informatics</title>
				<meeting>2012 IEEE-EMBS International Conference on Biomedical and Health Informatics</meeting>
		<imprint>
			<date type="published" when="2012">2012</date>
			<biblScope unit="page" from="960" to="963" />
		</imprint>
	</monogr>
</biblStruct>

<biblStruct xml:id="b6">
	<analytic>
		<title level="a" type="main">The use of sequential pattern mining to predict next prescribed medications</title>
		<author>
			<persName><forename type="first">A</forename><surname>Wright</surname></persName>
		</author>
		<author>
			<persName><forename type="first">A</forename><surname>Wright</surname></persName>
		</author>
		<author>
			<persName><forename type="first">A</forename><surname>McCoy</surname></persName>
		</author>
		<author>
			<persName><forename type="first">D</forename><surname>Sittig</surname></persName>
		</author>
		<idno type="DOI">10.1016/j.jbi.2014.09.003</idno>
	</analytic>
	<monogr>
		<title level="j">Journal of biomedical informatics</title>
		<imprint>
			<biblScope unit="volume">53</biblScope>
			<biblScope unit="page" from="73" to="80" />
			<date type="published" when="2015">2015</date>
		</imprint>
	</monogr>
</biblStruct>

<biblStruct xml:id="b7">
	<monogr>
		<title level="m" type="main">Predicting prescription patterns</title>
		<author>
			<persName><forename type="first">Í</forename><forename type="middle">S</forename><surname>Helgason</surname></persName>
		</author>
		<imprint>
			<date type="published" when="2008">2008</date>
		</imprint>
		<respStmt>
			<orgName>Massachusetts Institute of Technology</orgName>
		</respStmt>
	</monogr>
	<note type="report_type">Doctoral dissertation</note>
</biblStruct>

<biblStruct xml:id="b8">
	<monogr>
		<title level="m" type="main">An application of machine learning to assist medication order review by pharmacists in a health care center</title>
		<author>
			<persName><forename type="first">M</forename><surname>Thibault</surname></persName>
		</author>
		<author>
			<persName><forename type="first">D</forename><surname>Lebel</surname></persName>
		</author>
		<idno type="DOI">10.1101/19013029</idno>
		<ptr target="https://doi.org/10.1101/19013029" />
		<imprint>
			<date type="published" when="2019">2019</date>
		</imprint>
	</monogr>
</biblStruct>

<biblStruct xml:id="b9">
	<analytic>
		<title level="a" type="main">A treatment engine by predicting next-period prescriptions</title>
		<author>
			<persName><forename type="first">B</forename><surname>Jin</surname></persName>
		</author>
		<author>
			<persName><forename type="first">H</forename><surname>Yang</surname></persName>
		</author>
		<author>
			<persName><forename type="first">L</forename><surname>Sun</surname></persName>
		</author>
		<author>
			<persName><forename type="first">C</forename><surname>Liu</surname></persName>
		</author>
		<author>
			<persName><forename type="first">Y</forename><surname>Qu</surname></persName>
		</author>
		<author>
			<persName><forename type="first">J</forename><surname>Tong</surname></persName>
		</author>
		<idno type="DOI">10.1145/3219819.3220095</idno>
	</analytic>
	<monogr>
		<title level="m">Proceedings of the 24th ACM SIGKDD International Conference on Knowledge Discovery &amp; Data Mining</title>
				<meeting>the 24th ACM SIGKDD International Conference on Knowledge Discovery &amp; Data Mining</meeting>
		<imprint>
			<date type="published" when="2018">2018</date>
			<biblScope unit="page" from="1608" to="1616" />
		</imprint>
	</monogr>
</biblStruct>

<biblStruct xml:id="b10">
	<analytic>
		<title level="a" type="main">Predicting inpatient clinical order patterns with probabilistic topic models vs conventional order sets</title>
		<author>
			<persName><forename type="first">J</forename><forename type="middle">H</forename><surname>Chen</surname></persName>
		</author>
		<author>
			<persName><forename type="first">M</forename><forename type="middle">K</forename><surname>Goldstein</surname></persName>
		</author>
		<author>
			<persName><forename type="first">S</forename><forename type="middle">M</forename><surname>Asch</surname></persName>
		</author>
		<author>
			<persName><forename type="first">L</forename><surname>Mackey</surname></persName>
		</author>
		<author>
			<persName><forename type="first">R</forename><forename type="middle">B</forename><surname>Altman</surname></persName>
		</author>
		<idno type="DOI">10.1093/jamia/ocw136</idno>
	</analytic>
	<monogr>
		<title level="j">Journal of the American Medical Informatics Association</title>
		<imprint>
			<biblScope unit="volume">24</biblScope>
			<biblScope unit="issue">3</biblScope>
			<biblScope unit="page" from="472" to="480" />
			<date type="published" when="2017">2017</date>
		</imprint>
	</monogr>
</biblStruct>

<biblStruct xml:id="b11">
	<monogr>
		<author>
			<persName><forename type="first">A</forename><surname>Johnson</surname></persName>
		</author>
		<author>
			<persName><forename type="first">L</forename><surname>Bulgarelli</surname></persName>
		</author>
		<author>
			<persName><forename type="first">T</forename><surname>Pollard</surname></persName>
		</author>
		<author>
			<persName><forename type="first">S</forename><surname>Horng</surname></persName>
		</author>
		<author>
			<persName><forename type="first">L</forename><forename type="middle">A</forename><surname>Celi</surname></persName>
		</author>
		<author>
			<persName><forename type="first">R</forename><surname>Mark</surname></persName>
		</author>
		<title level="m">MIMIC-IV (version 1.0)</title>
				<imprint>
			<publisher>PhysioNet</publisher>
			<date type="published" when="2021">2021</date>
		</imprint>
	</monogr>
</biblStruct>

<biblStruct xml:id="b12">
	<analytic>
		<title level="a" type="main">Sequential pattern mining-approaches and algorithms</title>
		<author>
			<persName><forename type="first">C</forename><forename type="middle">H</forename><surname>Mooney</surname></persName>
		</author>
		<author>
			<persName><forename type="first">J</forename><forename type="middle">F</forename><surname>Roddick</surname></persName>
		</author>
		<idno type="DOI">10.1145/2431211.2431218</idno>
	</analytic>
	<monogr>
		<title level="j">ACM Computing Surveys (CSUR)</title>
		<imprint>
			<biblScope unit="volume">45</biblScope>
			<biblScope unit="issue">2</biblScope>
			<biblScope unit="page" from="1" to="39" />
			<date type="published" when="2013">2013</date>
		</imprint>
	</monogr>
</biblStruct>

<biblStruct xml:id="b13">
	<analytic>
		<title level="a" type="main">SPADE: An efficient algorithm for mining frequent sequences</title>
		<author>
			<persName><forename type="first">M</forename><forename type="middle">J</forename><surname>Zaki</surname></persName>
		</author>
		<idno type="DOI">10.1023/A:1007652502315</idno>
	</analytic>
	<monogr>
		<title level="j">Machine Learning</title>
		<imprint>
			<biblScope unit="volume">42</biblScope>
			<biblScope unit="issue">1</biblScope>
			<biblScope unit="page" from="31" to="60" />
			<date type="published" when="2001">2001</date>
		</imprint>
	</monogr>
</biblStruct>

<biblStruct xml:id="b14">
	<analytic>
		<title level="a" type="main">Mining sequential patterns by pattern-growth: The prefixspan approach</title>
		<author>
			<persName><forename type="first">J</forename><surname>Pei</surname></persName>
		</author>
		<author>
			<persName><forename type="first">J</forename><surname>Han</surname></persName>
		</author>
		<author>
			<persName><forename type="first">B</forename><surname>Mortazavi-Asl</surname></persName>
		</author>
		<author>
			<persName><forename type="first">J</forename><surname>Wang</surname></persName>
		</author>
		<author>
			<persName><forename type="first">H</forename><surname>Pinto</surname></persName>
		</author>
		<author>
			<persName><forename type="first">Q</forename><surname>Chen</surname></persName>
		</author>
		<author>
			<persName><forename type="first">M</forename><forename type="middle">C</forename><surname>Hsu</surname></persName>
		</author>
		<idno type="DOI">10.1109/TKDE.2004.77</idno>
	</analytic>
	<monogr>
		<title level="j">IEEE Transactions on Knowledge and Data Engineering</title>
		<imprint>
			<biblScope unit="volume">16</biblScope>
			<biblScope unit="issue">11</biblScope>
			<biblScope unit="page" from="1424" to="1440" />
			<date type="published" when="2004">2004</date>
		</imprint>
	</monogr>
</biblStruct>

<biblStruct xml:id="b15">
	<analytic>
		<title level="a" type="main">The SPMF open-source data mining library version 2</title>
		<author>
			<persName><forename type="first">P</forename><surname>Fournier-Viger</surname></persName>
		</author>
		<author>
			<persName><forename type="first">J</forename><forename type="middle">C W</forename><surname>Lin</surname></persName>
		</author>
		<author>
			<persName><forename type="first">A</forename><surname>Gomariz</surname></persName>
		</author>
		<author>
			<persName><forename type="first">T</forename><surname>Gueniche</surname></persName>
		</author>
		<author>
			<persName><forename type="first">A</forename><surname>Soltani</surname></persName>
		</author>
		<author>
			<persName><forename type="first">Z</forename><surname>Deng</surname></persName>
		</author>
		<author>
			<persName><forename type="first">H</forename><forename type="middle">T</forename><surname>Lam</surname></persName>
		</author>
	</analytic>
	<monogr>
		<title level="m">Joint European conference on machine learning and knowledge discovery in databases</title>
				<imprint>
			<date type="published" when="2016">2016</date>
			<biblScope unit="page" from="36" to="40" />
		</imprint>
	</monogr>
</biblStruct>

<biblStruct xml:id="b16">
	<analytic>
		<title level="a" type="main">Prescription drug use in the United States, 2015-2016</title>
		<author>
			<persName><forename type="first">C</forename><forename type="middle">B</forename><surname>Martin</surname></persName>
		</author>
		<author>
			<persName><forename type="first">C</forename><forename type="middle">M</forename><surname>Hales</surname></persName>
		</author>
		<author>
			<persName><forename type="first">Q</forename><surname>Gu</surname></persName>
		</author>
		<author>
			<persName><forename type="first">C</forename><forename type="middle">L</forename><surname>Ogden</surname></persName>
		</author>
	</analytic>
	<monogr>
		<title level="j">NCHS Data Brief</title>
				<imprint>
			<date type="published" when="2019">2019</date>
			<biblScope unit="page" from="1" to="8" />
		</imprint>
	</monogr>
</biblStruct>

				</listBibl>
			</div>
		</back>
	</text>
</TEI>
