Evaluating espresso coffee quality by means of time-series feature engineering

Daniele Apiletti, Eliana Pastor, Riccardo Callà, Elena Baralis
Department of Control and Computer Engineering, Politecnico di Torino, Italy
name.surname@polito.it

© 2020 Copyright for this paper by its author(s). Published in the Workshop Proceedings of the EDBT/ICDT 2020 Joint Conference (March 30-April 2, 2020, Copenhagen, Denmark) on CEUR-WS.org. Use permitted under Creative Commons License Attribution 4.0 International (CC BY 4.0).

ABSTRACT

Espresso quality attracts the interest of many stakeholders: from consumers to local business activities, from coffee-machine vendors to international coffee industries. So far, it has been mostly addressed by means of human experts, electronic noses, and chemical approaches. The current work, instead, proposes a data-driven analysis exploiting time-series feature engineering. We analyze a real-world dataset of espresso brewing by professional coffee-making machines. The novelty of the proposed work lies in the focus on the brewing time series, from which we propose to engineer features able to improve previous data-driven metrics determining the quality of the espresso. Thanks to the proposed features, better quality-evaluation predictions are achieved with respect to previous data-driven approaches that relied solely on metrics describing each brewing as a whole (e.g., average flow, total amount of water). Yet, the engineered features are simple to compute and add a very limited workload to the coffee-machine sensor-data collection device, hence being suitable for large-scale IoT installations on board professional coffee machines, such as those typically installed in consumer-oriented business activities, shops, and workplaces. To the best of the authors' knowledge, this is the first attempt to perform a data-driven analysis of real-world espresso-brewing time series. The presented results yield three-fold improvements in classification accuracy of high-quality espresso coffees with respect to current data-driven approaches (from 30% to 100%), exploiting simple threshold-based quality evaluations defined in the newly proposed feature space.

1 INTRODUCTION

Espresso is an almost syrupy beverage generated by a machine, typically using a motor-driven pump, forcing pressurized hot water through finely ground coffee. Each espresso shot in a bar can generate one or two cups of coffee, being called, respectively, single or double, and requiring proportional amounts of ground coffee.

Drinking espresso coffee is a ritual rooted in the pleasure of its taste. In some countries, such as Italy, where 97% of adults drink espresso daily [18], espresso quality is a main driver for consumers' habits and a primary focus of coffee industries. In 2018, each Italian had 2.2 daily espresso cups on average, i.e., 6 kg yearly, in one of the 150 thousand bars, with each bar using 1.2 kg of ground coffee daily to serve almost 200 coffees on average, and most of them were espresso, representing approximately one third of a medium bar turnover [18].

According to common knowledge and online sources [12, 18], such as the Italian Espresso National Institute, a perfect espresso depends on different variables: (i) the coffee blend; (ii) the grinder settings, i.e., the weight of coffee grounds and how finely they are ground; (iii) the espresso machine, with professional machine makers improving such technology over and over to promise the perfect espresso all the time; (iv) the barista, i.e., the human-in-the-loop preparing the espresso in the bar, from blend choice, to manual grinder settings, and to proper usage of the coffee machine and its brewing procedure.

In the current work, among the different quality-influencing variables, we focus on (i) coffee ground size, (ii) ground amount, and (iii) water pressure. Regarding the quality-evaluation variables, we exploit the following common metrics as selected by domain experts and related works: (i) total extraction time, (ii) the total volume of coffee in cup, and (iii) the derived average flow of the extraction [5].

The ideal portion [12] of ground coffee for each cup is declared to be 7 ± 0.5 g, while the water pressure should be 9 ± 1 bar, the extraction time 25 ± 5 s, and the volume in cup 25 ± 5 ml.

The coffee ground derives from the process of grinding coffee beans. Small changes in the grind size can drastically affect the taste and the quality of the brewed espresso. In general, if the coffee is ground too coarse, the espresso can be under-extracted and less flavorful. On the other hand, a too fine ground may result in an over-extracted and bitter coffee. The amount of ground itself impacts quality, resulting in a too watery or too bitter coffee. Water pressure must be set to brew the right coffee amount in a proper time, thus leading to the right flow rate determining an intense flavour.

The novelty of the proposed work lies in the exploitation of the brewing time series, from which we propose to engineer features able to improve the standard data-driven metrics determining the quality of the espresso, i.e., extraction time, volume, and flow (as the ratio of volume and time). The proposed features are applied on a real-world dataset where we show that they can provide better quality-evaluation predictions, by reducing the false positives, i.e., apparently good coffees, without any loss in true positives.

Since the engineered features are simple to compute and add a very limited workload to the coffee-machine sensor-data collection device, they are also suitable for large-scale IoT installations on board professional coffee machines, such as those typically installed in consumer-oriented business activities, shops, and workplaces.

The presented results uncover insights into the espresso quality evaluation and its relationships with the main quality variables, leading to positive impacts on both coffee consumers and coffee-making industries, respectively enjoying and providing more pleasure in drinking higher-quality espresso coffee.

The rest of the paper is structured as follows. Section 2 discusses related works, Section 3 describes the dataset and the experimental design, Section 4 introduces the time-series feature engineering algorithm, and Section 5 presents experimental results. Finally, Section 6 draws conclusions and outlines future works.

2 RELATED WORK

Espresso quality assessment is traditionally performed with sensory analysis, the scientific discipline that statistically and experimentally analyzes reactions to stimuli perceived through the human senses (sight, smell, taste, touch and hearing). Sensory evaluation is however time-consuming and affected by subjectiveness and low reproducibility due to the human component.

Considering these limitations, objective analyses such as chemical techniques, electronic noses and data-driven approaches are commonly exploited for coffee quality control. Different chemical techniques adopt Gas Chromatography (GC) and Mass Spectrometry (MS) analysis. Several works study the effect of external variables (e.g., water pressure, water temperature) or of coffee characteristics on the final espresso quality. Some works are focused on the influence of water, such as its composition, pressure [1], temperature [2], and pressure and temperature combined [6]. Other studies instead consider the impact of coffee features themselves, such as the roasting conditions [19] or the type of coffee and roast combined [3].

However, GC and MS analyses often require a significant amount of time and human intervention. Many studies exploit Electronic Nose (EN) systems to overcome the complexity and cost of GC/MS techniques. An electronic nose is a device intended to mimic human olfaction. It consists of an array of chemical sensors for chemical detection and a pattern recognition system capable of identifying the specific components of an odor [11]. ENs are frequently exploited for determining and discriminating coffee characteristics. Several works aim at determining the roasting degree [17], using PCA and Neural Networks (NN) coupled with GRNN, while others focus on distinguishing coffee blends, exploiting both NN [15] and Support Vector Machine techniques [16]. EN systems are also used in conjunction with GC analysis, as in [14], to characterize roasting degree and coffee beans from different countries. The analysis in [20] studies espresso chemical attributes when the extraction time and grinding level are varied. The work emphasizes the importance of the first 8 seconds of the espresso brew, because in this range the major amount of organic acids, solids and caffeine is extracted. This result confirms the relevance of analyzing the entire trend of coffee extractions to characterize their quality.

Finally, data-driven approaches can be applied for large-scale and real-time espresso quality assessment, exploiting Internet of Things (IoT) sensors in place of the more sensitive and unstable EN devices. Recently, a data-driven approach that exploits association rule mining has been proposed to analyze the correlation of coffee-making machine parameters and espresso quality [5]. The work relies solely on metrics describing each espresso brewing as a whole (e.g., average flow, total amount of water). In the proposed work, instead, we focus on the brewing time series to fully characterize the coffee extractions.

Time series analysis is a popular and well-known approach in many application fields [10, 13], from physiological data [4] to energy and weather data [9]. However, in our work, we exploit a basic intuition on the time series trend and resort to feature engineering to avoid a direct analysis of the time series itself. Feature engineering from time series has been extensively addressed for different applications, as in [7] for industrial applications in the context of IoT and Industry 4.0, or for pattern matching of technical patterns in financial applications [8].

With respect to the state of the art, the current work contributes by transferring known and simple time-series feature engineering techniques into the espresso quality evaluation domain, leading to a significant improvement in classification performance. To the best of the authors' knowledge, this is the first attempt to perform a data-driven analysis of real-world espresso-brewing time series, as until now the focus has been limited to whole-extraction metrics.

3 DATASET DESCRIPTION

The dataset under analysis consists of real-world espresso brewing data. Since the dataset is provided by a leading coffee company, we cannot disclose exact details of the real-world settings (e.g., the coffee-machine maker and model, the precise location and name of the involved business activities). Each espresso extraction has been performed on professional coffee-making machines and the values of the quality-evaluation variables have been collected every 300 ms. In particular, our time series consist of the values of the amount of water at each time interval, as provided by the flow-meter pulse counter, from which the instant flow rate is derived (i.e., the ratio of the amount of water and the time).

Each extraction has been performed with specific values of the quality-influencing variables, hence allowing us to know the ground-truth labels of high-quality espresso coffees, i.e., those having all optimal settings for (i) coffee ground size, (ii) ground amount, and (iii) water pressure. An exhaustive set of coffees has been produced to observe the effect of non-optimal values on the espresso quality. For each quality-influencing variable, different values are considered: ground size can be coarse, optimal, or fine; ground amount can be high, optimal, or low; brewing water pressure can be high, optimal, or low. All possible combinations of the values of the three external variables have been included in the dataset, hence generating 3^3 = 27 possible input configurations. For each configuration among the 27 combinations of external variables (for instance: coarse ground size, optimal ground amount, and high water pressure), 20 espresso extractions have been performed. Experiments have been repeated on a professional coffee-making machine, generating a dataset consisting of 540 espresso extractions.

The domain-expert quality thresholds used in our experiments are as follows: espresso volume from 20 to 30 ml, extraction time from 20 to 30 s. The values have been selected according to public literature, e.g., those published by the Specialty Coffee Association of Europe [5, 12]. The flow rate thresholds derive from the above-mentioned ones, as the flow rate is the ratio of the volume to the time: the extreme cases 20 ml / 30 s and 30 ml / 20 s give the range 0.67–1.50 ml/s.

Given such thresholds, espresso extractions can be labelled with their quality assessment. Quality labels are optimal, too low or too high for each of the quality variables: volume, time, and flow. Table 1 recaps the domain-based threshold values and corresponding labels.

Table 1: Domain-based quality thresholds.

Quality variable       Low      Optimal        High
extraction time (s)    <20      [20–30]        >30
volume (ml)            <20      [20–30]        >30
flow rate (ml/s)       <0.67    [0.67–1.50]    >1.50

The problem tackled by this work stems from the fact that, when analyzing the standard quality-evaluation variables without the additional novel time-series features, many false positives are produced: some espresso extractions are characterized by high-quality values in terms of water amount, flow rate and extraction time although their ground size, ground amount or water pressure were not optimal (compensation effect [5]).

4 TIME-SERIES FEATURE ENGINEERING

Feature engineering refers to the process of extracting features from raw data. It is typically executed to improve the performance of predictive or classification models. In the current work, we exploit feature engineering to leverage the coffee-brewing time series with the aim of improving the espresso quality assessment.

For each coffee extraction, the time series of the flow-meter pulses is stored, with sampling time equal to 300 ms. Flow-meter pulses are firstly converted to quantity of brewed water q, as follows:

\[ q = \frac{num_p \cdot pulse_q}{num_c} \tag{1} \]

where num_p is the number of pulses of the flow-meter, pulse_q represents the quantity of brewed water per pulse of the flow-meter and num_c represents the number of brewed coffees. In the experimental data under analysis, pulse_q = 0.5 ml, as given by the coffee-machine datasheet, and num_c = 2, since two espresso coffees are brewed for each extraction. The time series captures the water quantity over time, hence the instant flow rate is known.

Figure 1 shows an example of a real time series from the dataset.

Figure 1: A real sample time series of the total water quantity of an espresso coffee brewing. (Axes: Time (s) vs. Quantity of water (ml).)
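As a minimal sketch of the conversion in Equation 1, the snippet below turns flow-meter readings into a water-quantity time series and derives the instant flow rate between consecutive samples. It assumes the stored readings are cumulative pulse counts sampled every 300 ms; the function names (`pulses_to_quantity`, `quantity_series`, `instant_flow`) are hypothetical and not taken from the original work.

```python
from typing import List, Sequence, Tuple

SAMPLING_S = 0.3   # sampling time of 300 ms, as stated in the text
PULSE_Q_ML = 0.5   # water per flow-meter pulse (ml), value reported in the text
NUM_C = 2          # coffees brewed per extraction, value reported in the text

Point = Tuple[float, float]  # (time in s, water quantity in ml)


def pulses_to_quantity(num_p: int, pulse_q: float = PULSE_Q_ML,
                       num_c: int = NUM_C) -> float:
    """Equation 1: brewed water quantity (ml) per coffee."""
    return num_p * pulse_q / num_c


def quantity_series(cumulative_pulses: Sequence[int]) -> List[Point]:
    """Build (time, quantity) points; assumes cumulative pulse counts."""
    return [(i * SAMPLING_S, pulses_to_quantity(n))
            for i, n in enumerate(cumulative_pulses)]


def instant_flow(points: Sequence[Point]) -> List[float]:
    """Flow rate (ml/s) between each pair of consecutive samples."""
    return [(q1 - q0) / (t1 - t0)
            for (t0, q0), (t1, q1) in zip(points, points[1:])]
```

For instance, 100 pulses correspond to 100 · 0.5 / 2 = 25 ml per coffee, which is the centre of the optimal volume range.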
In the series of Figure 1 we notice a clear two-segment trend, which is observable for any arbitrary extraction: a first steeper phase is followed by a second part having a lower flow rate. This phenomenon is known by domain experts. In the first, transient, phase of coffee brewing, water is forced in the coffee panel inside the filter holder, and coffee grounds do not slow the water flow yet. On the contrary, in the second phase, water penetrates and dampens the coffee grounds, yielding the actual coffee extraction.

We propose to extract the following new features to capture the two-fold behavior of the extraction. We firstly determine the point where a significant flow variation is observed. We refer to this point as trend point. The trend point is used to approximate the water quantity time series as a polygonal chain. The approximate polygonal chain is constituted by two line segments that represent the two phases of the water flow, and its vertex of intersection is the trend point.

The trend point is estimated by considering the maximum variation of the slope average of the points in two consecutive non-overlapping sliding windows of size W. The slope s_i (or gradient) of two consecutive points p_i = (t_i, q_i) and p_j = (t_j, q_j) is computed as follows.

\[ s_i = \frac{q_j - q_i}{t_j - t_i} \tag{2} \]

In Equation 2, t is the time reference and q is the water quantity, and they represent the axes of Figure 1. The slope s describes the steepness of the water flow.

The procedure for the trend point estimation is reported in Algorithm 1.

Algorithm 1: Trend point computation
Result: Trend point
 1  max_td = 0.0;
 2  point_max_td = (0.0, 0.0);
 3  for i = 0 to N − 2W do
 4      w_1 = range(i, i + W);
 5      w_2 = range(i + W, i + 2W);
 6      w_1mean = mean(compute_slopes(w_1));
 7      w_2mean = mean(compute_slopes(w_2));
 8      trend_diff = w_2mean − w_1mean;
 9      max_td, point_max_td = updateMax(trend_diff);
10  end
11  trend_point = point_max_td;
12  return trend_point

The maximum variation of the slope and the corresponding point are initialized in Lines 1 and 2. In Lines 4 and 5, two consecutive non-overlapping sliding windows of size W are defined. Let w_k be a time window of size W. The slope average w_kmean of all consecutive points of the time window is computed as follows

\[ w_{k\,mean} = \frac{1}{W-1} \sum_{j=1}^{W-1} \frac{q_j - q_{j-1}}{t_j - t_{j-1}} \tag{3} \]

where p_j = (t_j, q_j) and p_{j−1} = (t_{j−1}, q_{j−1}) are consecutive points of the time window.

The slope average is estimated for the two sliding windows, as reported in Lines 6 and 7. The two terms capture the average flow rate in the corresponding time window. The difference of the two slope averages is computed in Line 8. The maximum slope variation and the corresponding point are updated in Line 9. The point of maximum variation corresponds to the intersection point of the two considered sliding windows. The process is repeated until all N points of the time series are considered. Finally, the trend point is returned (Line 12).

The trend point p_tp = (t_tp, q_tp) represents the intersection vertex of an approximate polygonal chain of the water quantity time series. It is exploited to compute two features that capture the two phases of the espresso extraction. Let p_0 = (t_0, q_0) and p_N = (t_N, q_N) be the first and last points of the time series, respectively. We define s_1 and s_2 as follows.

\[ s_1 = \frac{q_{tp} - q_0}{t_{tp} - t_0} \tag{4} \]

\[ s_2 = \frac{q_N - q_{tp}}{t_N - t_{tp}} \tag{5} \]

In Figure 2, the approximate polygonal chain of a coffee extraction time series is reported. The dashed line indicates the average water flow. The slope s_1 represents the average flow of the first phase of the espresso brewing while slope s_2 represents the average flow of the second phase. These two features are exploited in the analysis to better characterize the coffee extraction, providing additional information with respect to the overall average flow. The extracted features will also be exploited to compute new ranges for the optimal quality parameters, hence improving the recognition of high-quality coffees.

Figure 2: Features engineered from the espresso extraction time series with Trend Point, Slope 1, and Slope 2. (Axes: Time (s) vs. Quantity of water (ml); legend: Real Flow, Average Flow, Slope 1, Slope 2, Trend Point.)

5 EXPERIMENTAL RESULTS

This section provides a description of the data cleaning procedures applied to the dataset (Section 5.1), a discussion of the data characterization of the extracted features (Section 5.2), and their contribution to the espresso quality assessment improvement (Section 5.3).

5.1 Data cleaning

The dataset has been pre-processed by applying the data cleaning steps described in [5]. The original dataset consists of 1080 coffees, corresponding to 540 extractions. Among them, 30 extractions were missing the time series data due to low-level hardware issues. Domain-driven thresholds, aimed at removing values that are unacceptable for the phenomena under exam, led to a further 38 extractions being discarded. As described in [5], domain-driven threshold values of valid espresso extractions have been set to 10–40 ml and 10–40 s, according to leading industrial domain experts. Finally, the statistical-based outlier removal approach of [5] removed 15 additional samples from the dataset. After the cleaning procedure, 457 extraction time series remain out of the 540 original records.

5.2 Data characterization

We firstly analyze the relationship between the extracted features and the quality-evaluation variables (i.e., total extraction time, average flow rate, total water amount). The trend point and the consequent slope values have been computed with a window size W set to 10.

The correlation analysis shows that slope s_2 is highly correlated with the average flow rate (over the whole extraction), with a Pearson correlation coefficient equal to 0.95, and with the total brewing time, with a correlation coefficient of -0.94. As expected, lower flow rates lead to longer extraction times, since the total amount of coffee is an almost constant goal of the coffee machine.

We then investigate the relationship between the two average flows (i.e., s_1 and s_2) and the three external quality-influencing variables: water pressure, coffee ground amount and coffee ground size (also known as grinding setting).

Figure 3 shows the pressure behavior with respect to s_1 and s_2. The pressure values (low, optimal, and high) are represented by the label in the scatter plot. We can observe that coffee extractions in the (s_1, s_2) space are clearly divided in three macro-areas, determined by the s_1 value. The central partition is characterized by an optimal pressure, while the first and last areas are characterized by low and high values of pressure, respectively. Hence, the value of this external variable highly influences the first phase of coffee extractions, when water is forced into the coffee panel. A low pressure corresponds to a low water flow in the initial phase, and vice versa for the high pressure. The flow in the second phase is instead almost independent of the pressure value.

Figure 3: Extractions in the proposed feature space, labeled according to the water pressure value. (Axes: Slope 1 vs. Slope 2; legend: Low Pressure, Optimal Pressure, High Pressure.)

Regarding the amount of coffee ground, we report in Figure 4 the coffee extractions as a function of s_1 and s_2. Differently from the pressure-labeled scatter plot, no sharp distinction is observable. We can however identify a relationship with s_2. Higher amounts of coffee ground lead to lower values of the flow s_2. In this case, the average flow in the second phase of the extraction is hindered by the higher amount of coffee ground. Hence, the water flow is reduced. Likewise, a lower quantity of coffee ground facilitates the flow of water, with a consequent increase in flow s_2. The coffee ground amount, instead, does not influence s_1, since it captures the average flow of the water when it is forced in the coffee panel, before it dampens the coffee grounds.

Figure 4: Extractions in the proposed feature space, labeled according to the coffee ground amount. (Axes: Slope 1 vs. Slope 2; legend: Low amount erogations, Optimal amount erogations, High amount erogations.)

Finally, we observe a similar behavior when considering the coffee ground size (i.e., grinding settings), hence we do not report the plot. A coarser grinding generally corresponds to a higher flow. A finer coffee grinding instead hinders the water flow. This results in a lower flow s_2 in the second phase of the coffee extraction.

5.3 Quality Evaluation

In this section, we evaluate the ability of the extracted features to characterize espresso quality and to improve the detection of high-quality espresso coffees. All three external variables are under the barista's control. However, brew pressure is set at first in the espresso machine calibration phase and it is periodically checked and configured, typically with the support of technicians. On the other hand, the grinding settings and the amount of coffee ground are determined by the barista at each espresso brewing. Hence, it is particularly relevant to check that these two external variables are set properly by the barista. In existing works, domain-expert and data-driven thresholds on quality indexes, such as espresso volume, extraction time and brewing flow rate, have been applied to evaluate coffee quality. The analysis in [5] explored the phenomenon of compensating sub-optimal values of different external variables. A compensation effect is observable when configurations of values of external variables yield apparently high-quality coffees, in terms of quality indexes, even though one or more values are, in fact, not optimal. Interpretable exploration techniques highlighted that high amounts of coffee ground, which generally hinder the water flow and lead to long percolation times, could be compensated by a coarser grinding that, on the other hand, facilitates the flow [5]. Similarly, low amounts of coffee ground could be compensated by a finer grinding. Despite the optimal quality-index values, a low amount of coffee generally has a negative impact on coffee intensity and body, and therefore on the final customer experience, hence possibly affecting also the brand image of the coffee supplier. For this reason, we exploit the time-series features to better characterize the quality of espressos so that false high-quality coffees can be detected and, if not totally avoided, at least significantly reduced.

As a reference, we consider domain-driven thresholds on coffee quality indexes. In Figure 5 the espresso extractions with optimal values of quality indexes are reported in the (s_1, s_2) space. They can be grouped as follows. (i) True high-quality extractions present optimal values both for the quality-evaluation indexes and for all external variables. (ii) False high-quality extractions present optimal quality-index values with respect to domain-expert thresholds, but at least one external variable has a sub-optimal value [5]. Such espresso extractions (ii) are the result of compensation effects.

We refer to true high-quality extractions as optimal, and we characterize them as a function of the proposed time-series features s_1 and s_2. Let O be the set of optimal extractions {o_1, o_2, ..., o_N}, where each point o_i ∈ O is defined in terms of s_1 and s_2, i.e., o_i = (o_{i_s1}, o_{i_s2}). We define novel quality thresholds for optimal extractions T_{o_min} and T_{o_max} in the (s_1, s_2) space as follows:

\[ T_{o\_min} = \left(\min(o_{i\_s_1}),\ \min(o_{i\_s_2})\right) \tag{6} \]

\[ T_{o\_max} = \left(\max(o_{i\_s_1}),\ \max(o_{i\_s_2})\right) \tag{7} \]

Among the whole set of espresso extractions E = {e_1, e_2, ..., e_M}, a generic sample e_j = (e_{j_s1}, e_{j_s2}) ∈ E is labeled as optimal, e_j ∈ O with O ⊆ E, if its flow-rate values (e_{j_s1}, e_{j_s2}) are within the thresholds T_{o_min} and T_{o_max}.

In Figure 5 two rectangular areas are shown. The green area contains the optimal extractions. Its boundaries are defined by the thresholds T_{o_min} and T_{o_max}. The orange dashed area contains the false high-quality extractions, which current state-of-the-art solutions would (incorrectly) classify as high-quality coffees. Exploiting the proposed thresholds in the new feature space, we can detect many false positives (orange squared points in the plot). Specifically, instead of assigning an optimal label to all 67 extractions (green and orange ones), we can correctly detect the 20 true optimal extractions (green ones), and we can discard 31 out of 47 false positives (orange ones). State-of-the-art thresholds would lead to the same true positive detection (20 out of 67), while the proposed approach leads to a drastically better accuracy (76% instead of 30%, i.e., (20 + 31)/67 instead of 20/67) and precision of high-quality extractions (56% instead of 30%, i.e., 20/36 instead of 20/67).

Figure 5: True optimal extractions and false high-quality extractions in the proposed feature space. (Axes: Slope 1 vs. Slope 2; legend: False high-quality extractions, True optimal extractions.)

To drill down the analysis, we further distinguished two types of false positives, stemming from different compensation effects: (i) high amount of coffee ground with coarse grinding and (ii) low amount of coffee ground with fine grinding. The former is less common, since very few baristas intentionally use higher amounts of coffee ground, being a cost for them. On the contrary, the latter is much more frequent, because it brings savings on coffee ground costs. For this reason, extractions affected by the latter are of greater interest.

In Figure 6 three areas are shown. The green one still contains the true optimal extractions, the blue one contains the extractions belonging to the first type of compensation and the orange one now contains only the extractions belonging to the second type of compensation. Again, exploiting thresholds in the new feature space, the target extractions can be correctly classified and the compensation effect can be detected. Results show that all 23 extractions from type-(ii) compensation can be correctly detected, besides 8 extractions out of 24 from type-(i) compensation, which means improving from the 30% accuracy of the data-driven state of the art to 100% accuracy when considering only true optimal and type-(ii) compensation extractions. To this aim, in our dataset, the new feature thresholds have been set as 5.19 < s_1 < 5.48 and 2.64 < s_2 < 3.73.

Figure 6: Optimal extractions in the proposed feature space and false high-quality extractions due to different compensation effects. (Axes: Slope 1 vs. Slope 2; legend: (High amount - Coarse grinding) extractions, (Low amount - Fine grinding) extractions, True optimal extractions.)

6 CONCLUSIONS

This work presented a data-driven analysis of a real-world time-series dataset of espresso brewing by professional coffee-making machines. The proposed feature space, despite being simple and easy to compute, brought a large improvement in the classification accuracy of high-quality espresso with respect to current state-of-the-art data-driven approaches: results yielded three-fold improvements in accuracy, from 30% to 100%, with specific focus on currently misclassified extractions due to common compensation effects. The proposed methodology can be applied in similar contexts to improve current data-driven analyses of espresso quality.

Future works aim to widen the scope of the analysis by including additional quality variables, different models of professional coffee-making machines, diverse coffee blends, and environmental variables. Furthermore, we plan to apply clustering techniques for determining the quality-index thresholds.

ACKNOWLEDGMENTS

This work is partially funded by the SmartData@PoliTO center.

REFERENCES

[1] S. Andueza, L. Maeztu, B. Dean, M. P. de Peña, J. Bello, and C. Cid. 2002. Influence of Water Pressure on the Final Quality of Arabica Espresso Coffee. Application of Multivariate Analysis. J. Agric. Food Chem. 50, 25 (2002), 7426–7431. https://doi.org/10.1021/jf0206623 PMID: 12452670.
[2] S. Andueza, L. Maeztu, L. Pascual, C. Ibáñez, M. Paz de Peña, and C. Cid. 2003. Influence of extraction temperature on the final quality of espresso coffee. J. Sci. Food Agric. 83, 3 (2003), 240–248. https://doi.org/10.1002/jsfa.1304
[3] S. Andueza, M. A. Vila, M. Paz de Peña, and C. Cid. 2007. Influence of coffee/water ratio on the final quality of espresso coffee. J. Sci. Food Agric. 87, 4 (2007), 586–592. https://doi.org/10.1002/jsfa.2720
[4] D. Apiletti, E. Baralis, G. Bruno, and T. Cerquitelli. 2009. Real-time analysis of physiological data to support medical applications. IEEE Trans. Inf. Technol. Biomed. 13, 3 (2009), 313–321. https://doi.org/10.1109/TITB.2008.2010702
[5] D. Apiletti and E. Pastor. 2020. Correlating Espresso Quality with Coffee-Machine Parameters by Means of Association Rule Mining. Electronics 9, 1 (2020), 100.
[6] G. Caprioli, M. Cortese, G. Cristalli, F. Maggi, L. Odello, M. Ricciutelli, G. Sagratini, V. Sirocchi, G. Tomassoni, and S. Vittori. 2012. Optimization of espresso machine parameters through the analysis of coffee odorants by HS-SPME–GC/MS. Food Chemistry 135, 3 (2012), 1127–1133.
[7] M. Christ, A. W. Kempa-Liehr, and M. Feindt. 2016. Distributed and parallel time series feature extraction for industrial big data applications. arXiv preprint arXiv:1610.07717 (2016).
[8] F. Chung, T. Fu, R. Luk, and V. Ng. 2001. Flexible time series pattern matching based on perceptually important points. (2001).
[9] E. Di Corso, T. Cerquitelli, and D. Apiletti. 2018. METATECH: METeorological data analysis for thermal energy characterization by means of self-learning transparent models. Energies 11, 6 (2018). https://doi.org/10.3390/en11061336
[10] P. Esling and C. Agon. 2012. Time-series data mining. ACM Computing Surveys (CSUR) 45, 1 (2012), 1–34.
[11] J. W. Gardner and P. N. Bartlett. 1999. Electronic noses: principles and applications. Oxford University Press, Oxford; New York.
[12] Istituto Nazionale Espresso Italiano. [n.d.]. Espresso Italiano Certificato. http://www.espressoitaliano.org/files/File/istituzionale_inei_hq_en.pdf/. [Online; accessed January-2020].
[13] E. Keogh, S. Chu, D. Hart, and M. Pazzani. 2004. Segmenting time series: A survey and novel approach. In Data mining in time series databases. World Scientific, 1–21.
[14] T. Michishita, M. Akiyama, Y. Hirano, M. Ikeda, Y. Sagara, and T. Araki. 2010. Gas chromatography/olfactometry and electronic nose analyses of retronasal aroma of espresso and correlation with sensory evaluation by an artificial neural network. J. Food Sci. 75, 9 (2010), S477–S489.
[15] M. Pardo, G. Niederjaufner, G. Benussi, E. Comini, G. Faglia, G. Sberveglieri, M. Holmberg, and I. Lundstrom. 2000. Data preprocessing enhances the classification of different brands of Espresso coffee with an electronic nose. Sensors and Actuators B: Chemical 69, 3 (2000), 397–403.
[16] M. Pardo and G. Sberveglieri. 2005. Classification of electronic nose data with support vector machines. Sensors and Actuators B: Chemical 107, 2 (2005), 730–737. https://doi.org/10.1016/j.snb.2004.12.005
[17] S. Romani, C. Cevoli, A. Fabbri, L. Alessandrini, and M. Dalla Rosa. 2012. Evaluation of coffee roasting degree by using electronic nose and artificial neural network for off-line quality control. J. Food Sci. 77, 9 (2012), C960–C965.
[18] Rossi writes. [n.d.]. Coffee in Italy or 101 Facts about Italian Coffee Culture. http://rossiwrites.com/italy/italy-for-foodies/coffee-in-italy-italian-coffee-culture. [Online; accessed January-2020].
[19] S. Schenker, C. Heinemann, M. Huber, R. Pompizzi, R. Perren, and R. Escher. 2002. Impact of Roasting Conditions on the Formation of Aroma Compounds in Coffee Beans. J. Food Sci. 67, 1 (2002), 60–66.
[20] C. Severini, I. Ricci, M. Marone, A. Derossi, and T. De Pilli. 2015. Changes in the Aromatic Profile of Espresso Coffee as a Function of the Grinding Grade and Extraction Time: A Study by the Electronic Nose System. J. Agric. Food Chem. 63, 8 (2015), 2321–2327. https://doi.org/10.1021/jf505691u PMID: 25665600.
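As a supplementary illustration, the trend-point procedure (Algorithm 1), the slope features (Equations 4 and 5) and the threshold rule (Equations 6 and 7) could be sketched in Python as follows. This is only a sketch under stated assumptions, not the authors' implementation: the slope variation is compared by magnitude (the algorithm listing does not detail `updateMax`), the trend point is taken as the first point of the second window, and the threshold check is inclusive. The names `trend_point`, `phase_slopes`, `optimal_thresholds` and `is_optimal` are hypothetical.

```python
from statistics import mean
from typing import List, Sequence, Tuple

Point = Tuple[float, float]  # (time in s, water quantity in ml)


def compute_slopes(window: Sequence[Point]) -> List[float]:
    """Slopes between consecutive points of a window (Equation 2)."""
    return [(q1 - q0) / (t1 - t0)
            for (t0, q0), (t1, q1) in zip(window, window[1:])]


def trend_point(series: Sequence[Point], w: int) -> Point:
    """Point between the two windows whose mean slopes differ the most."""
    if w < 2 or len(series) < 2 * w:
        raise ValueError("need w >= 2 and at least 2*w points")
    max_td, best = 0.0, series[0]
    for i in range(len(series) - 2 * w + 1):
        w1 = series[i:i + w]
        w2 = series[i + w:i + 2 * w]
        trend_diff = mean(compute_slopes(w2)) - mean(compute_slopes(w1))
        # Assumption: the variation is compared by absolute value.
        if abs(trend_diff) > max_td:
            # Assumption: the vertex is the first point of the second window.
            max_td, best = abs(trend_diff), series[i + w]
    return best


def phase_slopes(series: Sequence[Point], tp: Point) -> Tuple[float, float]:
    """Average flow of the two phases, s_1 and s_2 (Equations 4 and 5)."""
    (t0, q0), (tn, qn) = series[0], series[-1]
    ttp, qtp = tp
    return (qtp - q0) / (ttp - t0), (qn - qtp) / (tn - ttp)


def optimal_thresholds(optimal: Sequence[Tuple[float, float]]):
    """T_o_min and T_o_max over the (s_1, s_2) pairs of optimal extractions."""
    s1_values, s2_values = zip(*optimal)
    return ((min(s1_values), min(s2_values)),
            (max(s1_values), max(s2_values)))


def is_optimal(e: Tuple[float, float], t_min, t_max) -> bool:
    """Inclusive check that both slopes lie within the thresholds."""
    return all(lo <= v <= hi for v, lo, hi in zip(e, t_min, t_max))
```

On a synthetic series that rises at 5 ml/s for the first 8 s and at 2 ml/s afterwards, with W = 4, this sketch places the trend point at the breakpoint and recovers the two slopes, which is the behavior the two-segment approximation is meant to capture.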