
Explainable Virtual Metrology in Semiconductor Industry⋆

Amina Mević

University of Sarajevo - Faculty of Electrical Engineering

My PhD research focuses on developing a highly accurate and explainable multi-output virtual metrology system for semiconductor manufacturing. Using machine learning, we predict the physical properties of metal layers from process parameters captured by production equipment sensors. Key contributions include a model-agnostic explanatory method based on projective operators, providing insights into the most influential features for multi-output predictions.

Keywords: explainable artificial intelligence, multi-input multi-output prediction, semiconductor manufacturing

1. Introduction

The rising demand for digitalization and decarbonization has led to the increasing production of chips (9.2 billion chips by Infineon Technologies AG in 2023 [1]), highlighting the need for faster, more reliable, and efficient semiconductor manufacturing with minimal waste. It is not always possible to fully inspect products during manufacturing due to the destructive nature of some testing techniques [2]. The industry therefore relies on random sampling and inspection, which does not ensure comprehensive quality control of products or optimal yields. With the growing use of artificial intelligence (AI), there is potential to predict product properties using data from existing monitoring systems. Key industry research questions include: Which process control signals are necessary for accurate prediction of product properties? Which machine learning pipeline performs best, and what metrics should be used to evaluate its performance?

Virtual metrology (VM), introduced in 2005 in the semiconductor manufacturing industry [ 3 ], involves estimating a product’s quality directly from production process data, using supervised or unsupervised machine learning (ML) algorithms [ 4 ], without physically measuring it [ 2 ], thereby reducing production times and costs.

Following previous efforts in this direction [2], we focus on creating a VM system to predict the properties of a thin film produced in the physical vapor deposition (PVD) process. PVD is one of the main steps in the production process, used to create thin metal layers by depositing metal vapor onto a substrate [5]. The important physical properties of the film, such as thickness and resistance, depend on process parameters like deposition time, power, voltage, electrical current, temperature, and pressure. After production, these properties are measured at 17 points. We aim to predict both properties at all 17 points simultaneously, which requires methods suitable for predicting multiple variables.

We observe that the mean values of measured thickness and resistance exhibit an oscillatory pattern over time, influenced by factors such as equipment maintenance. To account for these variations, we aim to develop a predictive model using historical data to forecast future mean and standard deviation values of both resistance and thickness simultaneously. This approach will enable a more accurate assessment of process stability and facilitate proactive adjustments in semiconductor manufacturing.

Existing explainability techniques are largely designed for single-output scenarios, making their direct application to multi-output predictions challenging. To address this, I introduce a model-agnostic explanatory approach based on projective operators, which effectively captures the dependencies between input features and multiple correlated outputs. This method ensures interpretability by identifying the most influential process parameters and their contribution to the predicted film properties, thereby improving trust and transparency in VM systems for semiconductor manufacturing.

2. Related Work

Virtual Metrology (VM) plays a crucial role in semiconductor manufacturing by estimating product quality from process data without direct physical measurements [ 6 ]. The increasing complexity of semiconductor processes has led to the adoption of machine learning (ML) techniques, both supervised and unsupervised, to improve VM accuracy [ 4, 7 ]. With the expansion of deep learning, research has focused on enhancing VM systems using neural networks [ 8, 9 ]. However, traditional VM methods often rely on single-output predictions, despite the fact that many semiconductor processes involve multiple interrelated quality variables that should be modeled simultaneously.

Multi-output learning has gained attention in recent years as a more effective approach for predicting multiple dependent target variables. Multi-input multi-output (MIMO) prediction is a machine learning paradigm that focuses on predicting multiple outputs simultaneously for a given instance [10]. This paradigm encompasses various techniques such as multi-label learning, multi-dimensional learning, and multi-target regression, all of which aim to leverage interdependencies among outputs to improve prediction performance [11]. While multi-output approaches are well explored in fields such as healthcare [12], environmental science [13], and air quality forecasting [14], they remain underutilized in semiconductor VM modeling [15].

A few studies have incorporated multi-output learning into VM. Choi et al. [15] proposed a convolutional neural network (CNN)-based multivariate VM model using multi-sensor data to model an etching process, demonstrating the advantages of joint prediction. Similarly, Yamaguchi et al. [16] introduced a multi-target regression method combining Random Linear Target Combinations with Principal Component Analysis (PCA). These approaches highlight the potential of multi-output learning in semiconductor applications but remain limited in scope and application.

In VM systems, sensors record large volumes of redundant and non-informative signals, necessitating the use of feature selection and dimensionality reduction techniques [17, 18, 19]. Feature selection is crucial for improving model efficiency and interpretability, especially in high-dimensional settings. According to a recent literature review [2], PCA remains the most commonly utilized convex dimensionality reduction method in semiconductor VM. More recent methods, such as the ProjSe algorithm [20], utilize projection operators to perform variable selection in multi-output learning tasks, offering scalability and the ability to capture nonlinear dependencies.

Despite advances in multi-output learning and feature selection, there remains a critical gap in the explainability of multi-output models. Interpretability methods such as SHAP [21] and LIME [22] have been widely applied in single-output scenarios but are not inherently designed for multi-output predictions. To the best of our knowledge, no dedicated explainability methods for multi-output models currently exist in the literature, underscoring the need for further research in this area.

3. Methodology

3.1. Data Collection and Preprocessing

The first step in my PhD research involved preprocessing data collected from a semiconductor manufacturing facility to prepare a dataset for a multi-output prediction task. The dataset was collected from 16 chambers of six Physical Vapor Deposition (PVD) machines at the Infineon Technologies AG fab between 2021 and 2023. It includes process parameter data captured by sensors during the production process, along with post-process measurement data.

After the PVD procedure, wafers with thin films undergo measurements at 17 points, where three key physical properties are recorded: resistivity, resistance, and thickness. The raw data contained duplicates, missing values, outliers, and uninformative features, requiring a preprocessing phase to ensure data quality and model reliability. The following preprocessing steps were applied:

• Removal of constant and uninformative features to reduce redundancy.
• Removal of duplicated features to avoid introducing unnecessary correlations.
• Outlier removal based on domain knowledge, ensuring that only physically meaningful values are retained.
• Replacement of missing values with the median to preserve the data distribution.
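The four preprocessing steps listed above can be sketched with pandas as follows. This is a minimal illustration, not the pipeline used in the study: the column names, the `bounds` dictionary of physical limits, and the `preprocess` helper are hypothetical, and the real domain-knowledge rules for outliers are not given in the text.

```python
import numpy as np
import pandas as pd


def preprocess(df: pd.DataFrame, bounds: dict[str, tuple[float, float]]) -> pd.DataFrame:
    """Apply the four cleaning steps to a numeric sensor table (hypothetical helper)."""
    # 1. Drop constant columns (at most one distinct non-missing value).
    df = df.loc[:, df.nunique(dropna=True) > 1]
    # 2. Drop columns that duplicate an earlier column, keeping the first one.
    df = df.loc[:, ~df.T.duplicated()]
    # 3. Drop rows outside expert-provided physical limits (assumed to be given as bounds).
    keep = pd.Series(True, index=df.index)
    for col, (low, high) in bounds.items():
        if col in df.columns:
            keep &= df[col].between(low, high) | df[col].isna()
    df = df.loc[keep]
    # 4. Replace remaining missing values with the per-column median.
    return df.fillna(df.median())


# Toy data with one constant column, one duplicated column, one outlier row, one gap.
raw = pd.DataFrame({
    "power": [1.0, 2.0, 3.0, 1000.0],
    "power_copy": [1.0, 2.0, 3.0, 1000.0],
    "const": [5.0, 5.0, 5.0, 5.0],
    "pressure": [0.1, np.nan, 0.3, 0.4],
})
clean = preprocess(raw, bounds={"power": (0.0, 100.0)})
```

In this sketch the median is computed after outlier rows are dropped, so extreme values do not influence the imputed value; the source does not state which order was used for that detail beyond the listed sequence.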

After preprocessing, the final dataset consists of 3,598 products and 122 columns, where 104 columns represent process parameters (features), and 51 columns correspond to wafer physical properties measured post-PVD.

3.2. Feature Selection

We use the feature selection and importance algorithm ProjSe [20] to select and rank the process parameters that are most important for predicting the resistivity, resistance, and thickness of the wafer. ProjSe is a state-of-the-art approach to variable selection for multi-output learning problems, based on projection operators and their algebra. The method uses a kernel-based representation to capture complex relationships between variables. The algorithm iteratively chooses the input variable that has the highest correlation with the outputs while being as uncorrelated as possible with the inputs already selected. This ensures that each new variable adds relevant information to the prediction model without redundancy.
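The greedy principle described above (pick the variable most aligned with the outputs, then discount what the selected variables already explain) can be illustrated with a simplified linear sketch. This is not the ProjSe implementation from [20]: it uses no kernel, and the function name `greedy_projection_select`, the tolerance, and the deflation step are assumptions made for illustration only.

```python
import numpy as np


def greedy_projection_select(X: np.ndarray, Y: np.ndarray, k: int, tol: float = 1e-10) -> list[int]:
    """Greedily pick up to k columns of X aligned with Y but not with each other."""
    Xc = X - X.mean(axis=0)
    Yc = Y - Y.mean(axis=0)
    norms = np.linalg.norm(Xc, axis=0)
    norms[norms == 0] = 1.0
    Xn = Xc / norms  # centered, unit-norm input columns
    selected: list[int] = []
    Y_res = Yc.copy()  # part of the outputs not yet explained by selected inputs
    for _ in range(k):
        U, s, _ = np.linalg.svd(Y_res, full_matrices=False)
        if s.size == 0 or s[0] <= tol:
            break  # outputs are already fully explained
        U = U[:, s > tol * s[0]]  # orthonormal basis of the residual output space
        # Length of each input's projection onto the residual output space.
        scores = np.linalg.norm(U.T @ Xn, axis=0)
        if selected:
            scores[selected] = -np.inf
        selected.append(int(np.argmax(scores)))
        # Deflate: remove from the outputs everything the selected inputs span.
        Q, _ = np.linalg.qr(Xn[:, selected])
        Y_res = Yc - Q @ (Q.T @ Yc)
    return selected


rng = np.random.default_rng(0)
X = rng.normal(size=(200, 5))
X[:, 2] = X[:, 0]  # column 2 is an exact duplicate of column 0
Y = np.column_stack([2.0 * X[:, 0], X[:, 0] + X[:, 1]])
chosen = greedy_projection_select(X, Y, k=2)
```

Because the residual output space is orthogonal to the inputs already chosen, a duplicate of a selected column scores close to zero in later rounds, which is the redundancy-avoiding behaviour the paragraph describes.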

3.3. Prediction Models

The next step after selecting features is to choose the best prediction model. Following previous work [3] and a thorough data analysis, we use four regression models for multiple-output prediction: Linear Regressor (LR) [23], K Neighbors Regressor (KNN) [24], Random Forest Regressor (RFR), and Decision Tree Regressor (DTR) [25]. To maintain a fair comparison, we assessed the models in their default configurations.
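As an illustration of fitting these four model families on a multi-output target, the sketch below uses scikit-learn estimators with default settings. The choice of scikit-learn is an assumption (the text does not name the library), the data are synthetic, and `random_state` is fixed only to make the example reproducible.

```python
import numpy as np
from sklearn.ensemble import RandomForestRegressor
from sklearn.linear_model import LinearRegression
from sklearn.neighbors import KNeighborsRegressor
from sklearn.tree import DecisionTreeRegressor

# Synthetic stand-in for process parameters (X) and measured properties (Y).
rng = np.random.default_rng(0)
X = rng.normal(size=(80, 4))
Y = X @ rng.normal(size=(4, 3)) + 0.1 * rng.normal(size=(80, 3))

# All four estimators accept a 2-D target and predict every output jointly.
models = {
    "LR": LinearRegression(),
    "KNN": KNeighborsRegressor(),
    "RFR": RandomForestRegressor(random_state=0),
    "DTR": DecisionTreeRegressor(random_state=0),
}

predictions = {}
for name, model in models.items():
    model.fit(X[:60], Y[:60])
    predictions[name] = model.predict(X[60:])
```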

3.4. Explanatory Method

The proposed framework for interpreting MIMO predictions, ProjEx, is based on projective operator algebra and provides local and global explanations of the behavior of ML models. Local explanations focus on understanding individual predictions by analyzing the specific features that influence a single prediction output, providing insights into why a model made a particular decision for a given input [26]. Global explanations aim to provide an overarching understanding of the model’s behavior across all predictions, analyzing the model’s overall structure and feature importance to identify patterns that apply to the entire dataset [27].

The ProjEx explanation for a MIMO prediction output $\hat{y}_i$ for an input vector $x_i$ is defined as a tuple $\{x_i, y_p, E_{local}, E_{global}\}$, where $E_{local}$ gives the impact of the current input on the output, and $E_{global}$ reveals the overall patterns and rules the model follows to make predictions across the entire dataset.

ProjSe [20] is applied to select the features most correlated with the predicted target variables. For each selected feature $x_s \in X_s$, we find the projection of the vector $x_s$ onto the plane spanned by the predicted variables $y_p \in Y_p$. The projection $x_s^{proj}$ of $x_s$ onto the plane spanned by $y_p$ is given by $x_s^{proj} = Y_p (Y_p^\top Y_p)^{-1} Y_p^\top x_s$. We calculate the correlation coefficient between $x_s$ and $x_s^{proj}$ using $\rho = \frac{x_s \cdot x_s^{proj}}{\|x_s\| \cdot \|x_s^{proj}\|}$. These correlation coefficients provide a global explanation of the model by indicating the influence of features on the predicted targets through the projection coefficients. To assess the impact of each feature value of the observed input vector $x_i$ on the predicted variables, we calculate $x_E$ vectors by scaling the selected feature with the correlation coefficient: $x_E = x_s \cdot \rho$.
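The projection and scaling steps above can be written in a few lines of NumPy. The following is a minimal sketch, assuming features and predictions are stored as column vectors over the same samples; the function names are hypothetical, and a least-squares solve replaces the explicit inverse for numerical stability (it yields the same projection when $Y_p$ has full column rank).

```python
import numpy as np


def projection_correlation(x_s: np.ndarray, Y_p: np.ndarray) -> tuple[float, np.ndarray]:
    """Project x_s onto the column space of Y_p and return (rho, projection)."""
    # Least-squares coefficients give the projection Y_p (Y_p^T Y_p)^{-1} Y_p^T x_s.
    coef, *_ = np.linalg.lstsq(Y_p, x_s, rcond=None)
    x_proj = Y_p @ coef
    denom = np.linalg.norm(x_s) * np.linalg.norm(x_proj)
    # Cosine between the feature and its projection; 0 if the projection vanishes.
    rho = float(x_s @ x_proj / denom) if denom > 0 else 0.0
    return rho, x_proj


def local_explanation(x_i_selected, rhos) -> np.ndarray:
    """Scale each selected feature value of one input by its correlation coefficient."""
    return np.asarray(x_i_selected, dtype=float) * np.asarray(rhos, dtype=float)


# Tiny worked example: predictions span the first two coordinate axes.
Y_p = np.array([[1.0, 0.0], [0.0, 1.0], [0.0, 0.0]])
rho_mix, proj_mix = projection_correlation(np.array([3.0, 0.0, 4.0]), Y_p)
x_E = local_explanation([2.0, 10.0], [rho_mix, 1.0])
```

Since the inner product of a vector with its orthogonal projection equals the squared norm of the projection, the coefficient computed this way lies between 0 and 1.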

4. Results

Minimal subset of relevant features. Using ProjSe with a linear kernel to select features relevant to predicting 18 output variables, we identified the 18 most important features for prediction. Three of these 18 features coincided with the ones chosen by experts, and four were statistical metrics of the selected features. We compared the prediction results of the least-squares regression model [28] using features selected by ProjSe with those selected randomly. Prediction accuracy is measured using the Pearson correlation between actual outputs $y$ and predicted outputs $\hat{y}$, $r_{y,\hat{y}} = \frac{\mathrm{Cov}(y,\hat{y})}{\sigma_y \sigma_{\hat{y}}} = \frac{\sum_{i=1}^{n}(y_i-\bar{y})(\hat{y}_i-\bar{\hat{y}})}{\sqrt{\sum_{i=1}^{n}(y_i-\bar{y})^2}\,\sqrt{\sum_{i=1}^{n}(\hat{y}_i-\bar{\hat{y}})^2}}$, and is iteratively recalculated as features are added.

Fig. 1 demonstrates how the number of selected features enhances prediction accuracy compared to random selection. The projective operator-based selection yields higher Pearson correlation coefficients, indicating a correlation between selected features and outputs. After 14 features, the cumulative correlation does not increase. The 14 top-ranked features have a high correlation with the output variables and minimal correlation with each other.
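A curve of this kind can be reproduced in outline as follows. The sketch fits an ordinary least-squares model on the first k features of a given ordering and records the Pearson correlation between actual and fitted outputs, averaged over outputs. It is evaluated in-sample on synthetic data; the evaluation split used for Fig. 1 is not stated in the text, and the function names are hypothetical.

```python
import numpy as np


def pearson(y: np.ndarray, y_hat: np.ndarray) -> float:
    """Pearson correlation between one actual and one predicted output."""
    yc, pc = y - y.mean(), y_hat - y_hat.mean()
    denom = np.sqrt((yc ** 2).sum() * (pc ** 2).sum())
    return float((yc * pc).sum() / denom) if denom > 0 else 0.0


def cumulative_correlation(X: np.ndarray, Y: np.ndarray, order: list[int]) -> list[float]:
    """Mean Pearson correlation after adding features one at a time in `order`."""
    n = X.shape[0]
    curve = []
    for k in range(1, len(order) + 1):
        A = np.column_stack([np.ones(n), X[:, order[:k]]])  # intercept + first k features
        coef, *_ = np.linalg.lstsq(A, Y, rcond=None)
        Y_hat = A @ coef
        curve.append(float(np.mean([pearson(Y[:, j], Y_hat[:, j]) for j in range(Y.shape[1])])))
    return curve


rng = np.random.default_rng(1)
X = rng.normal(size=(200, 6))
Y = X[:, :2] @ rng.normal(size=(2, 3)) + 0.05 * rng.normal(size=(200, 3))
informed = cumulative_correlation(X, Y, order=[0, 1, 2, 3])      # relevant features first
uninformed = cumulative_correlation(X, Y, order=[5, 4, 3, 2])    # irrelevant features first
```

On the synthetic data the informed ordering saturates after the two relevant features, which mirrors the plateau reported after 14 features.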

Evaluation of prediction models. In the second experiment, we evaluate the performance of the prediction models with respect to the selected features. The prediction algorithms LR, KNN, RFR, and DTR are evaluated on three datasets: DS1, consisting of all process parameters (104 feature values); DS2, consisting of features selected by experts (10 feature values); and DS3, consisting of features selected by ProjSe (18 feature values). We employed 10-fold cross-validation to evaluate model performance comprehensively across various data subsets and used a suite of performance metrics, including MSE, MAE, MAPE, RMSE, and R-squared score.
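For reference, the metrics named above and a 10-fold split can be sketched in plain NumPy. The definitions below are the standard ones; how the study aggregated metrics over the multiple outputs is not stated, so the uniform averaging used here (and the helper names) should be read as assumptions.

```python
import numpy as np


def regression_metrics(Y_true, Y_pred) -> dict[str, float]:
    """MSE, MAE, MAPE, RMSE and R^2 for arrays of shape (n_samples, n_outputs)."""
    Y_true = np.asarray(Y_true, dtype=float)
    Y_pred = np.asarray(Y_pred, dtype=float)
    err = Y_true - Y_pred
    mse = float(np.mean(err ** 2))
    ss_res = np.sum(err ** 2, axis=0)
    ss_tot = np.sum((Y_true - Y_true.mean(axis=0)) ** 2, axis=0)
    return {
        "MSE": mse,
        "MAE": float(np.mean(np.abs(err))),
        "MAPE": float(np.mean(np.abs(err) / np.abs(Y_true))),  # as a fraction, not percent
        "RMSE": float(np.sqrt(mse)),
        "R2": float(np.mean(1.0 - ss_res / ss_tot)),  # averaged uniformly over outputs
    }


def kfold_indices(n: int, k: int = 10, seed: int = 0):
    """Return k (train_idx, test_idx) pairs that partition a shuffled range(n)."""
    folds = np.array_split(np.random.default_rng(seed).permutation(n), k)
    return [(np.concatenate(folds[:i] + folds[i + 1:]), folds[i]) for i in range(k)]


example = regression_metrics([[1], [2], [3], [4]], [[1], [2], [3], [6]])
splits = kfold_indices(25, k=10)
```

Each model would be fitted on `train_idx` and scored on `test_idx` for every pair in `splits`, with the per-fold metrics averaged at the end.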

Results are given in Table 1. For all datasets, DTR performs best, while RFR demonstrates comparable results. Our results confirm the findings of earlier work [29]. There is no significant difference among the datasets, implying that the subset of features selected by ProjSe contains enough informative features and is almost as informative as the subset selected by experts.

Evaluation of explanatory method. The proposed explanatory method has been evaluated on a real-world dataset from the semiconductor industry and two additional publicly available datasets. Results demonstrate high effectiveness in terms of explanation stability, complexity, and effective complexity. The proposed method outperforms KernelSHAP, LIME, SHAP, and TreeInterpreter in computation time, while the introduced stability index and correlation are comparable. To evaluate the effectiveness and user satisfaction of ProjEx, we conducted a user study. The study was designed following recommendations for metrics on explanation satisfaction [30].

Stability and computation time w.r.t. prediction model. ProjEx was applied to explain predictions from CNN and tree-based models across three datasets. Table 2 summarizes the stability indices, correlation, and computation time, calculated within 5 folds, across the models for each dataset.

The stability index of ProjSe remains generally high with only slight variation across models, while the correlation exhibits significant differences. ProjSe achieves the highest stability indices with XGBoost for scm1d and pvd, and demonstrates the best performance with CNN for osales. In terms of correlation, ProjSe excels with CNN on scm1d and osales, whereas XGBoost outperforms other models on pvd. All models exhibit comparable and minimal computation times, highlighting the computational efficiency of ProjSe.

Comparison of stability and efficiency of ProjEx against xAI methods. An RFR was used as the predictive model for comparing ProjEx with KernelSHAP, LIME, SHAP, and TreeInterpreter. Since these methods are not natively designed for explaining multi-output models, they were adapted to select the most influential features for each of the 12 output variables in the osales dataset. Stability was evaluated by measuring consistency across the 5 folds for each output variable independently. While stability indices were generally high for all methods, assessing stability across all output variables in each fold revealed lower scores due to the methods’ limited ability to account for inter-variable interactions. As shown in Table 3, ProjEx achieved competitive stability indices (0.81) but lower correlations (0.26), reflecting its distinct approach to feature importance computation. It significantly outperformed all methods in computation time, requiring only 1.87 seconds compared to LIME’s 1,267.76 seconds.

Evaluation of explainability methods using Quantus. The CNN model was explained on all three datasets with ProjEx, KernelSHAP, LIME, and SHAP, and the explanations were evaluated with two Quantus metrics: complexity and effective complexity. Quantus requires three inputs for evaluation: the test dataset array, the target values predicted by the model being explained, and an array of ranked features or feature importance scores provided by the explainability method for each sample. We adapted KernelSHAP, LIME, and SHAP for multi-output regression settings using the following strategies:

• Ten random samples from the test dataset were used as the evaluation set for KernelSHAP. Separate explainers were instantiated for each target variable using a wrapper function to generate predictions specific to individual targets.
• For LimeTabularExplainer, a subset of 100 random test samples was selected to compute normalized feature attributions separately for each target variable.
• For SHAP’s DeepExplainer, a subset of 100 random test samples was used as the evaluation dataset, while the entire test dataset served as background data to compute feature attributions for each target variable.

We calculated complexity and effective complexity for each output variable individually and then averaged the values across all target variables. For ProjEx, evaluation metrics are computed directly, as it provides explanations for all target variables simultaneously.
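The per-target adaptation and the averaging step can be sketched without calling any explainer library. The wrapper below turns a multi-output prediction function into a single-output one, which is what single-output explainers expect. The two metric functions assume that complexity is the entropy of the normalised absolute attributions and that effective complexity counts attributions whose magnitude exceeds a threshold; these follow the general descriptions of the Quantus metrics, but the exact normalisation and threshold used by Quantus or in the study are not reproduced here, and all names are hypothetical.

```python
import numpy as np


def single_target_predict(predict_fn, target: int):
    """Wrap a multi-output predict function so it returns one target column."""
    def wrapped(X):
        return np.asarray(predict_fn(X))[:, target]
    return wrapped


def complexity(attributions, eps: float = 1e-12) -> float:
    """Entropy of the normalised absolute attributions (assumed definition)."""
    a = np.abs(np.asarray(attributions, dtype=float))
    p = a / (a.sum() + eps)
    return float(-np.sum(p * np.log(p + eps)))


def effective_complexity(attributions, threshold: float) -> int:
    """Number of attributions whose magnitude exceeds the threshold (assumed definition)."""
    return int(np.sum(np.abs(np.asarray(attributions, dtype=float)) > threshold))


def average_over_targets(per_target_attributions, metric) -> float:
    """Average a per-target metric over all output variables."""
    return float(np.mean([metric(a) for a in per_target_attributions]))


# Toy multi-output model: a fixed linear map with two outputs.
W = np.array([[1.0, 0.0], [0.0, 2.0], [0.5, 0.5]])
predict_second = single_target_predict(lambda X: X @ W, target=1)
X_demo = np.array([[1.0, 1.0, 1.0], [2.0, 0.0, 2.0]])
second_output = predict_second(X_demo)
mean_complexity = average_over_targets([[1.0, 0.0, 0.0], [1.0, 1.0, 1.0]], complexity)
```

A method that explains all targets at once needs no such wrapper, which is the difference the paragraph points out for ProjEx.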

The results are summarized in Table 4. ProjEx exhibits significantly lower complexity and effective complexity across all datasets than KernelSHAP, LIME, and SHAP, showcasing its superior computational efficiency and interpretability. This efficiency is particularly relevant for large-scale multi-output prediction tasks, where traditional explainability methods often struggle with scalability and computational demands.

User study. To evaluate the effectiveness and user satisfaction of ProjEx, we conducted a user study. The study was designed based on recommendations for metrics on Explanation Satisfaction [30].

There were 80 participants in total, 57 of whom reported having prior experience with AI. Participants were offered explanations using the proposed ProjEx explanation method. The ages of the participants ranged from 18 to 36 years, with 47.5% identifying as female and 51.3% as male. All participants were volunteers and had a minimum educational level of high school completion. To evaluate the explanations provided, participants were asked to rate the explanation using a 5-point Likert Explanation Satisfaction Scale [30]. The overall mean satisfaction score was 3.25, with a mode of 3.0. For participants with prior AI experience, the mean satisfaction score was 3.27, also with a mode of 3.0.

5. Future Work

For future work, we plan to develop a scalable and effective feature selection method that performs well across various output dimensions. While ProjSe [20] works well for around 20 outputs, it struggles with fewer, and I aim to address this limitation. Measurement data from production exhibit oscillatory characteristics and time dependencies influenced by machine maintenance. To model these time series patterns, we plan to propose a method based on Fourier transformation. Additionally, we will explore projective operators and their algebra for explaining multivariate time series models. We aim to develop responsible AI guidelines for the semiconductor industry, aligned with the EU AI Act, ensuring that method design meets the requirements of high-stakes industrial applications, particularly in terms of data privacy, model transparency, and interpretability for domain experts.

[Table 4. Metrics: Complexity ↓, Effective Complexity ↓. Datasets: osales, pvd, scm1d. Table values not recoverable.]

6. Acknowledgments

The work is supported by the IPCEI on ME/CT program of Infineon Technologies Austria AG.

Declaration on Generative AI

During the preparation of this work, the author used X-GPT-4 and Gramby in order to: Grammar and spelling check. After using these tools/services, the author reviewed and edited the content as needed and takes full responsibility for the publication’s content.

References

[1] Infineon Technologies AG, Infineon austria 2023 financial year, https://www.infineon.com/cms/austria/en/press/GJ2324/Bilanz-Geschaeftsjahr_23.html, 2023. Accessed: 2024-11-29.

[2] Maitra, Su, Shi, Virtual metrology in semiconductor manufacturing: Current status and future prospects, Expert Systems with Applications (2024).

[3] Chen, Wu, Lin, Ko, Lo, Wang, Yu, Liang, Virtual metrology: A solution for wafer to wafer advanced process control, in: ISSM 2005, IEEE International Symposium on Semiconductor Manufacturing, 2005.

[4] Yan, Luo, Wang, Ding, Li, Ai, Sheng, Xia, Li, Chen, et al., Virtual metrology modeling for cvd film thickness with lasso-gaussian process regression, in: 2023 China Semiconductor Technology International Conference (CSTIC), 2023.

[5] R. A. Powell, S. M. Rossnagel, PVD for microelectronics: sputter deposition applied to semiconductor manufacturing, 1999.

[6] P.-A. Dreyfus, Psarommatis, G. May, Kiritsis, Virtual metrology as an approach for product quality estimation in industry 4.0: a systematic review and integrative conceptual framework, International Journal of Production Research (2022).

[7] Zhou, Diao, Jiang, Wen, Shi, Jing, Li, Virtual metrology of wat value with machine learning based method, in: 2022 China Semiconductor Technology International Conference (CSTIC), 2022.

[8] Dalla Zuanna, Gentner, G. A. Susto, Deep learning-based sequence modeling for advanced process control in semiconductor manufacturing, IFAC-PapersOnLine 56 (2023) 8744–8751.

[9] Han, J. Min, J. Ma, G. Hwang, Heo, Y. E. Kim, Kang, Kim, Park, Sung, Deep learning-based virtual metrology in multivariate time series, in: 2023 IEEE International Conference on Prognostics and Health Management (ICPHM), IEEE, 2023, pp. 30–37.

[10] Xu, Shi, I. W. Tsang, Y.-S. Ong, Gong, Shen, Survey on multi-output learning, IEEE Transactions on Neural Networks and Learning Systems (2019).

[11] Borchani, G. Varando, Bielza, Larranaga, A survey on multi-output regression, Wiley Interdisciplinary Reviews: Data Mining and Knowledge Discovery (2015).

[12] Cui, Xie, Shen, Lu, Wang, Prediction of the healthcare resource utilization using multi-output regression models, IISE Transactions on Healthcare Systems Engineering (2018).

[13] Džeroski, Demšar, Grbović, Predicting chemical parameters of river water quality from bioindicator data, Applied Intelligence (2000).

[14] Liang, Xia, Ke, Wang, Wen, Zhang, Zheng, Zimmermann, Airformer: Predicting nationwide air quality in china with transformers, in: Proceedings of the AAAI Conference on Artificial Intelligence, 2023.

[15] Choi, Zhu, Kang, M. K. Jeong, Convolutional neural network based multi-input multi-output model for multi-sensor multivariate virtual metrology in semiconductor manufacturing, Annals of Operations Research (2024).

[16] Yamaguchi, Yamashita, Multi-target regression via target combinations using principal component analysis, Computers & Chemical Engineering (2024).

[17] O. Djedidi, R. Clain, V. Borodin, A. Roussy, Feature selection for virtual metrology modeling: An application to chemical mechanical polishing, in: 2022 33rd Annual SEMI Advanced Semiconductor Manufacturing Conference (ASMC), 2022.

[18] W. Jiang, C. Lv, B. Yang, F. Zhang, Y. Gao, T. Zhang, H. Wang, Statistical feature extraction and hybrid feature selection for material removal rate prediction in chemical mechanical planarization process, in: 2021 5th IEEE Electron Devices Technology & Manufacturing Conference (EDTM), 2021.

[19] T. E. Korabi, V. Borodin, M. Juge, A. Roussy, A hybrid feature selection approach for virtual metrology: Application to cmp process, in: 2021 32nd Annual SEMI Advanced Semiconductor Manufacturing Conference (ASMC), 2021, pp. 1–5. doi:10.1109/ASMC51741.2021.9435673.

[20] S. Szedmak, R. Huusari, T. H. Duong Le, J. Rousu, Scalable variable selection for two-view learning tasks with projection operators, Machine Learning (2023).

[21] S. Kariyappa, L. Tsepenekas, F. Lécué, D. Magazzeni, Shap@k: Efficient and probably approximately correct (pac) identification of top-k features, in: Proceedings of the AAAI Conference on Artificial Intelligence, 2024.

[22] M. T. Ribeiro, S. Singh, C. Guestrin, "Why should i trust you?" Explaining the predictions of any classifier, in: Proceedings of the 22nd ACM SIGKDD International Conference on Knowledge Discovery and Data Mining, 2016.

[23] X. Su, X. Yan, C.-L. Tsai, Linear regression, Wiley Interdisciplinary Reviews: Computational Statistics 4 (2012) 275–294.

[24] Y.-P. Mack, Local properties of k-nn regression estimates, SIAM Journal on Algebraic Discrete Methods 2 (1981) 311–323.

[25] W.-Y. Loh, Classification and regression trees, Wiley Interdisciplinary Reviews: Data Mining and Knowledge Discovery 1 (2011) 14–23.

[26] A. R. Mor, Y. Belinkov, B. Kimelfeld, Accelerating the global aggregation of local explanations, 2024.

[27] V. Arya, D. Saha, S. Hans, A. Rajasekharan, T. Tang, Global explanations for multivariate time series models, 2023. doi:10.1145/3570991.3570998.

[28] R. W. Farebrother, Linear least squares computations, 2018.

[29] C.-H. Chen, W.-D. Zhao, T. Pang, Y.-Z. Lin, Virtual metrology of semiconductor pvd process based on combination of tree-based ensemble model, ISA Transactions (2020).

[30] R. R. Hoffman, S. T. Mueller, G. Klein, J. Litman, Metrics for explainable ai: Challenges and prospects, arXiv preprint arXiv:1812.04608 (2018).