
Development of new computer models based on artificial intelligence techniques to predict the separation of liquid mixtures via vacuum membrane distillation


The methodology for predicting T from the input variables r and z involves several key steps: dataset preparation, outlier detection, model selection, hyperparameter optimization, model validation, and performance evaluation14. Because the dataset is complete and contains no missing values, no preprocessing for missing data was necessary. To ensure data quality, however, outliers were identified and removed using the Elliptic Envelope method (less than 0.5% of the data was flagged as outliers), and the data was normalized using the Min-Max scaler. Four ML models were chosen: BRR, ERT, ENR and SVM. These models were selected to provide a mix of linear and nonlinear approaches, allowing a comprehensive evaluation of their predictive capabilities. The hyperparameters of each model were optimized with DE.

Model validation was performed using MCCV, which involves randomly splitting the dataset into training and testing subsets multiple times, training the model on the training subset, and evaluating it on the testing subset. This process was repeated to ensure robust validation of model performance, reporting the mean and standard deviation of the performance metrics. This method ensures that a single train-test split does not bias the validation results and provides a more complete assessment of the model's generalizability18. Here, the number of iterations was set to 100, with the training and testing subsets constituting 80% and 20% of the entire dataset, respectively. The performance of each model was assessed with the R2 score, MSE, MAE and MAPE. Each metric was calculated over the multiple MCCV iterations to obtain its mean and standard deviation, providing a comprehensive view of model performance. This systematic approach allowed a thorough evaluation of the selected models, identifying the most accurate and reliable model for predicting temperature from the input variables r and z. More detailed descriptions of the building blocks mentioned here are given in the following paragraphs. The step-by-step implementation of the proposed approach is shown in Fig. 4. The machine learning algorithms and optimizers were implemented in the Python programming language.

Fig. 4

General methodology designed for this work.
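As a rough illustration of the workflow in Fig. 4, the following Python sketch reproduces the MCCV evaluation loop described above under assumed names: a DataFrame `df` with input columns `r`, `z` and target column `T` (hypothetical), a contamination level matching the reported < 0.5% outlier rate, and scikit-learn implementations of the scaler, outlier detector and metrics. It is a minimal sketch, not the exact configuration used in the study.

```python
# Minimal sketch of the MCCV evaluation loop (illustrative only).
# Assumes a DataFrame `df` with input columns "r", "z" and target column "T".
import numpy as np
from sklearn.covariance import EllipticEnvelope
from sklearn.preprocessing import MinMaxScaler
from sklearn.model_selection import train_test_split
from sklearn.metrics import r2_score, mean_squared_error, mean_absolute_error

def mccv_evaluate(df, model, n_iterations=100, test_size=0.2):
    X, y = df[["r", "z"]].to_numpy(), df["T"].to_numpy()

    # Outlier removal with the Elliptic Envelope (contamination level is an assumption).
    mask = EllipticEnvelope(contamination=0.005).fit_predict(X) == 1
    X, y = X[mask], y[mask]

    scores = []
    for i in range(n_iterations):
        X_train, X_test, y_train, y_test = train_test_split(
            X, y, test_size=test_size, random_state=i)

        # Min-Max scaling fitted on the training split only.
        scaler = MinMaxScaler().fit(X_train)
        model.fit(scaler.transform(X_train), y_train)
        y_pred = model.predict(scaler.transform(X_test))

        scores.append((r2_score(y_test, y_pred),
                       mean_squared_error(y_test, y_pred),
                       mean_absolute_error(y_test, y_pred)))

    # Report mean and standard deviation of (R2, MSE, MAE) over the random splits.
    return np.mean(scores, axis=0), np.std(scores, axis=0)
```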

Elliptical envelope method for outlier detection

The Elliptic Envelope method is a statistical technique for detecting outliers that assumes the data follow a multivariate Gaussian distribution. The method constructs an elliptical boundary that encloses the majority of the data points, capturing the central tendency and dispersion of the dataset; points that fall outside this boundary are identified as outliers19. The process begins by estimating the mean and covariance of the dataset. The Elliptic Envelope then uses these estimates to fit an ellipse around the central data points, defined by a chosen confidence level. Mathematically, it minimizes the volume of the ellipse while covering a specified portion of the data, typically 95% or 99%. Data points outside this ellipse are marked as outliers.
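A minimal sketch of this step with scikit-learn's EllipticEnvelope is shown below; the synthetic data and contamination level are assumptions for illustration (the study reports that less than 0.5% of the data was flagged).

```python
# Illustrative use of scikit-learn's EllipticEnvelope for outlier detection.
import numpy as np
from sklearn.covariance import EllipticEnvelope

rng = np.random.default_rng(0)
X = rng.multivariate_normal(mean=[0, 0], cov=[[1, 0.3], [0.3, 1]], size=1000)

detector = EllipticEnvelope(contamination=0.005)  # expected outlier fraction (assumed)
labels = detector.fit_predict(X)                  # +1 for inliers, -1 for outliers
X_clean = X[labels == 1]
print(f"Removed {np.sum(labels == -1)} outliers out of {len(X)} samples")
```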

Differential evolution

DE is a meta-heuristic evolutionary optimization approach that iteratively improves the quality of candidate solutions to a given problem. It can handle real-valued multidimensional data even when the function to be optimized is not differentiable, and it can address problems that are noisy, discontinuous or dynamic. DE works by taking a population of feasible solutions and combining them using simple mathematical operations to identify the solution that best optimizes the objective. The algorithm encodes the variables of the problem as a vector of real numbers. The population consists of vectors of length N corresponding to the number of parameters in the optimization problem20,21. A vector is denoted \({x}_{g,p}\), where p indicates its index within the population and g indicates its generation. The components of this vector, \({x}_{g,p,m}\), are bounded within intervals defined by \({x}_{\text{min},m}\) and \({x}_{\text{max},m}\). The DE algorithm comprises four stages: initialization, mutation, recombination and selection22. The algorithm cycles through these phases until a stopping criterion is met, which can be based on the number of generations, the elapsed time, or the degree of optimization achieved23,24.

In this study, DE was used to optimize the hyperparameters of the machine learning models (SVM, ENR, ERT, BRR). The fitness function to be maximized was set to the R2 score. The algorithm parameters were set as follows:

The DE algorithm iteratively adjusted the hyperparameters to minimize the objective function, which in this case was the sum of the Root Mean Squared Error (RMSE) and MAE values, taking into account their standard deviations.
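The following is a hedged sketch of how DE can drive such a hyperparameter search using SciPy's differential_evolution; the SVR parameter bounds, cross-validation setup, and negative-R2 objective shown here are illustrative assumptions, not the exact configuration used in the study.

```python
# Hedged sketch: tuning SVR hyperparameters with differential evolution.
# Bounds, objective (negative mean R2) and DE settings are illustrative assumptions.
from scipy.optimize import differential_evolution
from sklearn.svm import SVR
from sklearn.model_selection import cross_val_score

def make_objective(X, y):
    def objective(params):
        C, epsilon, gamma = params
        model = SVR(C=C, epsilon=epsilon, gamma=gamma)
        # DE minimizes, so return the negative mean cross-validated R2 score.
        return -cross_val_score(model, X, y, cv=5, scoring="r2").mean()
    return objective

# Assumed search ranges for C, epsilon and gamma.
bounds = [(1e-1, 1e3), (1e-3, 1.0), (1e-3, 10.0)]

# Example usage (given training arrays X_train, y_train):
# result = differential_evolution(make_objective(X_train, y_train), bounds,
#                                 maxiter=50, popsize=15, seed=0)
# best_C, best_epsilon, best_gamma = result.x
```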

Bayesian ridge regression

Bayesian ridge regression combines Bayesian statistics and ridge regression to provide a robust model for analyzing regression data. The method models the relationship between independent input variables and a continuous dependent response in a flexible and robust manner25. It estimates the coefficients of a linear model by incorporating prior knowledge about the data through a prior distribution, which is combined with the likelihood function to obtain the posterior distribution used for coefficient estimation and prediction. The technique includes a penalty term that reduces overfitting and thus aids generalization. The regression coefficients are assigned a normal prior with a mean of zero and a variance determined by the hyperparameter alpha. The likelihood function is modeled as a normal distribution, with the mean given by the linear regression and the variance controlled by an additional hyperparameter, lambda. The main objective is to derive the most likely values of the regression coefficients β given the data and the prior information. The following equation is the posterior distribution of β26:

$$p(\beta \mid X, y, \alpha, \lambda) = \mathcal{N}(\beta \mid \mu, \Sigma)$$

(7)

In this equation, µ represents the mean vector and Σ denotes the covariance matrix of the posterior distribution. These parameters are determined using Bayes' rule27:

$$\mu = (\lambda \cdot X^{\prime}X + \alpha \cdot I)^{-1} \cdot X^{\prime}y$$

(8)

$$\Sigma = (\lambda \cdot X^{\prime}X + \alpha \cdot I)^{-1}$$

(9)

where \(X^{\prime}y\) represents the transposed input matrix multiplied by the output vector, \(X^{\prime}X\) denotes the transpose of the input matrix multiplied by the input matrix itself, and \(I\) indicates the identity matrix.
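As a hedged illustration, scikit-learn's BayesianRidge implements this kind of formulation and estimates the posterior mean of the coefficients together with the precision hyperparameters from the data. Note that scikit-learn's `alpha_`/`lambda_` naming does not necessarily match the convention of Eqs. (7)-(9), and the synthetic data below is purely for demonstration.

```python
# Illustrative fit of scikit-learn's BayesianRidge (synthetic data, assumed setup).
import numpy as np
from sklearn.linear_model import BayesianRidge

rng = np.random.default_rng(0)
X = rng.normal(size=(200, 2))                    # e.g. two inputs such as r and z
y = 3.0 * X[:, 0] - 1.5 * X[:, 1] + rng.normal(scale=0.1, size=200)

model = BayesianRidge()
model.fit(X, y)

# Posterior mean of the coefficients and estimated precision hyperparameters.
print(model.coef_, model.alpha_, model.lambda_)

# Predictions come with a standard deviation from the posterior predictive.
y_mean, y_std = model.predict(X[:5], return_std=True)
```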

Extremely randomized trees

ERT is an ensemble learning method introduced by Geurts et al.28 in 2006. Tree-based regressors consist of hierarchical rule sets that predict numerical output values. By averaging the randomized predictions of multiple decision trees, Extra-Trees improves prediction accuracy while significantly reducing computational complexity29. The method rests on the bias-variance trade-off: the explicit randomization of cut-points and features, combined with ensemble averaging, reduces variance more efficiently than the weaker randomization schemes used in alternative algorithms. To minimize bias, the entire original training set is used instead of bootstrap samples. Assuming balanced trees, the computational complexity of growing the trees is on the order of \(N\log N\) with respect to the size of the training sample, similar to other tree-growing algorithms. Furthermore, because of the simplicity of the node-splitting procedure, the constant factor is expected to be considerably smaller than for ensemble methods that optimize cut-points locally.
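A minimal sketch of ERT for regression using scikit-learn's ExtraTreesRegressor is given below; the synthetic data and hyperparameter values are assumptions for illustration only (in this work they were tuned with DE).

```python
# Illustrative sketch of Extremely Randomized Trees for regression.
from sklearn.ensemble import ExtraTreesRegressor
from sklearn.datasets import make_regression
from sklearn.model_selection import train_test_split
from sklearn.metrics import r2_score

X, y = make_regression(n_samples=500, n_features=2, noise=0.1, random_state=0)
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.2, random_state=0)

# Unlike random forests, Extra-Trees uses the whole training set (no bootstrap)
# and draws split thresholds at random for each candidate feature.
model = ExtraTreesRegressor(n_estimators=200, bootstrap=False, random_state=0)
model.fit(X_train, y_train)
print("R2:", r2_score(y_test, model.predict(X_test)))
```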

Elastic net

Elastic Net is a powerful technique that integrates the benefits of L1 (Lasso) and L2 (Ridge) regularization. L1 regularization encourages parsimony by shrinking some coefficients to exactly zero, thereby performing variable selection and improving interpretability. At the same time, L2 regularization mitigates multicollinearity and stabilizes the coefficient estimates by shrinking them toward zero without completely eliminating variables30,31. The Elastic Net model introduces two important hyperparameters: \({\upalpha}\) and \({\uplambda}\). The \({\upalpha}\) parameter controls the mix between L1 and L2 regularization: when \({\upalpha}=1\) the model reduces to Lasso regression, and when \({\upalpha}=0\) it becomes Ridge regression. The \({\uplambda}\) parameter determines the overall strength of the regularization, with higher values leading to more regularization. The objective function for Elastic Net combines the penalties of both Lasso and Ridge32:

$$\text{minimize}\left( \frac{1}{2n}\sum_{i=1}^{n}\left( y_{i} - X_{i}\cdot\upbeta \right)^{2} + \uplambda\left( \upalpha\sum_{j=1}^{p}\left| \upbeta_{j} \right| + \frac{1-\upalpha}{2}\sum_{j=1}^{p}\upbeta_{j}^{2} \right) \right)$$

(10)

In this equation, n denotes the number of observations, \({y}_{i}\) represents the observed output, and \({X}_{i}\) is the vector of predictors for the i-th observation. The vector \({\upbeta}\) contains the coefficients to be determined from the dataset. The term \({\upalpha}\) balances the L1 and L2 penalties, while \({\uplambda}\) is the regularization parameter. Optimal values for \({\upalpha}\) and \({\uplambda}\) can be found using a grid search with cross-validation, which systematically evaluates different combinations of \({\upalpha}\) and \({\uplambda}\) to identify the best-performing model configuration. This approach ensures that the model is well tuned and performs optimally on the validation set, improving generalizability to new data. By combining the flexibility of Lasso with the stability of Ridge, Elastic Net regularization is a well-balanced method for linear regression, particularly useful for datasets with high-dimensional features or multicollinearity.
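A hedged sketch of this tuning procedure with scikit-learn is shown below; the parameter grid and synthetic data are assumptions. Note that in scikit-learn's ElasticNet, `alpha` corresponds to the overall regularization strength (\({\uplambda}\) above) while `l1_ratio` corresponds to the L1/L2 mixing parameter (\({\upalpha}\) above).

```python
# Hedged sketch: Elastic Net tuning via grid search with cross-validation.
from sklearn.linear_model import ElasticNet
from sklearn.model_selection import GridSearchCV
from sklearn.datasets import make_regression

X, y = make_regression(n_samples=500, n_features=10, noise=0.5, random_state=0)

# `alpha` = overall regularization strength; `l1_ratio` = L1/L2 mix (assumed grid).
param_grid = {"alpha": [0.01, 0.1, 1.0, 10.0],
              "l1_ratio": [0.1, 0.3, 0.5, 0.7, 0.9]}
search = GridSearchCV(ElasticNet(max_iter=10000), param_grid, cv=5, scoring="r2")
search.fit(X, y)
print(search.best_params_, search.best_score_)
```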


