MarketAlert – Real-Time Market & Crypto News, Analysis & Alerts
Learn

Comparative performance evaluation of machine learning models for predicting the ultimate bearing capacity of shallow foundations on granular soils – Scientific Reports

Last updated: October 21, 2025 3:00 pm
Published: 4 months ago

By combining SHAP and PDP, this study ensures a comprehensive interpretability framework. SHAP provides granular, instance-level insights into feature contributions, while PDP offers a broader understanding of overall feature trends. Together, these techniques enhance the model’s transparency and facilitate a more informed analysis of its predictions.

Figure 8 illustrates the regression plots for the developed machine learning models. The optimal prediction line represents perfect alignment between observed and predicted values; predictive accuracy improves as data points lie closer to this ideal line, indicating a high correlation between actual and predicted values.

A regression slope (RS) value greater than 0.8 is generally taken to indicate a strong correlation between model-estimated and actual values. In the present study, all models produced high RS values in both the training and testing phases, except for the SGD model in both phases and the NN model during testing. Of the six machine learning models developed and evaluated, AdaBoost, kNN, RF, and xGBoost performed most reliably, showing the highest concordance between predicted and actual values.
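The RS statistic described above can be sketched as the slope of an ordinary least-squares line fitted to (observed, predicted) pairs. The sample values below are invented for illustration and are not the paper's data.

```python
import numpy as np

def regression_slope(actual, predicted):
    """Slope of the least-squares line through (actual, predicted) pairs.

    A slope near 1.0 means predictions track observations; the RS > 0.8
    threshold from the text is used here as the strong-correlation cutoff.
    """
    slope, _intercept = np.polyfit(actual, predicted, 1)
    return slope

# Hypothetical observed vs. predicted UBC values (kPa), illustration only.
actual = np.array([100.0, 250.0, 400.0, 550.0, 700.0])
predicted = np.array([110.0, 240.0, 420.0, 530.0, 690.0])
rs = regression_slope(actual, predicted)
print(f"RS = {rs:.3f}", "-> strong" if rs > 0.8 else "-> weak")
```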

The performance of all developed machine learning models was evaluated using five statistical metrics, whose values are presented in Table 5 for both the training and testing phases. In the training phase, all models except SGD demonstrated strong predictive capabilities, achieving R² values exceeding 0.92. Similarly, in the testing phase, most models maintained R² values above 0.83, with the exception of the NN and SGD models.
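The five metrics in Table 5 have standard definitions and can be computed as sketched below; the observed and predicted values are invented for illustration.

```python
import numpy as np

def evaluate(actual, predicted):
    """Return the five metrics used in Table 5: MSE, RMSE, MAE, MAPE, R2."""
    actual, predicted = np.asarray(actual, float), np.asarray(predicted, float)
    err = actual - predicted
    mse = np.mean(err ** 2)
    rmse = np.sqrt(mse)
    mae = np.mean(np.abs(err))
    mape = np.mean(np.abs(err / actual)) * 100.0      # assumes no zero targets
    r2 = 1.0 - np.sum(err ** 2) / np.sum((actual - actual.mean()) ** 2)
    return {"MSE": mse, "RMSE": rmse, "MAE": mae, "MAPE": mape, "R2": r2}

# Illustrative values only, not the paper's data.
scores = evaluate([100, 250, 400, 550], [90, 260, 390, 575])
print({k: round(v, 3) for k, v in scores.items()})
```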

Certain models, such as the NN, exhibited high accuracy during training but did not generalize as well in testing. Conversely, some models that performed moderately well in training, such as RF, showed remarkable improvement in testing, achieving R² values of 0.931 and 0.881 for training and testing, respectively. Overall, AdaBoost emerged as the best-performing model across both phases, with R² values of 0.939 in training and 0.881 in testing, indicating its robustness and superior generalization ability.

The ranking system in Table 5 integrates multiple evaluation metrics, including MSE, RMSE, MAE, MAPE, and R², to assess the overall performance of the models. On the training dataset, kNN outperforms the other models by ranking first on every metric, achieving the lowest cumulative rank of 5. AdaBoost follows closely with a cumulative rank of 10, while xGBoost, RF, and NN rank third, fourth, and fifth with cumulative ranks of 16, 19, and 25, respectively. SGD performs worst, with a cumulative rank of 30. In the testing phase, RF and AdaBoost excel, both achieving the best cumulative rank of 9 and demonstrating strong generalization. kNN and xGBoost follow with cumulative ranks of 19 and 21, respectively, while NN ranks fifth with a cumulative rank of 28. SGD again performs poorly, ranking last with a cumulative rank of 35. Notably, kNN, which performed exceptionally well in training, declines slightly in testing, suggesting minimal overfitting. Conversely, Random Forest and AdaBoost deliver consistent performance across both datasets, highlighting their reliability. Based on the combined ranking across training and testing, the models can be ordered as follows: AdaBoost > kNN > Random Forest > xGBoost > Neural Network > Stochastic Gradient Descent. A common practice is to rank models on their testing performance alone; on that basis, the ranking is AdaBoost = RF > kNN > xGBoost > NN > SGD. This analysis underscores the importance of evaluating models on both training and testing datasets to ensure robustness and generalization.
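The cumulative ranking scheme described above can be sketched as follows: each model is ranked 1..n per metric (lower is better for the error metrics, higher is better for R²), and the per-metric ranks are summed. The metric values below are invented for illustration.

```python
# model -> (MSE, RMSE, MAE, MAPE, R2); invented values, not Table 5's.
metrics = {
    "kNN":      (100.0, 10.0,  7.0,  4.0, 0.95),
    "AdaBoost": (120.0, 11.0,  8.0,  5.0, 0.94),
    "SGD":      (900.0, 30.0, 25.0, 20.0, 0.60),
}

# True = lower is better (errors); False = higher is better (R2).
LOWER_IS_BETTER = [True, True, True, True, False]

def cumulative_ranks(metrics):
    """Sum each model's per-metric rank; lowest total = best overall."""
    models = list(metrics)
    totals = {m: 0 for m in models}
    for i, lower in enumerate(LOWER_IS_BETTER):
        ordered = sorted(models, key=lambda m: metrics[m][i], reverse=not lower)
        for rank, m in enumerate(ordered, start=1):
            totals[m] += rank
    return totals

totals = cumulative_ranks(metrics)
print(sorted(totals.items(), key=lambda kv: kv[1]))  # best (lowest total) first
```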

The absolute errors in the developed models’ predictions of the UBC of shallow foundations are illustrated in Fig. 9, which compares the models’ results on the training and testing datasets. Among the models, AdaBoost had the lowest average error (48.05), followed by xGBoost (53.94), RF (64.01), and kNN (65.02). NN had a slightly higher error (81.54), while SGD performed worst at 204.16. This highlights AdaBoost as the most accurate model, with SGD showing significantly higher errors.

For maximum errors (Fig. 9), AdaBoost (771.79) and xGBoost (786.27) had the lowest values, while kNN (1502.01) and SGD (1850.20) exhibited the highest. In terms of minimum error, AdaBoost (0.00) and xGBoost (0.09) performed best, while NN (1.79) and SGD (1.17) showed slightly higher values.

Overall, AdaBoost provided the most precise predictions, while SGD demonstrated the least accuracy. The results confirm that the selected ML models can effectively predict the UBC of shallow foundations, with varying degrees of precision.

As outlined in the methodology section, four strategies were adopted to control overfitting: data randomization, an 80/20 train-test split, the selection of ensemble models, and close monitoring of training and testing performance metrics.

The dataset was first randomized and then split using the Data Sampler widget in Orange Data Mining, with 80% allocated to training and 20% to testing, in accordance with recommendations in the relevant literature and as demonstrated in the model development section. Model generalization was assessed by comparing evaluation metrics across both datasets, with a special focus on model performance in testing.
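A generic shuffle-then-split step like the Data Sampler's can be sketched with plain NumPy; this is an illustrative equivalent, not the paper's exact sampling code, and the dummy data below is invented.

```python
import numpy as np

def train_test_split_80_20(X, y, seed=42):
    """Shuffle row indices, then take the first 80% for training."""
    rng = np.random.default_rng(seed)
    idx = rng.permutation(len(X))
    cut = int(0.8 * len(X))
    train, test = idx[:cut], idx[cut:]
    return X[train], X[test], y[train], y[test]

X = np.arange(100).reshape(50, 2)   # 50 dummy samples, 2 features
y = np.arange(50, dtype=float)
X_tr, X_te, y_tr, y_te = train_test_split_80_20(X, y)
print(len(X_tr), len(X_te))  # 40 10
```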

Furthermore, ensemble models such as AdaBoost, xGBoost, and RF were intentionally chosen due to their well-known resistance to overfitting. These models combine the outputs of multiple base learners, thereby reducing variance and improving generalization.
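A minimal sketch of fitting two of these ensembles, assuming scikit-learn is available; the synthetic data stands in for the UBC dataset (columns loosely analogous to D, ϕ, B, γ, L/B) and is not the paper's data or tuning.

```python
import numpy as np
from sklearn.ensemble import AdaBoostRegressor, RandomForestRegressor

# Toy stand-in for the UBC dataset; target is a noisy linear combination.
rng = np.random.default_rng(0)
X = rng.uniform([0.5, 25, 0.5, 14, 1], [3.0, 45, 3.0, 20, 6], size=(200, 5))
y = 50 * X[:, 0] + 20 * X[:, 1] + 30 * X[:, 2] + rng.normal(0, 10, 200)

results = {}
for model in (AdaBoostRegressor(n_estimators=50, random_state=0),
              RandomForestRegressor(n_estimators=100, random_state=0)):
    model.fit(X[:160], y[:160])                     # first 160 rows = training
    results[type(model).__name__] = model.score(X[160:], y[160:])  # held-out R^2
print({k: round(v, 3) for k, v in results.items()})
```

Both ensembles average or reweight many weak trees, which is what reduces variance relative to a single learner.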

Most models demonstrated nearly consistent performance between the training and testing datasets, suggesting little to no overfitting. For example, the AdaBoost model achieved closely aligned R² values of 0.939 (training) and 0.881 (testing), highlighting its strong generalization capability with no sign of overfitting. Similarly, RF and kNN maintained stable performance across both phases. In contrast, the NN model showed signs of overfitting, with a high training R² (0.924) and a notably lower testing R² (0.713), indicating that it memorized the training data but failed to generalize. The SGD model, on the other hand, produced poor results on both datasets, indicating underfitting and a failure to learn the underlying data patterns.
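The train/test comparison above amounts to a simple rule on the R² pair. A sketch, with the 0.1 gap tolerance and 0.5 underfit floor as illustrative thresholds rather than values from the paper:

```python
def fit_diagnosis(r2_train, r2_test, gap_tol=0.1, floor=0.5):
    """Classify a model from its train/test R2 pair."""
    if r2_train < floor and r2_test < floor:
        return "underfitting"      # poor on both sets, as with SGD here
    if r2_train - r2_test > gap_tol:
        return "overfitting"       # memorizes training data, as with the NN
    return "generalizes"

print(fit_diagnosis(0.939, 0.881))  # AdaBoost: small gap
print(fit_diagnosis(0.924, 0.713))  # NN: large train-test gap
```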

Interpreting machine learning predictions is often challenging without incorporating mathematical reasoning, theoretical validation, and an understanding of the mechanisms driving the model’s outputs. To address this, this study employs Shapley Additive Explanations (SHAP) and Partial Dependence Plots (PDP) to enhance the interpretability of the developed models. These techniques provide both local and global insights into feature importance and model behavior.

Two types of SHAP plots are generated for all the developed models: the Mean SHAP Value Plot and the SHAP Summary Plot. The Mean SHAP Value Plot (Fig. 10) illustrates the average impact of input parameters on the predicted UBC across the machine learning models. Foundation depth (D) exhibits the highest SHAP value in most models, reaffirming its dominant influence. However, AdaBoost ranks the angle of internal friction (ϕ) as the most influential factor, followed by D, giving the order ϕ > D > B > γ > L/B. This aligns with previous studies, which identified ϕ as the most critical geotechnical parameter and D as the most significant geometric factor. Given its agreement with prior studies and experimental findings, AdaBoost effectively captures the most accurate trend, resulting in superior predictive performance. This study highlights a limitation in R. Zhang et al.’s ranking of parameters (ϕ > B > D > γ > L/B), which contradicts Meyerhof’s findings. The results from this research show that D has a more significant impact than suggested by Zhang et al., aligning more closely with Meyerhof’s framework and reinforcing the need for a reevaluation of parameter significance in UBC predictions.
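The mean-|SHAP| ranking behind plots like Fig. 10 can be reproduced exactly for a linear model without any SHAP library: for f(x) = w·x + b with independent features, the SHAP value of feature j on sample i is w_j·(x_ij − mean_j). The weights and data below are invented so that D dominates by construction.

```python
import numpy as np

rng = np.random.default_rng(1)
names = ["D", "phi", "B", "gamma", "L/B"]
X = rng.normal(size=(500, 5))                 # synthetic, standardized inputs
w = np.array([3.0, 2.5, 1.5, 0.8, 0.3])      # D most influential by construction

# Exact per-sample SHAP values for the linear surrogate model.
shap_values = (X - X.mean(axis=0)) * w        # shape (500, 5)
mean_abs_shap = np.abs(shap_values).mean(axis=0)
ranking = [names[i] for i in np.argsort(mean_abs_shap)[::-1]]
print(ranking)
```

Tree ensembles like AdaBoost or xGBoost need a dedicated explainer (e.g. the `shap` package) instead of this closed form, but the ranking step is the same.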

xGBoost, RF, and kNN prioritize D over ϕ, following the order D > ϕ > B > γ > L/B. In contrast, SGD and NN exhibit distinct trends, ranking parameters as D > ϕ > B > L/B > γ and D > γ > B > ϕ > L/B, respectively. The lower ranking of γ in SGD suggests its limited contribution to linear regression-based models. Conversely, NN assigns greater importance to γ, indicating that deep learning models capture complex interactions between γ and other parameters, which traditional models may overlook. These findings reinforce the dominance of D and ϕ in bearing capacity estimation while demonstrating how different ML models interpret feature importance with varying sensitivities.

To investigate the impact of input parameters on the predicted UBC, the SHAP summary plot was employed to illustrate each feature’s contribution to the model output, as depicted in Fig. 11. Red points represent higher feature values, while blue points indicate lower ones; a positive SHAP value signifies a favorable effect on UBC, and a higher absolute value denotes stronger influence. Conversely, negative SHAP values reduce UBC, highlighting detrimental effects of specific parameter ranges.

In the xGBoost model, foundation depth (D) and friction angle (ϕ) exhibit the widest SHAP distributions, emphasizing their dominant roles. Foundation width (B) exerts a moderate impact, whereas unit weight (γ) and length-to-width ratio (L/B) show narrower spreads, suggesting lesser influence. Positive SHAP values generally increase UBC, aligning with established geotechnical insights that underscore the synergy between soil properties and foundation geometry.

In the AdaBoost model, ϕ emerges as the most influential parameter, followed closely by D, indicating a balanced interplay between soil friction and foundation depth. B and γ have moderate effects, while L/B remains least critical. This ordering supports prior literature emphasizing ϕ as a pivotal soil property, and the model’s sensitivity to friction angle aligns with its robust predictive accuracy.

In the kNN model, D and ϕ dominate the SHAP distribution, confirming their importance. B demonstrates moderate influence, whereas γ and L/B appear comparatively minor. Red regions at higher SHAP values typically increase UBC, reflecting the local, distance-based nature of kNN, which captures interactions between soil and geometric factors to varying degrees.

In the NN model, ϕ has the largest spread of SHAP values, highlighting its primary role, while D also contributes substantially. B and γ display moderate effects, and L/B remains marginal. Positive SHAP values shift UBC upward, illustrating the NN’s capacity to model nonlinear relationships between soil properties and foundation characteristics.

In the RF model, D yields the greatest SHAP range, followed by ϕ, indicating a balanced focus on geometry and soil friction. B shows moderate significance, whereas γ and L/B exhibit narrower distributions. Positive SHAP values raise UBC, mirroring ensemble-based insights that consistently emphasize D and ϕ as key drivers in bearing capacity estimation.

In the SGD model, D again ranks highest, with ϕ next in importance, but γ, B, and L/B display narrower spreads. Positive SHAP values correlate with increased UBC, while negative ones reduce it. The linear optimization framework of SGD may limit its ability to capture more complex interactions, accounting for its distinct feature hierarchy relative to other models.

The SHAP analysis clearly identified the angle of internal friction (φ) as the most influential parameter across the developed models, particularly in AdaBoost, where it contributed most significantly to the model’s predictive output. This is consistent with classical bearing capacity theories for cohesionless soils, where φ directly governs shear strength and, consequently, the ultimate bearing capacity (UBC). A higher φ enhances resistance along potential failure surfaces, making it a critical parameter in both empirical and data-driven models. Conversely, the length-to-width ratio (L/B) exhibited the lowest SHAP values in all models, indicating minimal influence on the predicted UBC. While L/B may affect stress distribution patterns, its indirect effect renders it less impactful relative to soil strength and geometric parameters such as φ, D, and B. This ranking of features not only enhances model interpretability but also offers practical insights, underscoring the need to prioritize accurate assessment of φ in field investigations while suggesting that simplifications in L/B may be acceptable in preliminary design phases.

Partial dependence plots (PDPs) illustrate how variations in input parameters influence a model’s predicted output. These plots provide insights into how different machine learning models capture trends between input variables and the target variable. The PDPs for all models are presented in Fig. 12, and their interpretations are summarized as follows:

All models exhibit a nonlinear, nearly exponential increase in UBC with increasing D, confirming that deeper foundations generally provide greater resistance. This behavior aligns with both theoretical and physical expectations and is further substantiated by SHAP analysis, which indicates a consistently positive impact of depth on UBC. Notably, the AdaBoost model initially follows this trend but eventually stabilizes, suggesting a diminishing marginal influence of depth beyond a certain threshold, a phenomenon that may be attributed to soil confinement effects or bearing capacity limits.

The influence of the angle of internal friction (φ) follows a sharp nonlinear increasing trend across most models, indicating its critical role in governing bearing capacity. The AdaBoost model, however, displays a distinct pattern, remaining constant up to a specific threshold before gradually increasing, followed by a sudden surge. This behavior suggests that when cohesionless soil behavior dominates, the contribution of φ to bearing capacity becomes highly pronounced, potentially approaching an exponential relationship. These observations align with established theoretical frameworks and previous empirical studies, with SHAP analysis further confirming the significant contribution of φ to ultimate bearing capacity.

Foundation width (B) exhibits a generally increasing trend, demonstrating that wider foundations contribute to higher bearing capacity. However, some models indicate a plateauing effect at larger widths, suggesting diminishing returns beyond a certain threshold. This behavior aligns with established geotechnical principles, as increasing width enhances UBC primarily by increasing the effective stress distribution, but excessive width may lead to settlement effects that counterbalance the gains. These findings highlight the need to consider optimal width (B) dimensions in foundation design to maximize efficiency.

The effect of unit weight of soil (γ) on UBC is relatively moderate compared to other parameters. Except for the NN model, all models depict a steady but gradual increase, indicating that while higher γ contributes positively to bearing capacity, its impact remains less pronounced. This aligns with theoretical expectations, as UBC is more dominantly controlled by parameters such as B and φ. The comparatively weaker influence of γ is likely due to its indirect role in influencing stress distribution rather than directly governing failure mechanisms.

The length-to-width ratio (L/B) exhibits a consistent decreasing trend across all models, reinforcing theoretical predictions that elongated foundations experience reduced UBC. This trend can be attributed to stress redistribution effects, where an increased L/B ratio leads to a more uniform load distribution but reduces the confining effects that contribute to higher resistance. These findings align with the existing literature, suggesting that optimizing L/B is crucial in foundation design to achieve an efficient balance between load-bearing performance and structural stability.
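A one-dimensional PDP like those in Fig. 12 is simple to compute by hand: fix one feature at each grid value for every sample and average the model's predictions. The surrogate model and data below are invented to mimic the nonlinear depth trend discussed above, not taken from the paper.

```python
import numpy as np

def partial_dependence(predict, X, feature, grid):
    """1-D PDP: for each grid value v, set column `feature` to v for every
    sample and average the model's predictions."""
    X = np.asarray(X, float)
    pd_vals = []
    for v in grid:
        X_mod = X.copy()
        X_mod[:, feature] = v
        pd_vals.append(predict(X_mod).mean())
    return np.array(pd_vals)

# Illustrative surrogate: UBC grows nonlinearly with depth D (column 0).
predict = lambda X: 100.0 * np.exp(0.5 * X[:, 0]) + 10.0 * X[:, 1]
X = np.column_stack([np.linspace(0.5, 3.0, 50),    # D values
                     np.linspace(25, 45, 50)])     # phi values
grid = np.linspace(0.5, 3.0, 6)
pdp = partial_dependence(predict, X, feature=0, grid=grid)
print(np.round(pdp, 1))  # monotonically increasing with D
```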

The predictive performance of the proposed AdaBoost model was compared with empirical, statistical, and machine learning-based models from the literature using key error metrics, including RMSE, MAE, and the correlation coefficient (R) (Table 6). The results indicate that AdaBoost outperforms classical empirical models, such as Terzaghi, Meyerhof, Hansen, and Vesic, which rely on simplified assumptions and exhibit higher RMSE and MAE values.
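For context, the classical baselines take the general form q_u = γ·D·Nq + 0.5·γ·B·Nγ for a strip footing on cohesionless soil (c = 0). The sketch below uses the Reissner/Meyerhof expression for Nq and Vesic's Nγ = 2(Nq + 1)tan φ; other authors define Nγ differently, which is one reason the empirical baselines in Table 6 diverge. The input values are illustrative.

```python
import math

def ubc_granular(gamma, D, B, phi_deg):
    """Classical UBC estimate (kPa) for a strip footing on cohesionless soil:
        q_u = gamma * D * Nq + 0.5 * gamma * B * Ngamma
    with Nq per Reissner/Meyerhof and Ngamma per Vesic."""
    phi = math.radians(phi_deg)
    nq = math.exp(math.pi * math.tan(phi)) * math.tan(math.pi / 4 + phi / 2) ** 2
    ngamma = 2.0 * (nq + 1.0) * math.tan(phi)
    return gamma * D * nq + 0.5 * gamma * B * ngamma

# Example: gamma = 18 kN/m^3, D = 1 m, B = 2 m, phi = 35 degrees.
print(round(ubc_granular(18.0, 1.0, 2.0, 35.0), 1), "kPa")
```

Such closed-form estimates ignore the feature interactions that the ML models learn from data, which is consistent with their higher errors in Table 6.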

Compared to the Khorrami-M5 and Zhang & Xue-MEP models, AdaBoost achieved lower RMSE and MAE and a higher R, making it a more reliable and stable alternative. The Omar-ANN model achieved lower RMSE and MAE than AdaBoost but a lower R value, and it required extensive hyperparameter tuning, making AdaBoost the more practical and efficient option. Among the machine learning models, AdaBoost ranked second overall, behind Kumar-ANN-ICA, which achieved the lowest error values. However, AdaBoost exhibited a high R² value in training (96.90%), beating Kumar-ANN-ICA on this metric.

Read more on Nature

This news is powered by Nature.

© Market Alert News. All Rights Reserved.