
Machine learning models for optimization, validation, and prediction of light emitting diodes with kinetin based basal medium for in vitro regeneration of upland cotton (Gossypium hirsutum L.)

Abstract

Background

Plant tissue culture has emerged as a tool for improving cotton propagation and genetics, but the recalcitrant nature of cotton makes it difficult to develop in vitro regeneration protocols. Cotton's recalcitrance is influenced by genotype, explant type, and environmental conditions. To address these issues, this study used machine learning-based predictive models with multiple input factors. Cotyledonary node explants of two commercial cotton cultivars (STN-468 and GSN-12) were isolated from 7–8-day-old seedlings and preconditioned with 5, 10, or 20 mg·L−1 kinetin (KIN) for 10 days. Thereafter, explants were postconditioned on full Murashige and Skoog (MS), ½ MS, ¼ MS, or full MS + 0.05 mg·L−1 KIN medium and cultured in a growth room illuminated with combinations of red and blue light-emitting diodes (LEDs). Statistical analysis (analysis of variance and regression analysis) was used to assess the effects of the treatments on shoot regeneration, and artificial intelligence (AI) models were used to confirm the findings.

Results

GSN-12 exhibited superior shoot regeneration potential compared with STN-468, with an average of 4.99 versus 3.97 shoots per explant. Optimal results were achieved with 5 mg·L−1 KIN preconditioning, ¼ MS postconditioning, and 80% red LED, which gave a maximum of 7.75 shoots per explant for GSN-12, whereas STN-468 reached 6.00 shoots with 10 mg·L−1 KIN preconditioning, MS + 0.05 mg·L−1 KIN postconditioning, and 75.0% red LED. Rooting was successfully achieved with naphthalene acetic acid and activated charcoal. Additionally, three AI-based regression models, namely extreme gradient boosting (XGBoost), random forest (RF), and the artificial neural network-based multilayer perceptron (MLP), validated the findings.

Conclusion

GSN-12 outperformed STN-468, with optimal results from 5 mg·L−1 KIN + ¼ MS + 80% red LED. Applying machine learning-based prediction models to optimize cotton tissue culture protocols for shoot regeneration can help improve cotton regeneration efficiency.

Introduction

Cotton (Gossypium hirsutum L.), a prominent crop grown for both oil and fiber, is a member of the Malvaceae family. It is regarded as a mainstay of the economies of several nations, supplying raw materials largely to the textile and oil sectors (Juturu et al. 2015; Jabran et al. 2019). Cotton is cultivated in around 75 countries or regions, and in all of them the crop faces similar problems, such as heavy insect pest infestation, disease outbreaks, and abiotic stresses (Bakhsh et al. 2015; Zafar et al. 2024). Recent advances in biotechnology and molecular biology have enabled researchers to improve the germplasm by developing varieties resistant to these stresses (Nadeem et al. 2023).

Using biotechnology, researchers around the world have developed transgenic cotton resistant to insect pests and tolerant to herbicides (Brookes et al. 2012). Biotechnological tools have also been used to enhance fiber quality (Chen et al. 2015). However, standardizing a genetic transformation protocol depends heavily on an optimized in vitro regeneration protocol, and cotton is considered one of the most recalcitrant crops in vitro because of its low response and non-reproducible protocols (Kumari et al. 2017; Rajasekaran 2004; Yavuz et al. 2020). Although several protocols using various explants and plant growth regulators (PGRs) have been developed recently, researchers still struggle to find an effective and repeatable strategy for inducing multiple shoots. In vitro regeneration studies on cotton have demonstrated somatic embryogenesis, embryogenic cell cultures, and organogenesis from meristematic regions (Rajasekaran 2004).

Meristem transformation protocols are largely genotype-independent, but they can generate chimeras in large numbers. Because meristematic cells are limited in number, few transformed cells survive in regenerated tissues, which results in low transgenic selection efficiency (Firoozabady et al. 1987; Sunilkumar et al. 2001). The procedure is also time-consuming, and the time needed to recover transformants depends on a cultivar's capacity to regenerate multiple shoots (Keller et al. 1997). Under these circumstances, successful transformation can only be confirmed in the following generation (John 1997). In contrast, embryogenic lines require only a short culture period and, once established, can be routinely subcultured and used for multiple transformations, which makes it possible to convert more transformants into regenerated plantlets (Leelavathi et al. 2004). The explant is one of the limiting factors for cotton in vitro regeneration, and previous studies emphasized the need to exploit its full potential (Sunilkumar et al. 2001). Explants used for cotton in vitro regeneration include leaf petioles (Zhang et al. 2011; Yang et al. 2014), hypocotyls (Kumar et al. 2013), immature zygotic embryos (Hussain et al. 2009), and cotyledonary nodes (Bakhsh et al. 2016). Selecting suitable explants together with appropriate PGRs and culture conditions is crucial for establishing a protocol, because phenolic compounds exuded into the culture medium inhibit in vitro regeneration of cotton.

Artificial intelligence (AI)-based algorithms are increasingly used in precision agriculture and in the agricultural and biological sciences (Sharma et al. 2020; Whitmire et al. 2021), but their use in plant biotechnology, especially plant tissue culture, remains very limited compared with other research areas (Salehi et al. 2021; Aasim et al. 2024a). Recently, researchers have used different models in plant tissue culture to optimize in vitro germination, sterilization, organogenesis, and somatic embryogenesis (Jafari et al. 2023; Özcan et al. 2023; Şimşek et al. 2024). The choice of hyperparameters and the input–output relationships strongly influenced the performance and choice of models in these investigations. Several studies used random forest (RF), extreme gradient boosting (XGBoost), and multilayer perceptron (MLP) algorithms together with various performance metrics to validate their results (Katirci et al. 2021; Yan et al. 2020).

In this study, a novel protocol was established by exposing cotton explants to a high cytokinin concentration (preconditioning), followed by culture on a medium with no or low cytokinin (postconditioning). Explants were cultured for in vitro regeneration and rooting under various light-emitting diode (LED) combinations, and the outcomes were predicted using AI models. A major goal was to integrate AI models, namely artificial neural networks (ANN), RF, and XGBoost, to estimate shoot counts precisely and validate the experimental findings. By combining AI-driven predictive modeling with the optimization of growth conditions, this approach offers a new framework for improving tissue culture techniques.

Material and methods

In vitro regeneration

The experiment was conducted at the Biotechnology Laboratory of Karamanoglu Mehmetbey University, Faculty of Science, Karaman, Türkiye. Seeds of two commercial upland cotton cultivars, GSN-12 (May Agro Seeds Pvt Ltd, Türkiye) and Stoneville 468 (STN-468, Nazili Cotton Research Institute, Nazili, Türkiye), were obtained. The seeds of both cultivars were delinted by the standard procedure of treatment with concentrated H2SO4 followed by washing with water. Surface sterilization of the seeds followed a previously described protocol (Bakhsh et al. 2016). The seeds were exposed to 1% HgCl2 for 10 min and immediately rinsed once with sterile water for 5 min. After a further 10-min treatment with 0.2% sodium dodecyl sulfate (SDS, Sigma-Aldrich) and 0.2% HgCl2, the seeds were washed three times with water for 5 min each. Surface-sterilized seeds were germinated on agar-solidified Murashige & Skoog (MS) medium for 7–8 d to obtain cotyledonary node explants. After isolation, explants were preconditioned for 10 d on phytagel-solidified MS medium (Murashige et al. 1962) supplemented with 5, 10, or 20 mg·L−1 kinetin (KIN) under 80.0% R-LED light (R:B = 4:1). Explants were then transferred to postconditioning media containing full MS (4.4 g·L−1), ½ MS (2.2 g·L−1), or ¼ MS (1.1 g·L−1) without any PGR, and MS + 0.05 mg·L−1 KIN was used as a fourth postconditioning medium. Explants on the postconditioning media were cultured in the growth room under different red and blue (R:B) LED combinations: 80.0% R-LED (R:B = 4:1), 75.0% R-LED (R:B = 3:1), and 66.67% R-LED (R:B = 2:1). A 16-h photoperiod and a growth room temperature of 24 ± 1 °C were maintained for all culture conditions.
For rooting, shoots were excised and cultured on MS medium containing naphthalene acetic acid (NAA, 0.1 mg·L−1) and activated charcoal (1 g·L−1). Rooted plantlets were established in pots filled with organic peat moss.

The culture media used in this investigation were prepared according to the standard protocol with 30 g·L−1 sucrose and MS at the different strengths. Agar (6.5 g·L−1) and phytagel (2.5 g·L−1) were used as gelling agents. All culture media were adjusted to pH 5.8 with 1 mol·L−1 HCl or NaOH. Filter-sterilized (0.22 µm) KIN was added after autoclaving the culture medium.

Statistical analysis

In this study, six explants were used per replication, and the experiment was repeated twice. Shoot numbers were compared among cultivars, preconditioning doses, postconditioning media, and LED combinations, and the interactions of all factors were analyzed for each cultivar, using one-way analysis of variance (ANOVA) and factorial regression analysis. Analyses were performed in Minitab 20.4, and differences between means were compared using Tukey's test. Pareto charts, normal plots, and the response optimizer in Minitab 20.4 were used to optimize the input variables. To illustrate the associations between the independent and dependent variables, 2-D contour plots and 3-D surface plots were created with the statistical software Design Expert.
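As an illustration of the one-way ANOVA and Tukey comparison described above, the following Python sketch uses SciPy rather than Minitab. The shoot counts in it are hypothetical values invented for demonstration, not data from this study, and the sketch assumes SciPy 1.8 or later, where `scipy.stats.tukey_hsd` is available. The factorial regression model is not reproduced here.

```python
import numpy as np
from scipy import stats

# Hypothetical shoot counts per explant for three preconditioning doses.
# These numbers are illustrative only and are NOT the study's data.
groups = {
    "5 mg/L KIN": np.array([4.0, 5.0, 4.5, 4.0, 5.5, 4.5]),
    "10 mg/L KIN": np.array([5.0, 4.5, 5.0, 4.0, 5.5, 4.0]),
    "20 mg/L KIN": np.array([4.0, 4.5, 3.5, 4.0, 4.5, 4.5]),
}
samples = list(groups.values())

# One-way ANOVA: tests whether at least one group mean differs
f_stat, p_value = stats.f_oneway(*samples)
print(f"F = {f_stat:.3f}, P = {p_value:.3f}")

# Tukey's HSD test for all pairwise differences between group means
tukey = stats.tukey_hsd(*samples)
names = list(groups)
for i in range(len(names)):
    for j in range(i + 1, len(names)):
        diff = samples[i].mean() - samples[j].mean()
        print(f"{names[i]} vs {names[j]}: diff = {diff:.2f}, P = {tukey.pvalue[i, j]:.3f}")
```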

Machine learning (ML) modeling

In this study, preconditioning medium, postconditioning medium, and LEDs were used as input variables for the two cotton cultivars, and shoot count was used as the output variable for ML analysis. Three models, XGBoost, RF, and MLP, were used for data analysis (Chen et al. 2016; Aggarwal 2018; Silva et al. 2019).
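The factor levels described above form a full factorial design. The sketch below is a hypothetical Python representation of that design, not the encoding actually used in this study; it enumerates every treatment combination with `itertools.product`, and the way replicates map onto these combinations in the 216-point dataset is not reproduced.

```python
from itertools import product

cultivars = ["STN-468", "GSN-12"]
preconditioning_kin = [5, 10, 20]                        # mg/L KIN
postconditioning = ["MS", "1/2 MS", "1/4 MS", "MS + 0.05 mg/L KIN"]
red_led_share = [80.0, 75.0, 66.67]                      # % red LED

# Every treatment combination: (cultivar, KIN dose, medium, red LED share)
design = list(product(cultivars, preconditioning_kin, postconditioning, red_led_share))
print(len(design))   # 2 * 3 * 4 * 3 = 72
```

Each cultivar therefore contributes 36 combinations of preconditioning, postconditioning, and LED levels.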

RF is one of the most popular tree-based ensemble algorithms in data science because of its accuracy, speed, stability, and usability, especially for modeling non-linear relationships. RF uses bagging (bootstrap aggregation): multiple bootstrap samples are drawn from the training data, a decision tree is trained independently on each sample, and the tree predictions are averaged to predict the output, as expressed in equation (1) (Pavlov 2019). Aggregating many trees in this way improves the performance of the model as a whole.

$$\widehat{y}=\frac{1}{M}\sum_{i=1}^{M}{f}_{i}\left(x\right)$$
(1)

where \(\widehat{y}\) is the predicted value, \({f}_{i}\left(x\right)\) is the prediction of the \(i\)th tree for input \(x\), and \(M\) is the total number of trees.
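Equation (1) can be checked directly with scikit-learn, which the study used for modeling. The sketch below fits a `RandomForestRegressor` on synthetic data (an assumption for illustration only) and confirms that the forest prediction equals the mean of the \(M\) individual tree predictions \({f}_{i}(x)\).

```python
import numpy as np
from sklearn.ensemble import RandomForestRegressor

rng = np.random.default_rng(0)
X = rng.uniform(size=(60, 3))                  # hypothetical inputs
y = 4 + 2 * X[:, 0] - X[:, 1] + rng.normal(scale=0.3, size=60)

# Bagging: each of the M = 50 trees is trained on its own bootstrap sample
rf = RandomForestRegressor(n_estimators=50, bootstrap=True, random_state=0).fit(X, y)

# Eq. (1): average the individual tree predictions f_i(x)
tree_preds = np.stack([tree.predict(X) for tree in rf.estimators_])   # shape (M, n)
y_hat_manual = tree_preds.mean(axis=0)

print(np.allclose(y_hat_manual, rf.predict(X)))  # True
```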

XGBoost, another decision-tree-based ML algorithm, is used for regression, classification, and ranking tasks in supervised learning (Chen et al. 2016). Trees are added sequentially, and each new tree reduces the prediction error of the preceding ensemble. Equation (2) defines the additive tree ensemble, and equation (3) gives the objective function that is minimized at the \(j\)th iteration of XGBoost.

$${\widehat{y}}_{i}=F\left({x}_{i}\right)=\sum_{d=1}^{D}{f}_{d}\left({x}_{i}\right), {f}_{d}\in \mathcal{F}, i=1,\dots ,n$$
(2)

Equation (2) describes a tree ensemble model that uses \(D\) additive functions to predict the output. Here, \({\widehat{y}}_{i}\) is the predicted output for the \(i\)th sample, \(F\left({x}_{i}\right)\) is the ensemble of decision trees, and each \({f}_{d}\) corresponds to an independent tree structure from the space of regression trees \(\mathcal{F}\).

$${L}_{j}=\sum_{i=1}^{n}l\left[{y}_{i}, {\widehat{y}}_{i}^{\left(j-1\right)}{+ f}_{j}\left({x}_{i}\right)\right]+\Omega \left({f}_{j}\right)$$
(3)

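where \({L}_{j}\) is the objective function to be minimized at iteration \(j\); \(l\) is a differentiable loss function that measures the difference between the actual value \({y}_{i}\) and the prediction obtained by adding the current tree to the previous additive trees, all of which are classification and regression tree (CART) learners; \({\widehat{y}}_{i}^{\left(j-1\right)}\) is the prediction from the first \(j-1\) trees; \({f}_{j}\left({x}_{i}\right)\) is the prediction of the \(j\)th tree for the \(i\)th data point; and \(\Omega\) is the regularization term that penalizes the complexity of \({f}_{j}\).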
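To make equations (2) and (3) concrete, the following Python sketch builds an additive tree ensemble by fitting each new shallow regression tree to the residuals of the previous ensemble, with squared error as the loss \(l\). The penalty \(\Omega(f)=\gamma T+\tfrac{1}{2}\lambda\lVert w\rVert^{2}\) (with \(T\) leaves and leaf weights \(w\)) follows the form given by Chen et al. (2016), but the values of \(\gamma\), \(\lambda\), the learning rate, and the tree depth are assumptions. scikit-learn trees also do not use XGBoost's regularized split criterion, so this is a simplified illustration rather than the xgboost library itself.

```python
import numpy as np
from sklearn.tree import DecisionTreeRegressor

rng = np.random.default_rng(1)
X = rng.uniform(size=(80, 3))                       # hypothetical inputs
y = 4 + 2 * X[:, 0] - X[:, 1] + rng.normal(scale=0.3, size=80)

gamma, lam, eta, n_rounds = 0.1, 1.0, 0.3, 20       # assumed hyperparameters


def omega(tree, eta):
    """XGBoost-style penalty gamma*T + 0.5*lambda*sum(w^2) on shrunken leaf weights."""
    is_leaf = tree.tree_.children_left == -1
    w = eta * tree.tree_.value[is_leaf].ravel()
    return gamma * is_leaf.sum() + 0.5 * lam * np.sum(w ** 2)


y_hat = np.full_like(y, y.mean())                   # initial constant prediction
objectives = []
for j in range(n_rounds):
    residual = y - y_hat                            # negative gradient of squared loss (up to a factor of 2)
    tree = DecisionTreeRegressor(max_depth=2, random_state=j).fit(X, residual)
    y_hat = y_hat + eta * tree.predict(X)           # Eq. (2): add the j-th tree
    # Eq. (3) with squared-error loss: loss of the updated ensemble + penalty
    L_j = np.sum((y - y_hat) ** 2) + omega(tree, eta)
    objectives.append(L_j)

print(f"L_1 = {objectives[0]:.2f}, L_{n_rounds} = {objectives[-1]:.2f}")
```

The learning rate `eta` shrinks each tree's contribution, so the penalty is computed on the shrunken leaf weights that actually enter the ensemble.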

MLP is a widely used feedforward ANN composed of multiple layers of perceptrons. It consists of three types of layers: input, hidden, and output. Using the backpropagation technique, the weights and biases are adjusted according to the error (Katirci et al. 2021), and training continues until the error in equation (4) is minimized.

$$E= \frac{1}{K} \sum_{k=1}^{K}{({y}_{k}-{\widehat{y}}_{k})}^{2}$$
(4)

where \(E\) is the error, \({y}_{k}\) is the actual value of data point \(k\), \({\widehat{y}}_{k}\) is the predicted value of data point \(k\), and \(K\) is the sample size.
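The backpropagation loop described above can be sketched in plain NumPy for a network with one hidden layer. The layer size, ReLU activation, learning rate, number of epochs, and synthetic data below are assumptions chosen for illustration and do not represent the MLP configuration tuned in this study; the loop minimizes \(E\) from equation (4) by full-batch gradient descent.

```python
import numpy as np

rng = np.random.default_rng(2)
X = rng.uniform(-1, 1, size=(100, 3))           # hypothetical standardized inputs
y = (1.5 * X[:, 0] - X[:, 1] ** 2 + 0.5 * X[:, 2]).reshape(-1, 1)

n_hidden, lr, epochs = 8, 0.05, 2000            # assumed hyperparameters
W1 = rng.normal(scale=0.5, size=(3, n_hidden)); b1 = np.zeros(n_hidden)
W2 = rng.normal(scale=0.5, size=(n_hidden, 1)); b2 = np.zeros(1)


def forward(X):
    h = np.maximum(0.0, X @ W1 + b1)            # hidden layer, ReLU activation
    return h, h @ W2 + b2                       # linear output layer


history = []
K = len(X)
for epoch in range(epochs):
    h, y_hat = forward(X)
    E = np.mean((y - y_hat) ** 2)               # Eq. (4)
    history.append(E)
    # Backpropagation: gradients of E with respect to weights and biases
    d_out = -2.0 * (y - y_hat) / K              # dE/dy_hat, shape (K, 1)
    gW2 = h.T @ d_out; gb2 = d_out.sum(axis=0)
    d_h = (d_out @ W2.T) * (h > 0)              # back through the ReLU
    gW1 = X.T @ d_h; gb1 = d_h.sum(axis=0)
    # Gradient-descent update of all weights and biases
    W1 -= lr * gW1; b1 -= lr * gb1
    W2 -= lr * gW2; b2 -= lr * gb2

print(f"E at start: {history[0]:.3f}, after training: {history[-1]:.3f}")
```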

The dataset, composed of 216 data points, was divided into training and testing sets using leave-one-out cross-validation (LOO-CV), in which each data point is used once as the test set while the remaining points form the training set (Webb et al. 2011). Hyperparameters were optimized by grid search to obtain the optimal model. All supervised ML algorithms were developed in the free and open-source Python programming language (Van Rossum et al. 2009) using the sklearn package (Pedregosa et al. 2011).
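A minimal scikit-learn sketch of grid search combined with LOO-CV is shown below. The RF estimator, the parameter grid, and the synthetic data are assumptions for illustration, not the grid used in this study. Because each LOO test fold contains a single sample, R2 cannot be computed per fold, so the sketch scores the folds by negative MSE.

```python
import numpy as np
from sklearn.ensemble import RandomForestRegressor
from sklearn.model_selection import GridSearchCV, LeaveOneOut

rng = np.random.default_rng(3)
X = rng.uniform(size=(30, 3))                  # hypothetical inputs
y = 4 + 2 * X[:, 0] - X[:, 1] + rng.normal(scale=0.3, size=30)

param_grid = {"n_estimators": [20, 50], "max_depth": [2, None]}   # assumed grid

search = GridSearchCV(
    RandomForestRegressor(random_state=0),
    param_grid,
    cv=LeaveOneOut(),                  # each fold holds out exactly one sample
    scoring="neg_mean_squared_error",  # R2 is undefined on a one-sample test fold
)
search.fit(X, y)

print(search.best_params_, -search.best_score_)   # best setting and its LOO MSE
```

For scale-sensitive models such as MLP, placing the standardization step and the model together in a scikit-learn `Pipeline` ensures that the scaler is fitted only on each training fold.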

The performance metrics used include the coefficient of determination (R2), which expresses the strength of the association between the observed values and the model predictions; it typically ranges from 0 to 1, although it can be negative for a model that performs worse than predicting the mean. Mean absolute error (MAE) is the average magnitude of the deviations between predicted and actual values. Mean squared error (MSE) is the average squared difference between the observed and predicted values. Mean absolute percentage error (MAPE) expresses prediction accuracy as the average absolute error relative to the actual values, in percent. Mean squared logarithmic error (MSLE) compares the logarithms of the actual and predicted values and therefore reflects their ratio rather than their absolute difference, whereas median absolute error (MedAE) is the median of the absolute differences between observed and predicted responses, with values in [0, ∞). These performance metrics are defined mathematically in the following equations.

$${R}^{2}=1- \frac{{\sum }_{i=1}^{n}{({Y}_{i}-{\widehat{Y}}_{i})}^{2}}{{\sum }_{i=1}^{n}{({Y}_{i}-\widetilde{Y})}^{2}}$$
(5)
$$MSE= \frac{1}{n} {\sum }_{i=1}^{n}{({Y}_{i}-{\widehat{Y}}_{i})}^{2}$$
(6)
$$MAE= \frac{1}{n} \sum_{i=1}^{n}\left|{Y}_{i}-{\widehat{Y}}_{i}\right|$$
(7)
$$MAPE= \frac{1}{n} \sum_{i=1}^{n}\left|\frac{{Y}_{i}-{\widehat{Y}}_{i}}{{Y}_{i}}\right|\times 100$$
(8)
$$MSLE= \frac{1}{n} \sum_{i=1}^{n}{\left(\text{log}\left({Y}_{i}+1\right)-\text{log}\left({\widehat{Y}}_{i}+1\right)\right)}^{2}$$
(9)
$$MedAE= median\left(\left|{Y}_{1}-{\widehat{Y}}_{1}\right|,\dots ,\left|{Y}_{n}-{\widehat{Y}}_{n}\right|\right)$$
(10)

where \({Y}_{i}\) is the actual value, \({\widehat{Y}}_{i}\) is the predicted value, \(\widetilde{Y}\) is the mean of the actual values, log(x) is the natural logarithm of x, and n is the sample size.
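Equations (5)–(10) correspond to functions in `sklearn.metrics`. The sketch below computes them for a small set of hypothetical actual and predicted shoot counts (invented values, not the study's results) and cross-checks each against the formulas written out in NumPy. Note that `mean_absolute_percentage_error` returns a fraction, so it is multiplied by 100 to match equation (8).

```python
import numpy as np
from sklearn import metrics

# Hypothetical actual and predicted shoot counts (illustrative only)
y_true = np.array([4.0, 5.5, 3.0, 6.5, 5.0, 2.5])
y_pred = np.array([4.3, 5.0, 3.4, 6.0, 5.6, 2.8])

scores = {
    "R2": metrics.r2_score(y_true, y_pred),
    "MSE": metrics.mean_squared_error(y_true, y_pred),
    "MAE": metrics.mean_absolute_error(y_true, y_pred),
    # sklearn returns MAPE as a fraction; multiply by 100 to match Eq. (8)
    "MAPE": 100 * metrics.mean_absolute_percentage_error(y_true, y_pred),
    "MSLE": metrics.mean_squared_log_error(y_true, y_pred),
    "MedAE": metrics.median_absolute_error(y_true, y_pred),
}

# Cross-check each value against Eqs. (5)-(10) written out with NumPy
err = y_true - y_pred
manual = {
    "R2": 1 - np.sum(err ** 2) / np.sum((y_true - y_true.mean()) ** 2),
    "MSE": np.mean(err ** 2),
    "MAE": np.mean(np.abs(err)),
    "MAPE": 100 * np.mean(np.abs(err / y_true)),
    "MSLE": np.mean((np.log1p(y_true) - np.log1p(y_pred)) ** 2),
    "MedAE": np.median(np.abs(err)),
}
for name in scores:
    print(f"{name}: {scores[name]:.4f} (manual {manual[name]:.4f})")
```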

Additionally, before the models were trained and tested, all numerical inputs were standardized using equation (11).

$${X}^{\prime}=\frac{{X}_{i}-\mu }{\sigma }$$
(11)

where \({X}^{\prime}\) is the standardized value, \({X}_{i}\) is the actual value, μ is the mean of the feature values, and σ is the standard deviation of the feature values.
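Equation (11) is z-score standardization. In the sketch below, which uses hypothetical input values, applying the formula column by column gives the same result as scikit-learn's `StandardScaler`, which uses the population standard deviation (ddof = 0).

```python
import numpy as np
from sklearn.preprocessing import StandardScaler

# Hypothetical numeric inputs: preconditioning KIN (mg/L) and red LED share (%)
X = np.array([[5.0, 80.0], [10.0, 75.0], [20.0, 66.67], [10.0, 80.0]])

mu = X.mean(axis=0)
sigma = X.std(axis=0)          # population SD (ddof=0), as StandardScaler uses
X_manual = (X - mu) / sigma    # Eq. (11), applied to each column

X_sklearn = StandardScaler().fit_transform(X)
print(np.allclose(X_manual, X_sklearn))   # True
```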

Results

In vitro regeneration

A protocol for in vitro propagation of two commercially grown upland cotton cultivars from Türkiye was developed, resulting in enhanced shoot regeneration. Preconditioned explants placed at an angle of 30–60° showed high shoot regeneration frequencies, with single shoot induction within 2–3 weeks and multiple shoot formation within 4–5 weeks, and achieved 100% shoot regeneration in both cultivars. The use of phytagel as a gelling agent further improved the outcomes by reducing phenolic compound leakage and allowing a prolonged culture period without subculturing. Along with other factors, cultivar has a significant impact on in vitro regeneration of recalcitrant crops. Cotton in vitro regeneration is heavily genotype-dependent, and several cultivars show comparatively low shoot numbers. The two cultivars used in this study responded differently (P < 0.001): GSN-12 produced an average of 4.99 shoots per explant, compared with 3.97 shoots per explant for STN-468 (Table S1).

Preconditioning explants with a high cytokinin level and then culturing them on basal medium with different MS concentrations or KIN was crucial for inducing multiple shoots without callus formation. Exposure of explants to high-KIN medium had a positive but statistically non-significant effect on shoot regeneration frequency (100%) and mean shoot count (P = 0.095). The highest mean shoot count (4.68) was obtained with 10 mg·L−1 KIN, followed by 5 mg·L−1 (4.54 shoots) and 20 mg·L−1 (4.22 shoots). Postconditioning with different MS concentrations had a non-significant (P = 0.077) but clear effect on shoot induction, with mean shoot counts in the order ¼ MS (4.68) ≥ ½ MS (4.57) ≥ MS (4.09). Optimized culture conditions, including the red-to-blue LED combinations, further supported shoot proliferation. The three R:B LED combinations used in this investigation had a discernible effect on in vitro propagation of cotton, and the results indicate that a relatively low proportion of B-LEDs combined with R-LEDs is needed to induce the maximum number of shoots. Mean shoot counts under the R:B LEDs were 80.0% R-LED (5.11) > 75.0% R-LED (4.38) > 66.67% R-LED (3.95). Among the individual factors, the GSN-12 cultivar, 10 mg·L−1 KIN, ¼ MS, and 80.0% R-LED were superior and yielded higher mean shoot counts than the other levels of their respective factors (Table S1).

The effects of individual input variables were also evaluated with boxplots. In Fig. 1a, the boxplot shows a higher median shoot count for GSN-12 than for STN-468, whereas the spread is similar for the two cultivars. Figure 1b shows that preconditioning with 10 mg·L−1 KIN gave the highest median and the largest spread of shoot counts. For postconditioning, Fig. 1c shows that ½ MS has the highest median, whereas ¼ MS combines a low spread with a relatively large number of high values in its upper whisker, which makes its mean shoot count larger than those of the other media. Finally, the 80% R-LED combination shows the highest median and the largest spread of shoot counts (Fig. 1d).

Fig. 1

Boxplots depicting the influence of individual input variables on shoot count in cotton: (a) cultivar; (b) preconditioning; (c) postconditioning, where 0.05-KIN indicates MS + 0.05 mg·L−1 KIN, and 1.10-MS, 2.20-MS, and 4.40-MS indicate 1.10, 2.20, and 4.40 g·L−1 MS, respectively; (d) LED

Table 1 presents the combined effect of all parameters (cultivar, KIN preconditioning dosage, culture medium, and LED) on cotton shoot count; the results were statistically significant (P < 0.001). For GSN-12, the maximum shoot count (7.75) was obtained with 5 mg·L−1 KIN × ¼ MS × 80.0% R-LED, and 5.75 shoots were obtained with 10 mg·L−1 KIN × ¼ MS × 80.0% R-LED. STN-468 responded differently, and its maximum shoot count (6.00) was obtained with 10 mg·L−1 KIN × MS with 0.05 mg·L−1 KIN × 75.0% R-LED. The minimum shoot counts of the two cultivars also occurred under different combinations: 3.00 shoots for GSN-12 under 20 mg·L−1 KIN × MS with 0.05 mg·L−1 KIN × 66.67% R-LED, and 2.25 shoots for STN-468 under 10 mg·L−1 KIN × MS × 75.0% R-LED. These results highlight the importance of cultivar, culture medium, and culture conditions for in vitro shoot induction of cotton.

Table 1 Effect of cultivars, preconditioning dosage, culture medium, and LED on in vitro shoot count of cotton

Factorial regression analysis

Factorial regression analysis was used to investigate the effects of the input variables on shoot counts, and the significant factors and their levels were identified with Pareto charts and normal plots. The Pareto charts showed a reference line at a standardized effect of 1.984 for both cultivars (Fig. 2a, b). For STN-468, postconditioning (B), LED (C), and the preconditioning × postconditioning interaction (AB) had significant effects, whereas preconditioning (A), ABC, AC, and BC were non-significant, with standardized effects below 1.984. The order of significance was B > C > AB > A > ABC > AC > BC (Fig. 2a). GSN-12 showed a completely different pattern, with the order C > A > AC > BC > B > AB > ABC; only C and A were significant, and they had similar effects on shoot counts (Fig. 2b). The significance shown by the Pareto charts was further examined with normal plots (Fig. 2c, d), which show whether each input variable has a direct (positive) or inverse (negative) effect on the output variable. For STN-468, B and AB were positioned on the left side of the fitted line (Fig. 2c), whereas factor C was positioned at the upper right of the line at around 90.0%. The corresponding percentages were 20.0% for AB and 100% for B, reflecting the weight of these input parameters on shoot counts. In contrast, for GSN-12, the significant factors C and A were placed on the right and left sides of the reference line, respectively (Fig. 2d).
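A Pareto chart of standardized effects typically compares each term's |t| value with a reference line at the two-sided t critical value for α = 0.05 and the error degrees of freedom. Assuming 108 observations per cultivar (half of the 216 data points) and a model with an intercept, three main effects, three two-way interactions, and one three-way interaction (8 coefficients), the error degrees of freedom are 100, for which the critical value is approximately 1.984, consistent with the reference line reported above. This correspondence is our inference rather than a statement from the analysis software output, and the Python sketch below uses a hypothetical response and an assumed three replicates per factor combination to show how the standardized effects and reference line can be computed by ordinary least squares.

```python
from itertools import product

import numpy as np
from scipy import stats

rng = np.random.default_rng(5)

# Hypothetical single-cultivar design: 3 KIN x 4 postconditioning x 3 LED levels,
# with an ASSUMED 3 replicates per combination (108 rows in total)
levels = np.array(list(product([5, 10, 20], [0.05, 1.10, 2.20, 4.40], [66.67, 75.0, 80.0])),
                  dtype=float)
X_raw = np.repeat(levels, 3, axis=0)
A, B, C = ((X_raw - X_raw.mean(axis=0)) / X_raw.std(axis=0)).T   # centered, scaled factors

# Hypothetical response with a positive LED effect and a negative KIN effect
y = 4.5 + 0.4 * C - 0.2 * A + rng.normal(scale=0.8, size=len(A))

# Model matrix: intercept, main effects, two-way and three-way interactions
D = np.column_stack([np.ones_like(A), A, B, C, A * B, A * C, B * C, A * B * C])
coef, *_ = np.linalg.lstsq(D, y, rcond=None)

df_resid = len(y) - D.shape[1]                   # 108 - 8 = 100
sigma2 = np.sum((y - D @ coef) ** 2) / df_resid  # residual variance
se = np.sqrt(np.diag(sigma2 * np.linalg.inv(D.T @ D)))
t_values = coef / se                             # standardized effects

t_crit = stats.t.ppf(0.975, df_resid)            # reference line, alpha = 0.05
names = ["A", "B", "C", "AB", "AC", "BC", "ABC"]
ranked = sorted(zip(names, np.abs(t_values[1:])), key=lambda item: -item[1])
for name, t in ranked:
    print(f"{name:>3}: |t| = {t:.2f}{' *' if t > t_crit else ''}")
print(f"Reference line: {t_crit:.3f}")
```

Terms marked with an asterisk exceed the reference line, mirroring how the Pareto charts separate significant from non-significant effects.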

Fig. 2

Pareto chart (a-b) and normal plot (c-d) analysis of shoot counts of in vitro regenerated cotton

For STN-468, the contour and surface plots of two-factor interactions showed 5.0–5.2 shoots at 17.6–20.0 mg·L−1 preconditioning KIN × 0.05–0.40 mg·L−1 postconditioning KIN (Fig. 3a, d). The AC combination (preconditioning KIN × LED) gave a maximum of 4.4–4.5 shoots at 14.5–20.0 mg·L−1 preconditioning KIN × 78–80% R-LED (Fig. 3b, e), whereas the BC combination gave 4.5–5.0 shoots at 0.05–1.72 mg·L−1 postconditioning KIN × 73.0–80.0% R-LED (Fig. 3c, f). For GSN-12, shoot counts of 4.5–4.8 were obtained at 6.25–14.0 mg·L−1 preconditioning KIN × 0.05–4.44 mg·L−1 postconditioning KIN (Fig. 4a, d). The AC combination gave a maximum of 6.0–6.5 shoots at 5.0–16.0 mg·L−1 preconditioning KIN × 79–80% R-LED (Fig. 4b, e), and similar shoot counts were obtained at 0.05–4.44 mg·L−1 postconditioning KIN × 79–80% R-LED (Fig. 4c, f). These results show that 80% red + 20% blue lighting leads to the maximum shoot counts.

Fig. 3

Contour plots (a-c) and surface plots (d-f) for in vitro regenerated shoot counts of STN-468

Fig. 4

Contour plots (a-c) and surface plots (d-f) for in vitro regenerated shoot counts of GSN-12

The response optimizer was then used to identify the input conditions that give the maximum shoot count, with all three input parameters optimized simultaneously. Both cultivars required postconditioning on MS with 0.05 mg·L−1 KIN and 80.00% R-LED to reach maximum shoot counts of 5.83 for STN-468 and 6.35 for GSN-12 (Table 2). However, the preconditioning dose requirements differed between the two cultivars (20 mg·L−1 KIN for STN-468 and 5.0 mg·L−1 KIN for GSN-12). All statistical tools consistently identified 80.0% R-LED as the best light condition for maximum shoot counts in both cultivars.
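Minitab's response optimizer searches a fitted regression model, using desirability functions, for the factor settings that best meet the response goal. The sketch below substitutes a simpler technique: an RF model fitted to hypothetical data is evaluated at every combination of the factor levels, and the combination with the highest predicted shoot count is selected. The numeric coding of the postconditioning media (0.05, 1.10, 2.20, and 4.40, mirroring the labels in Fig. 1c) is an assumption, as are the simulated responses.

```python
from itertools import product

import numpy as np
from sklearn.ensemble import RandomForestRegressor

rng = np.random.default_rng(4)

# Factor levels; the numeric coding of postconditioning media is hypothetical
pre_kin = [5.0, 10.0, 20.0]            # preconditioning KIN, mg/L
post_code = [0.05, 1.10, 2.20, 4.40]   # MS + 0.05 KIN, 1/4 MS, 1/2 MS, full MS
red_led = [66.67, 75.0, 80.0]          # red LED share, %

grid = np.array(list(product(pre_kin, post_code, red_led)))   # 36 combinations

# Hypothetical training data: three noisy observations per combination
X = np.repeat(grid, 3, axis=0)
y = 2.0 + 0.05 * X[:, 2] - 0.02 * X[:, 0] + rng.normal(scale=0.3, size=len(X))

model = RandomForestRegressor(n_estimators=100, random_state=0).fit(X, y)

# Exhaustive search: predict every combination and keep the best one
pred = model.predict(grid)
best = grid[np.argmax(pred)]
print(f"KIN {best[0]} mg/L, post code {best[1]}, red LED {best[2]}%: "
      f"predicted {pred.max():.2f} shoots")
```

Because the factors in this study take only a few discrete levels, an exhaustive search over all 36 combinations per cultivar is cheap and avoids extrapolating between tested levels.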

Table 2 Optimizing input variables for individual cultivars using Response optimizer

Application of ML modeling

The experimental data were then validated and predicted with the ML models. The R2 values of the tested models were very close, ranging from 0.69 to 0.71, with the highest R2 (0.71) for the MLP model, followed by RF (0.70) and XGBoost (0.69). The actual and predicted shoot counts are presented in Fig. 5, where the 1:1 line represents perfect agreement between predicted and actual values and deviations from this line indicate prediction errors; the same line is used in residual analysis to visualize how well a model fits the data. The performance metrics of the models ranged from 0.477 to 0.515 for MSE, 0.372 to 0.414 for MAE, 10.116% to 10.976% for MAPE, 0.0194 to 0.0204 for MSLE, and 0.078 to 0.146 for MedAE (Table 3). The lowest MSE and MSLE were obtained with the MLP model, whereas the lowest MAE, MAPE, and MedAE were obtained with the RF model. Overall, the three models performed very similarly: MLP performed best in terms of R2, MSE, and MSLE, whereas RF performed best in terms of MAE, MAPE, and MedAE.

Fig. 5

Actual and predicted values of shoot count of different ML models (a) MLP, (b) XGBoost, and (c) RF model

Table 3 Performance metrics for the validation of ML models

Discussion

The establishment of an in vitro propagation protocol is governed by a combination of physical, chemical, and biological factors. Selecting the proper cultivar, culture medium, and culture conditions is decisive for establishing a successful and repeatable in vitro regeneration protocol for recalcitrant plants (Wang et al. 2011; Parris et al. 2012). Cotton is considered one of the most recalcitrant plants to manipulate because multiple shoot induction is challenging (Pathi et al. 2013). Therefore, reproducible and efficient multiple shoot induction remains a priority for cotton breeding programs (Khan et al. 2023). Protocols can be optimized with traditional statistical tools or with modern tools such as AI-based models. In this study, a novel protocol for multiple shoot induction, followed by successful rooting and acclimatization, was established for two commercially grown cotton cultivars in Türkiye by exposing explants to high KIN concentrations. Different culture conditions were then optimized with tools such as contour plots, surface plots, and the response optimizer. Finally, the data were validated and predicted with ANN and other ML models.

Explant orientation (the placement of the explant and its contact with the culture medium) is a highly significant but often neglected factor that regulates in vitro regeneration (Bhatia et al. 2005; García-Luis et al. 2006). Explants preconditioned with KIN were placed on the culture medium at an angle of 30–60° rather than horizontally or vertically. This orientation was chosen because it resulted in minimal necrosis in preliminary experiments (data not shown). The explants were grown continuously on the same medium for a total of eight weeks. Explant orientation may have hindered metabolite movement and minimized the leakage of phenolic compounds into the culture medium, which in turn led to a high shoot regeneration frequency. García-Luis et al. (2006) reported a significant effect of explant orientation on callus growth and shoot induction in Troyer citrange. Similarly, a positive effect of explant orientation in inhibiting phenolic compounds has been documented for pistachio explants placed at a 60° angle (Nezami et al. 2015).

Gelling agents in the culture medium are important for in vitro regeneration and multiple shoot induction. Agar is the most widely used gelling agent, but it is associated with issues such as impurities, growth-inhibitory compounds, and vitrification in plant tissue culture (Nairn et al. 1995). Phytagel as an alternative gelling agent can address these concerns owing to its high ash content and low impurities (Huang et al. 1995). Relatively low leakage of phenolic compounds into the culture medium was observed, which allowed prolonged culture without subculturing. The positive effect of phytagel may be due to a more suitable environment (hydration and nutrition) for the explants and reduced leakage of phenolic compounds into the regeneration medium (Kumar et al. 2003).

The physical state of the explants before culture controls the entire in vitro regeneration process. In this research, explants were initially exposed to high KIN concentrations and then cultured on basal medium (different MS concentrations or MS + KIN) under different R:B LED combinations. Preconditioning (pretreatment) involves exposing explants to high cytokinin levels for a certain period. This approach can accelerate regeneration and induce multiple shoots in recalcitrant crops (Kumari et al. 2017). The technique is effective because it promotes rapid and extensive cell division in explants at the initial stage, followed by shoot induction on media without PGRs or with low concentrations of cytokinin or a cytokinin-auxin combination. No callus was induced on explants preconditioned with high KIN concentrations, whereas callus induction from the basal end of explants in response to preconditioning has been documented in other crops such as chickpea (Aasim et al. 2013). However, other factors also regulate the overall morphogenesis. In this study, single shoot induction began simultaneously in both cultivars after approximately 2–3 weeks, and multiple shoots formed after 4–5 weeks, resulting in a 100% shoot regeneration frequency. Previous studies also found that preconditioning had no negative impact on regeneration frequency (Kumari et al. 2017). These findings demonstrate the usefulness of preconditioning doses for producing multiple shoots and corroborate earlier research (Tang et al. 2012; Kumari et al. 2017).

The results further illustrate the importance of genotype, as the in vitro regeneration performance of GSN-12 was better than that of STN-468. This confirms previous studies that emphasized the role of cotton genotype or cultivar in regulating in vitro regeneration (Sakhanokho et al. 2004; Khan et al. 2010; Pushpa et al. 2010). However, the proper selection of explants and of PGRs, with appropriate doses and exposure times, also plays a role. KIN is a naturally occurring cytokinin used to induce in vitro regeneration at relatively low concentrations. The type and concentration of the basal medium, along with PGRs, are prerequisites for inducing multiple shoots, especially in recalcitrant crops such as cotton. The culture medium for preconditioned explants (postconditioning medium) is highly important for inducing multiple shoots at a high regeneration frequency; therefore, postconditioning media are generally supplemented with low cytokinin levels or cytokinin-auxin combinations (Aasim et al. 2013). In this research, preconditioned explants were cultured on media with different MS concentrations without any PGR, and on MS medium with a small amount of KIN (0.05 mg·L−1). The concentration of MS medium also regulated the in vitro regeneration behavior of cotton, and a significant effect was noted. In contrast, a negative impact of low MS concentration on shoot induction has been documented for Phytolacca dioica (El-Afry et al. 2017), Ophiorrhiza prostrata (Gopalakrishnan et al. 2018), and Eryngium viviparum (Ayuso et al. 2019). This difference is likely related to the specific macro- and micronutrient requirements of each genotype. The results also revealed that full MS was less favorable for shoot induction, and that this restriction can be overcome by adding KIN. Reduced MS concentration in the culture medium has been shown to promote somatic embryogenesis in cotton (Kumria et al. 2003).
LED light, supplied either as a single color or as red and blue (R:B) LEDs combined at various ratios, has been shown to significantly affect in vitro germination and regeneration of different plants (Özcan et al. 2023). Red LED light has also been reported to promote somatic embryogenesis in cotton. By contrast, a balanced LED combination (R:B = 1:1) has been documented to give the highest growth and morphogenesis in cotton (Li et al. 2010). In our study, shoot regeneration increased as the proportion of red LED light increased. The findings underscore the synergistic role of explant orientation, preconditioning treatments, and culture medium composition in overcoming the recalcitrance of cotton to in vitro regeneration. This paves the way for more effective cotton breeding and genetic improvement programs.

Pareto charts and normal plots are powerful statistical tools for estimating the significance of input variables. Pareto charts rank the input variables by the magnitude of their effects. Normal plots show whether each input variable has a directly or inversely proportional effect on the output variable, with efficiency expressed as a percentage (Katirci 2015). Both tools are increasingly used in plant sciences and have been reported for in vitro regeneration (Aasim et al. 2024b), nanoparticle biosynthesis (Keijok et al. 2019), and indole acetic acid production (Myo et al. 2019). In this study, contour and surface plots confirmed that the two cultivars responded differently to the input variables. These plots are powerful tools for optimizing two input variables toward a desired output, because they partition the response into regions shown in different colors (Kasman et al. 2019; Younis et al. 2023). Here, the contour plots identified favorable combinations of two input variables and showed their impact on final shoot counts. Both plot types have also been applied in phytoremediation investigations (Jaskulak et al. 2020; Mohamad Thani et al. 2020) and in vitro propagation studies (Özcan et al. 2023).
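The ranking idea behind a Pareto chart can be sketched with a simplified stand-in. The sketch computes each factor's main-effect sum of squares in a balanced design, expresses it as a percentage of the total sum of squares, and sorts the factors from largest to smallest. This is an assumption-laden illustration and not the software or procedure used in the study. Statistical packages typically plot standardized effects (t-values) against a significance reference line instead. The four data points below are hypothetical placeholders, not measurements.

```python
# Simplified Pareto-style ranking: share of total variation explained by each
# factor's main effect in a balanced design. Data are HYPOTHETICAL placeholders.
from statistics import mean


def main_effect_ss(records, factor, response="shoots"):
    """Main-effect sum of squares: sum over levels of n_level * (mean_level - grand)^2."""
    grand = mean(r[response] for r in records)
    levels = {}
    for r in records:
        levels.setdefault(r[factor], []).append(r[response])
    return sum(len(v) * (mean(v) - grand) ** 2 for v in levels.values())


def pareto_ranking(records, factors, response="shoots"):
    """Return (factor, SS, % of total SS) tuples sorted from largest to smallest effect."""
    grand = mean(r[response] for r in records)
    ss_total = sum((r[response] - grand) ** 2 for r in records)
    rows = [(f, main_effect_ss(records, f, response)) for f in factors]
    rows.sort(key=lambda t: t[1], reverse=True)
    return [(f, ss, 100.0 * ss / ss_total) for f, ss in rows]


# Hypothetical 2 x 2 layout: KIN preconditioning dose (mg/L) x red LED share (%)
data = [
    {"KIN": 5, "red_LED": 70, "shoots": 4},
    {"KIN": 5, "red_LED": 80, "shoots": 6},
    {"KIN": 10, "red_LED": 70, "shoots": 3},
    {"KIN": 10, "red_LED": 80, "shoots": 5},
]

if __name__ == "__main__":
    for factor, ss, pct in pareto_ranking(data, ["KIN", "red_LED"]):
        print(f"{factor:8s} SS={ss:.2f}  {pct:.1f}% of total")
```

In this toy data set the red LED share accounts for 80% of the total variation and KIN for 20%, so a Pareto chart would list red LED first. With real, replicated, multi-level data, interaction terms and residual error would also appear in the decomposition.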

The one-way ANOVA showed that the input variables significantly affected the shoot number of the two cotton cultivars. Nevertheless, such traditional methods cannot fully capture how the input factors jointly influence the final output variables. AI-based ML/ANN models have been used to validate, predict, and optimize data sets by inferring the relationships between input and output variables (Balasubramani et al. 2020; Razzaghi et al. 2018). The data were therefore analyzed with ML models (RF, XGBoost) and an ANN model (MLP) to predict the outcomes and verify their accuracy. All tested models predicted the results accurately and with comparable performance across the six measured performance metrics. These metrics are essential for evaluating model performance. Among them, R² is the most widely used and well-established metric for data prediction. However, using multiple metrics is generally recommended for a more reliable assessment (Özcan et al. 2023; Wu et al. 2023). An R² value equal or close to 1, combined with low values of the error metrics, indicates better model performance (Arab et al. 2016). Similarly, the RF model outperformed alternative models in predicting and optimizing in vitro callus formation and development in hemp (Hesami et al. 2021). On the other hand, the MLP model has been shown to perform better than the RF and XGBoost models for in vitro regeneration of common bean. These models have recently become prominent owing to their widespread use in plant sciences and biotechnology. Studies on the GSN-12 and STN-468 cultivars are very limited in the literature. However, in vitro germination and regeneration of these cultivars for genetic transformation have been documented by various researchers (Bakhsh et al. 2016). Our results are significant and can support future biotechnological applications in cotton (Khan et al. 2023).
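Both evaluation steps named above are sketched below in plain Python: the one-way ANOVA F statistic and a set of regression metrics for comparing model predictions with observed shoot counts. This section does not name the six metrics the study used. The sketch therefore assumes five common ones (R², MSE, RMSE, MAE, MAPE), and the function names are hypothetical. In practice, `y_pred` would come from fitted RF, XGBoost, or MLP models. For a p-value, the F statistic can be cross-checked with `scipy.stats.f_oneway`.

```python
# Minimal sketch, not the authors' code: one-way ANOVA F statistic and common
# regression metrics. The metric set (R2, MSE, RMSE, MAE, MAPE) is assumed.
from math import sqrt
from statistics import mean


def one_way_anova_f(groups):
    """Return (F, df_between, df_within) for a list of groups of observations."""
    all_obs = [x for g in groups for x in g]
    grand = mean(all_obs)
    group_means = [mean(g) for g in groups]
    ss_between = sum(len(g) * (m - grand) ** 2 for g, m in zip(groups, group_means))
    ss_within = sum((x - m) ** 2 for g, m in zip(groups, group_means) for x in g)
    df_b, df_w = len(groups) - 1, len(all_obs) - len(groups)
    return (ss_between / df_b) / (ss_within / df_w), df_b, df_w


def regression_metrics(y_true, y_pred):
    """Compare observed and predicted values; MAPE assumes no observed value is 0."""
    n = len(y_true)
    residuals = [t - p for t, p in zip(y_true, y_pred)]
    y_bar = mean(y_true)
    ss_res = sum(r ** 2 for r in residuals)
    ss_tot = sum((t - y_bar) ** 2 for t in y_true)
    mse = ss_res / n
    return {
        "R2": 1 - ss_res / ss_tot,
        "MSE": mse,
        "RMSE": sqrt(mse),
        "MAE": sum(abs(r) for r in residuals) / n,
        "MAPE": 100 * sum(abs(r / t) for r, t in zip(residuals, y_true)) / n,
    }


if __name__ == "__main__":
    # Hypothetical shoot counts for two treatments (not data from this study)
    print(one_way_anova_f([[3, 4, 5], [5, 6, 7]]))
    print(regression_metrics([2, 4], [3, 3]))
```

Reporting several metrics together guards against the blind spots of R² alone. For example, R² can be high while MAPE reveals large relative errors on low shoot counts.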

Conclusion

In this study, an effective in vitro regeneration protocol covering shoot induction, rooting, and acclimatization was established. The findings demonstrated the individual and combined effects of several external factors on the shoot counts of two cotton cultivars grown in Türkiye. GSN-12 responded far better than STN-468 and produced more shoots. Preconditioning with 10 mg·L−1 KIN for 10 days was superior to the other doses. Provision of 80.0% R-LED was beneficial for both cultivars. The analysis of factor combinations revealed that both cultivars required MS with 0.05 mg·L−1 KIN and 80.0% R-LED, but different preconditioning KIN doses, to generate more shoots. The AI/ML models accurately predicted and validated the outcomes. The results suggest that the protocol could be extended to other recalcitrant cotton cultivars and used to apply biotechnological tools for improving cotton genotypes. This study also suggests that integrating advanced AI and ML techniques could improve cotton tissue culture protocols. Deep learning algorithms and more complex predictive models could optimize regeneration strategies for recalcitrant cotton cultivars. Such models could predict and fine-tune the external factors that govern shoot induction and acclimatization.

Data availability

The data generated and materials used in this study are available from the corresponding author on reasonable request.

References

Acknowledgements

The authors are thankful to the Director, Cotton Research Institute Nazilli, and May Seeds Company for providing seeds of the cultivars used in this study.

Funding

Not applicable.

Author information

Contributions

Özcan S and Aasim M conceptualized the study. Özkat GY and Bakhsh A carried out the project. Aasim M and Bakhsh A drafted and edited the manuscript. Ali SA and Aasim M analyzed the data. Özcan S supervised the overall study. All authors read and approved the final manuscript.

Corresponding authors

Correspondence to Muhammad Aasim or Allah Bakhsh.

Ethics declarations

Ethics approval and consent to participate

Not applicable.

Consent for publication

Not applicable.

Competing interests

The authors declare that they have no competing interests. Author Bakhsh A is a member of the Editorial Board of Journal of Cotton Research. Author Bakhsh A was not involved in the journal’s review of, or decisions related to, this manuscript.

Supplementary Information

Rights and permissions

Open Access This article is licensed under a Creative Commons Attribution 4.0 International License, which permits use, sharing, adaptation, distribution and reproduction in any medium or format, as long as you give appropriate credit to the original author(s) and the source, provide a link to the Creative Commons licence, and indicate if changes were made. The images or other third party material in this article are included in the article's Creative Commons licence, unless indicated otherwise in a credit line to the material. If material is not included in the article's Creative Commons licence and your intended use is not permitted by statutory regulation or exceeds the permitted use, you will need to obtain permission directly from the copyright holder. To view a copy of this licence, visit http://creativecommons.org/licenses/by/4.0/.

About this article

Cite this article

Özkat, G.Y., Aasim, M., Bakhsh, A. et al. Machine learning models for optimization, validation, and prediction of light emitting diodes with kinetin based basal medium for in vitro regeneration of upland cotton (Gossypium hirsutum L.). J Cotton Res 8, 19 (2025). https://doi.org/10.1186/s42397-025-00222-4

  • DOI: https://doi.org/10.1186/s42397-025-00222-4

Keywords