Biomedicines 11 02633
Article
Machine Learning Algorithms Applied to Predict Autism
Spectrum Disorder Based on Gut Microbiome Composition
Juan M. Olaguez-Gonzalez 1 , Isaac Chairez 1,2 , Luz Breton-Deval 3,4 and Mariel Alfaro-Ponce 1,2, *
Abstract: The application of machine learning (ML) techniques stands as a reliable method for aiding
in the diagnosis of complex diseases. Recent studies have related the composition of the gut
microbiota to the presence of autism spectrum disorder (ASD), but until now, the results have been
mostly contradictory. This work proposes using machine learning to study the gut microbiome
composition and its role in the early diagnosis of ASD. We applied support vector machines (SVMs),
artificial neural networks (ANNs), and random forest (RF) algorithms to classify subjects as
neurotypical (NT) or having ASD, using published data on gut microbiome composition. Naive Bayes,
k-nearest neighbors, ensemble learning, logistic regression, linear regression, and decision trees
were also trained and validated; however, the ones presented showed the best performance and
interpretability. All the ML methods were developed using the SAS Viya software platform. The
microbiome’s composition was determined using 16S rRNA sequencing technology. The application of ML
yielded a classification accuracy as high as 90%, with a sensitivity of 96.97% and specificity
reaching 85.29%. In the case of the ANN model, no errors occurred when classifying NT subjects from
the first dataset, indicating a significant classification outcome compared to traditional tests and
data-based approaches. This approach was repeated with two datasets, one from the USA and the other
from China, resulting in similar findings. The main predictors in the obtained models differ between
the analyzed datasets. The most important predictors identified from the analyzed datasets are
Bacteroides, Lachnospira, Anaerobutyricum, and Ruminococcus torques. Notably, among the predictors in
each model, there is the presence of bacteria that are usually considered insignificant in the
microbiome’s composition due to their low relative abundance. This outcome reinforces the
conventional understanding of the microbiome’s influence on ASD development, where an imbalance in
the composition of the microbiota can lead to disrupted host–microbiota homeostasis. Considering that
several previous studies focused on the most abundant genera and neglected smaller (and frequently
not statistically significant) microbial communities, the impact of such communities has been poorly
analyzed. The ML-based models suggest that more research should focus on these less abundant
microbes. A novel hypothesis explains the contradictory results in this field and advocates for more
in-depth research to be conducted on variables that may not exhibit statistical significance. The
obtained results seem to contribute to an explanation of the contradictory findings regarding ASD and
its relation with gut microbiota composition. While some research correlates higher ratios of
Bacillota/Bacteroidota, others find the opposite. These discrepancies are closely linked to the
minority organisms in the microbiome’s composition, which may differ between populations but share
similar metabolic functions. Therefore, the ratios of Bacillota/Bacteroidota regarding ASD may not be
determinants in the manifestation of ASD.
Keywords: autism; ASD; machine learning; artificial neural networks; microbiome; microbiota
Citation: Olaguez-Gonzalez, J.M.; Chairez, I.; Breton-Deval, L.; Alfaro-Ponce, M. Machine Learning
Algorithms Applied to Predict Autism Spectrum Disorder Based on Gut Microbiome Composition.
Biomedicines 2023, 11, 2633. https://doi.org/10.3390/biomedicines11102633
Academic Editors: Nelson Yee, Chunhua Su, Celestine Iwendi, Thippa Reddy Gadekallu and Keping Yu
Received: 10 August 2023
Revised: 1 September 2023
Accepted: 18 September 2023
Published: 26 September 2023
Copyright: © 2023 by the authors. Licensee MDPI, Basel, Switzerland. This article is an open access
article distributed under the terms and conditions of the Creative Commons Attribution (CC BY)
license (https://creativecommons.org/licenses/by/4.0/).
1. Introduction
ML techniques are reliable approaches for aiding in the diagnosis of complex diseases [1].
However, these techniques are primarily used to identify correlations rather than
establish causation between input–output information in systems with uncertain
descriptions [2]. Often, ML outcomes are misinterpreted as indicating causation, which can
lead to flawed decisions when these are based solely on data observations [3].
While such correlations can provide fundamental insights into relationships explaining the
evolution of various medical disorders and illnesses, they must be interpreted cautiously.
Within the realm of potential applications of ML in medicine, particularly in addressing
ASD, researchers have explored the correlation between the composition of the gut mi-
crobiome and ASD. The outcomes of these studies have offered valuable insights into the
potential link between ASD and gut microbiome composition (GMC) [4]. However, note
that the existing literature presents contradictions among reported results, particularly
regarding the specific bacteria associated with the disease.
According to Namkung [5], a microbiome can be defined as the “composition of the
bacterial community in a specific environment”. Therefore, a gut microbiome refers to the
bacterial community residing within the human gut. Sabit et al. define the microbiome
as the “complete set of genomes of the microbial community which lives in symbiosis
with the host,” and distinguish “gut microbiota” as the bacteria living within the gut,
excluding viruses, protozoa, fungi, and archaea [6]. The composition of the gut microbiome
not only influences the function and development of the immune system but also plays a
significant role in various health conditions [7]. There are indications that intestinal bacteria
might be linked not only to ASD but also to other diseases associated with metabolic and
pro-inflammatory disorders [8]. Additionally, they may play a role in conditions such
as Parkinson’s disease, multiple sclerosis, rheumatoid arthritis, obesity, type 1 diabetes
(T1D), and the risk of heart disease [9–11]. In 2012, Yatsunenko et al. [12] concluded that
environmental factors, including geographical location, interact with cultural differences
such as diet and exposure to diverse microbes, influencing the composition of the gut
microbiome.
In 2015, Reddy et al. found, based on data from human and animal studies, that altered
gut microbiota (dysbiosis) could be associated with ASD [13]. However, the exact bacterial
composition (including its temporal variations) that contributes to ASD is still unclear. As
stated by Rajkomar et al. [14], a data-driven strategy has become a fundamental technology
required for cases where the volume of process data exceeds human comprehension. This
is especially relevant for new approaches that not only identify statistical relationships
among variables but also learn “extremely complex relationships” from heterogeneous
medical data, including hospital records, medical notes, examination results, medical
images, sensor time series, genomic data, and laboratory results, among others. Given
that ML methods have demonstrated the potential to match human experts in diagnosing
certain diseases [15,16], we hypothesize that ML is likely to surpass our current capacity to
identify causal variables linking ASD and GMC. This work represents a step toward testing
this hypothesis.
Some researchers opted to use siblings or twins in their studies [25,40], while others
avoided their inclusion due to concerns about potential bias in the results [27]. Given that
finding females with ASD is more challenging than finding males [43], the majority of the
subjects involved in these studies were males. Note that, apart from Luna et al. [27] and
Williams et al. [44], most studies involved both males and females. However, this approach
may overlook innate differences in GMC based on sex [45]. Another influential factor is that
the gut microbiome composition of an individual changes over time [35]. This implies that
even though an individual’s ASD status remains constant, their bacterial concentrations
can vary.
Considering the diverse factors involved in each of those studies and the continuing
lack of rules for conducting them, it is unlikely that the results can be generalized or that
a solid conclusion can be drawn about the actual relationship between gut microbiome
composition and ASD manifestation that may lead to the development of specific treatments.
Given the aforementioned facts and the advice from some authors to shift the focus
toward studying the byproducts of microbial fermentation and their metabolism in an
attempt to elucidate the causes of ASD related to GI symptoms, there has been a growing
number of publications adopting this new approach with outcomes similar to previous
studies. In 2019, Nogay et al. [46] stated, “With the available information, it is not yet
possible to develop a gut microbiota-based nutritional intervention to treat gastrointestinal
symptoms for individuals with autism.” However, the option of utilizing ML and other
computational techniques has not been fully explored yet. The intricacies of this problem
could potentially be addressed through the application of modern artificial intelligence
(AI) techniques, particularly ML. It is hoped that these techniques can effectively relate
key factors and provide insights, if not transform the perspective, for addressing such
challenges [47].
RF is an ML classifier built from binary decision trees, each of which splits the samples
into two child nodes while trying to maximize the variance explained by the dependent
variable. RF also yields a list of variables ordered by importance, based on a score of how
influential each variable was in building the forest [54]. An SVM is an ML classification
technique whose primary purpose is to classify binary events, although it has also been
used in multigroup classification [55]. The classification process consists of finding the
optimum set of support vectors that define the separating hyperplanes used to classify the
observations [56]. The multilayer perceptron is an ANN architecture in which the
parameters that affect the network’s behavior are known as weights and biases. Each of
these ML tools may offer a suitable method to establish the relationship between the human
gut microbiome composition and ASD prevalence. Accordingly, each of these ML tools was
evaluated within the SAS Viya software 2021.2.4. Notice that, for the three types of ML
algorithms presented in this work, the classification performance was achieved by
supervised training, and the results were reproduced in MATLAB using the interior-point
method as an external function optimizer.
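To make the binary split that underlies each tree of an RF more concrete, the following minimal sketch searches for the best threshold on a single feature. It uses Gini impurity as the split criterion, which is an assumption made for illustration (the paragraph above describes the criterion in terms of explained variance, and the internals of the SAS Viya implementation are not given here). The function names and the toy abundance values are hypothetical.

```python
from collections import Counter


def gini(labels):
    """Gini impurity of a list of class labels (0.0 means a pure node)."""
    n = len(labels)
    if n == 0:
        return 0.0
    return 1.0 - sum((count / n) ** 2 for count in Counter(labels).values())


def best_split(values, labels):
    """Return (threshold, weighted_impurity) of the best binary split.

    Samples with value <= threshold go to the left child, the rest to the
    right child. Candidate thresholds are the midpoints between consecutive
    distinct values. If no split improves on the parent, threshold is None.
    """
    distinct = sorted(set(values))
    best = (None, gini(labels))  # baseline: impurity of the unsplit node
    for low, high in zip(distinct, distinct[1:]):
        threshold = (low + high) / 2.0
        left = [y for x, y in zip(values, labels) if x <= threshold]
        right = [y for x, y in zip(values, labels) if x > threshold]
        weighted = (len(left) * gini(left) + len(right) * gini(right)) / len(labels)
        if weighted < best[1]:
            best = (threshold, weighted)
    return best


# Hypothetical normalized abundances of one genus and the subject labels.
abundance = [0.05, 0.10, 0.20, 0.60, 0.70, 0.90]
status = ["NT", "NT", "NT", "ASD", "ASD", "ASD"]
threshold, impurity = best_split(abundance, status)
```

In this toy case the two classes are perfectly separable, so the chosen threshold falls between 0.20 and 0.60 and both child nodes are pure. A forest repeats this kind of search over many features and bootstrap samples.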
Figure 1 shows the multilayer perceptron architecture used in this study. The “Link”
is the weight of the connections between two neurons; the “Neuron Weight” represents the
bias of each neuron. Finally, the size of the neurons within the hidden layer indicates the relative
importance of the variables in the developed model. The ML algorithm can find the values
that optimize the classification capabilities of the network. If the performance of the ANN
is not satisfactory, the parameters must be changed until a good performance is achieved
without falling into overfitting.
Figure 1. General multilayer perceptron. The visual scale for links and biases goes from blue to
yellow, with blue denoting negative values and yellow positive ones.
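As a rough illustration of how the weights ("links") and biases enter a multilayer perceptron, the sketch below runs one forward pass through a single hidden layer. The tanh hidden activation, the softmax output, the layer sizes, and all parameter values are assumptions chosen for illustration; they are not the settings or values of the models built in SAS Viya.

```python
import math


def dense(inputs, weights, biases):
    """Affine layer: one output per row of `weights`, plus that neuron's bias."""
    return [
        sum(w * x for w, x in zip(row, inputs)) + b
        for row, b in zip(weights, biases)
    ]


def softmax(scores):
    """Turn raw output scores into probabilities that sum to 1."""
    shift = max(scores)  # subtract the maximum for numerical stability
    exps = [math.exp(s - shift) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def mlp_forward(x, hidden_w, hidden_b, out_w, out_b):
    """Forward pass: input -> tanh hidden layer -> softmax over two classes."""
    hidden = [math.tanh(h) for h in dense(x, hidden_w, hidden_b)]
    return softmax(dense(hidden, out_w, out_b))


# Hypothetical toy network: 3 inputs, 2 hidden neurons, 2 outputs (NT, ASD).
hidden_w = [[0.5, -0.2, 0.1], [-0.3, 0.8, 0.4]]
hidden_b = [0.0, 0.1]
out_w = [[1.0, -1.0], [-1.0, 1.0]]
out_b = [0.0, 0.0]

probs = mlp_forward([0.2, 0.7, 0.1], hidden_w, hidden_b, out_w, out_b)

# Parameter counts for this fully connected toy network.
n_weights = sum(len(row) for row in hidden_w) + sum(len(row) for row in out_w)
n_biases = len(hidden_b) + len(out_b)
```

Training consists of adjusting `hidden_w`, `hidden_b`, `out_w`, and `out_b` so that the output probabilities match the known labels; the sketch shows only how a fixed set of parameters maps one input vector to class probabilities.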
Biomedicines 2023, 11, 2633 6 of 24
$$x' = \frac{x_i - \min(x)}{\max(x) - \min(x)} \tag{1}$$
For each attribute, x represents a vector of values for a given genus, min(x) denotes the
minimum value within the vector, max(x) signifies the maximum value within the vector,
x_i represents the value to be normalized, and x' indicates the resulting normalized value.
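The min-max normalization of Equation (1) can be sketched in a few lines. The function name, the sample values, and the handling of a constant vector (where the denominator would be zero) are assumptions, since the text does not describe that edge case.

```python
def min_max_normalize(values):
    """Rescale a list of numbers to the range [0, 1] following Equation (1)."""
    lowest, highest = min(values), max(values)
    span = highest - lowest
    if span == 0:
        # Assumption: a constant attribute is mapped to 0.0 to avoid dividing
        # by zero; the source does not state how this case is handled.
        return [0.0 for _ in values]
    return [(v - lowest) / span for v in values]


# Hypothetical relative abundances of one genus across five subjects.
abundances = [2.0, 4.0, 6.0, 10.0, 4.0]
normalized = min_max_normalize(abundances)
```

Here the minimum (2.0) maps to 0, the maximum (10.0) maps to 1, and every other value falls proportionally in between, so genera with very different abundance scales become comparable inputs.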
No distinct feature selection technique was employed in this study. Instead, all
variables were input into an initial SVM classifier. The first SVM was utilized as a filter
to identify the 20 most relevant features, which were then employed to construct the
models discussed in this study. Subsequently, the models underwent fine-tuning by adding,
removing, or altering features, with the aim of determining the point at which the highest
overall accuracy of the model was achieved. This juncture was considered the optimal
feature quantity for each model.
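The filtering step described above, keeping only the highest-ranked features from a first-pass classifier, can be sketched as follows. The importance scores are hypothetical numbers (only the genus names come from the text), and the helper name `top_k_features` is an assumption; how SAS Viya derives the scores is not reproduced here.

```python
def top_k_features(importance, k=20):
    """Return the names of the k features with the highest importance scores.

    `importance` maps a feature name to a relevance score, such as one taken
    from a first-pass classifier. Ties are broken alphabetically so that the
    result is deterministic.
    """
    ranked = sorted(importance.items(), key=lambda item: (-item[1], item[0]))
    return [name for name, _ in ranked[:k]]


# Hypothetical scores; the genus names are from the text, the numbers are not.
scores = {
    "Bacteroides": 0.91,
    "Lachnospira": 0.84,
    "Blautia": 0.40,
    "Dialister": 0.05,
    "Roseburia": 0.12,
}
selected = top_k_features(scores, k=3)
```

The retained list would then be the starting point for the manual fine-tuning (adding, removing, or altering features) described in the paragraph.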
The analysis depth was chosen up to the genus level because this is where “controver-
sial findings are more often reported” [23]. However, within Ding’s dataset, an additional
analysis was conducted, extending the depth to the species level. This extension was
performed to enable a comparison between our RF model and that presented by Ding
et al. [24] in their publication. In instances where certain species appeared as sequen-
tial order predictors, they were aggregated into their respective genera. This approach
aimed to provide a broader perspective on other important variables and facilitated a more
comprehensive comparative analysis.
The data used in this study were derived from existing research, which had already
performed statistical analyses using common metrics in the medical field. These analyses
were not duplicated in our study; instead, they were utilized for comparison with the most
relevant variables identified by our ML models. We evaluated our model’s performance
using metrics based on the confusion matrix such as sensitivity, specificity, Youden’s index,
and overall prediction accuracy. All data preprocessing, normalization, and ML model
development procedures were conducted in SAS Viya, utilizing cloud-based GPUs. Our
study leverages the advanced ML techniques available in the SAS suite, facilitating a
comprehensive analysis and fair characterization of the proposed techniques. One of the
primary techniques for visualizing the outcomes of an ML model is the confusion matrix.
This matrix provides a concise representation of the predicted and actual classification out-
comes for various data segments, offering a rapid assessment of the model’s performance,
as well as insight into true or false positive and negative ratios [57].
There were 210 weight parameters (WP) and 12 bias parameters (BP), although their
actual positions and values were not included in the report generated by the SAS Viya
software 2021.2.4 used for the modeling. For the second dataset, the ANN used a multilayer
perceptron architecture with 20 input neurons, 18 hidden neurons in a single layer, and two
output neurons, for a total of 318 WP and 20 BP. Again, the actual connections and values of
the WP and BP were not included in the report generated by the SAS Viya software 2021.2.4.
Sp is the true negative rate; in our case, the proportion of actual NT cases within the
test partition that are correctly classified as NT,
$$\mathrm{Sp} = \frac{TN}{TN + FP}, \tag{2}$$
where TN stands for true negative and FP for false positive. This metric evaluates the
model’s effectiveness in recognizing cases that do not present the disease.
Sn is the true positive rate; in our case, the proportion of actual ASD cases within the
test partition that are correctly classified as ASD,
$$\mathrm{Sn} = \frac{TP}{TP + FN}, \tag{3}$$
where TP stands for true positive and FN for false negative. This metric measures the
capability to identify the cases that present a condition and is penalized when a false
negative is predicted.
Acc is used to quantify the actual performance in predicting a given class as
$$\mathrm{Acc} = \frac{TP + TN}{TN + TP + FP + FN}. \tag{4}$$
Youden’s index (J) combines sensitivity and specificity into a single statistic; the
classification threshold for which J is maximal is taken as optimal. It is defined as [60]
$$J = \mathrm{Sn} + \mathrm{Sp} - 1. \tag{5}$$
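The four metrics in Equations (2) to (5) follow directly from the confusion-matrix counts, as the short sketch below shows. ASD is treated as the positive class. The counts in the example are hypothetical: they describe ten test cases with a single ASD subject predicted as NT, and the split between ASD and NT subjects is an assumption, not a figure from the study.

```python
def classification_metrics(tp, tn, fp, fn):
    """Compute Sp, Sn, Acc, and Youden's J from confusion-matrix counts.

    Assumes at least one actual positive and one actual negative case,
    otherwise the corresponding rate is undefined (division by zero).
    """
    specificity = tn / (tn + fp)                   # Equation (2)
    sensitivity = tp / (tp + fn)                   # Equation (3)
    accuracy = (tp + tn) / (tn + tp + fp + fn)     # Equation (4)
    youden_j = sensitivity + specificity - 1       # Equation (5)
    return {
        "Sp": specificity,
        "Sn": sensitivity,
        "Acc": accuracy,
        "J": youden_j,
    }


# Hypothetical counts: 5 ASD and 5 NT test subjects, one ASD predicted as NT.
metrics = classification_metrics(tp=4, tn=5, fp=0, fn=1)
```

With these counts the false negative lowers Sn to 0.8 while Sp stays at 1.0, which illustrates why the two rates are reported separately rather than folded into accuracy alone.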
3. Results
This section presents the outcomes of applying ML modeling to the selected datasets.
The outcomes of the primary feature selection process utilizing an SVM for each ML model
are depicted in Figure 2 for Zou’s ML models and in Figure 3 for Ding’s ML models.
The feature set remains consistent for training the SVM model; however, for the
ANN and RF, it was necessary to fine-tune the feature selection. Details of the retained
predictors and their relative contributions to the RF model’s performance are illustrated in
Figures 4 and 5.
Figure 2. Relative importance for Zou’s SVM model. The first-split log worth is used to rank the
variables when applied to the scored training data.
Figure 3. Relative importance for Ding’s SVM model. The first-split log worth is used to rank
the variables when applied to the scored training data.
Figure 4. Relative importance for Zou’s RF model. While the RF model does have common variables
with the SVM model, the importance attributed to Lachnospira stands out markedly compared to
the other variables.
Figure 5. Relative importance for Ding’s RF model. Unlike the RF model from Zou’s dataset, in this
case, there is not a single dominant predictor; rather, the model’s classification capabilities are mostly
attributed to the top five main predictors.
The set of predictors for the ANN model was the same as that for the RF model
across both datasets, with the purpose of researching the capabilities and differences
in classification for both algorithms. To ascertain the ideal number of features for each
algorithm, we identified the point at which the addition of further features to the model
yielded marginal improvements in accuracy, as assessed by the Acc metric.
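One way to express the stopping rule described above is to walk through increasing feature counts and stop when the next step adds less than a chosen gain in Acc. The sketch below is only an illustration: the accuracies, the set of feature counts tried, and the `min_gain` threshold of 0.01 are all hypothetical, and the text does not state a numeric threshold.

```python
def smallest_sufficient_k(accuracy_by_k, min_gain=0.01):
    """Return the smallest feature count after which accuracy gains are marginal.

    `accuracy_by_k` maps a number of features to the accuracy obtained with it.
    Walking through the counts in increasing order, the function stops at the
    first count whose successor improves accuracy by less than `min_gain`.
    """
    counts = sorted(accuracy_by_k)
    for current, following in zip(counts, counts[1:]):
        if accuracy_by_k[following] - accuracy_by_k[current] < min_gain:
            return current
    return counts[-1]


# Hypothetical accuracies for models trained with 14 to 20 features.
accuracy_by_k = {14: 0.78, 16: 0.84, 18: 0.90, 20: 0.90}
chosen = smallest_sufficient_k(accuracy_by_k)
```

In this made-up series, going from 18 to 20 features brings no further gain, so 18 is selected. A drop in accuracy when adding features also counts as a gain below the threshold.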
Given the high number of bacterial genera found within the GMC, we considered
18 to be a reasonable number of features, sufficient to explain most of the classification
process while keeping the results interpretable; therefore, this number of predictors was
used, although the RF model for Zou’s dataset can mostly be explained by Lachnospira and
Escherichia–Shigella. The rest of the models for both datasets showed a progressive decrease
in performance with the addition of more features. The ANN and SVM models performed with
comparable metrics. All of the models were tested on separate data to evaluate the
classification performance with unseen cases. Although the ideal would be to test the models
with different datasets, this was not possible in this work and will be considered in
future research.
Figure 6. Confusion matrix for Zou’s SVM model. The results of evaluations carried out on the
models using data not previously utilized for training or validation are displayed in the confusion
matrix labeled under the “test” category.
Figure 7. Confusion matrix for Zou’s ANN Model. The ANN model exhibits an intermediate level of
accuracy between the RF and SVM models in the training and validation partitions. Nevertheless, it
achieves a 90% accuracy in the test partition, surpassing the RF model and matching the performance
of the SVM model.
In contrast, the random forest (RF) model displayed a noticeably lower performance,
attaining an accuracy of 80% in the test partition as shown in Figure 8. The RF model’s
accuracy ranged from 78% to 83%, and the reported result represents an average of five
models, each utilizing five-fold cross-validation. This cross-validation approach involved
varying the test partitions while keeping the maximum number of trees fixed at 100.
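The averaging procedure can be sketched as repeated k-fold cross-validation. The code below interprets "five models, each utilizing five-fold cross-validation" as five differently shuffled runs of five folds each, which is an assumption, as are the sample count and the function names. The `dummy_evaluate` function is a placeholder for training and scoring a real classifier.

```python
import random
from statistics import mean


def k_fold_indices(n_samples, k, seed):
    """Shuffle sample indices and deal them into k folds of near-equal size."""
    indices = list(range(n_samples))
    random.Random(seed).shuffle(indices)
    return [indices[i::k] for i in range(k)]


def repeated_cv_accuracy(n_samples, evaluate, repeats=5, k=5):
    """Average the accuracy over `repeats` runs of k-fold cross-validation.

    `evaluate(train_idx, test_idx)` is a caller-supplied function that trains
    a model on the training indices and returns its accuracy on the test ones.
    """
    run_means = []
    for seed in range(repeats):
        folds = k_fold_indices(n_samples, k, seed)
        fold_scores = []
        for i, test_idx in enumerate(folds):
            # Every fold except the i-th one is used for training.
            train_idx = [j for m, fold in enumerate(folds) if m != i for j in fold]
            fold_scores.append(evaluate(train_idx, test_idx))
        run_means.append(mean(fold_scores))
    return mean(run_means), run_means


def dummy_evaluate(train_idx, test_idx):
    """Placeholder for model training: always reports an accuracy of 0.8."""
    assert not set(train_idx) & set(test_idx)  # test fold must not leak into training
    return 0.8


# Hypothetical number of subjects.
overall, per_run = repeated_cv_accuracy(96, dummy_evaluate)
```

Reporting the spread of `per_run` alongside `overall` corresponds to quoting a range together with the average, as done for the RF model.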
Figure 8. Confusion matrix for Zou’s RF Model. Despite the RF model’s superior performance in
the training set compared to the SVM and ANN models, its performance in the test partition was
comparatively lower, as it was the only model that misclassified an NT subject.
The generated models using each ML algorithm were compared based on their final
accuracy in predicting the target variable against actual observations. The overall average
accuracy of the three models for Zou’s dataset, focusing solely on the test partition,
amounted to 86.66%. Notably, the metrics for Youden’s statistics, sensitivity, and specificity
for both the SVM and ANN models are presented in Table 1. It is important to remember
that, in clinical settings, false negatives are generally considered more critical than false
positives, primarily because subjects may not receive the necessary treatment.
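The average quoted above can be checked from the per-model test accuracies given in the text (90% for the SVM, 90% for the ANN, and 80% for the RF):

```latex
\[
\overline{\mathrm{Acc}}_{\text{test}}
  = \frac{90\% + 90\% + 80\%}{3}
  = \frac{260\%}{3}
  \approx 86.67\%
\]
```

The text reports this value as 86.66%, that is, truncated rather than rounded to two decimals.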
Table 1. SVM and ANN performance indicators in the test partition for the models for Zou’s dataset.
Both the SVM and RF algorithms provided insights into feature importance. The top
five main features are outlined in Table 2, while the comprehensive feature importance
for the SVM model is presented in descending order in Figure 2, and for the RF model
in Figure 4. This characteristic proves valuable when constructing models that empha-
size the significance of these variables, effectively isolating them from the less impactful
bacterial factors.
Table 2. Main predictors for each of the ML classifier models in descending importance for the
classification process for Zou’s dataset.
SVM                ANN                RF
Bacteroides        Lachnospira        Lachnospira
Lachnospira        Bacteroides        Escherichia–Shigella
Blautia            Lachnoclostridium  Bacteroides
Lachnoclostridium  Blautia            Blautia
Subdoligranulum    Subdoligranulum    Roseburia
The SVM model classified 9/10 cases accurately within the test partition. Remember
that the test partition comprises data used for neither training nor validation. In other
words, the developed model tries to classify new events, simulating new clinical cases. The
confusion matrix is shown in Figure 6; in terms of percentages, the accuracy was 91.04% for
the training partition, 78.94% for the validation partition, and 90% for the test partition.
The only error in the prediction was an ASD case classified as NT.
The ANN performed better than the SVM in the training and validation partitions,
with 97.01% for training and 82.21% for validation, and matched the SVM in the test
partition. The high accuracy in the test partition argues against the overfitting that the
high training accuracy might otherwise suggest. Figure 9 shows the architecture for this ANN model.
As in the SVM model, the only misclassification was an ASD case predicted as NT.
Most relevant for these two models is that no NT case was misclassified as ASD. This is a
remarkable outcome produced with the implementation of an ANN as a potential approxi-
mation of the relationships between the human gut composition and ASD symptoms.
Figure 10. Confusion matrix for Ding’s SVM model. The classification performance achieved in the
test partition is better than that in the training and validation partitions, suggesting that
overfitting can be ruled out.
Figure 11. Architecture of Ding’s ANN model. The weight scale allows us to detect the approximate
value of each connection among the neurons, although the actual value is not reported by the
software used.
Table 3. Performance metrics for the models developed with Ding’s dataset.
The five variables that contributed most to the predictive power of this model were
Lachnospira, Bacteroides, Lachnoclostridium, Blautia, and Subdoligranulum. The RF model
reached a constant 92.3% in the test partition using five models with five-fold cross-
validation.
The ANN model shows a tendency to misclassify ASD cases, and that tendency carries
over from the validation partition to the test partition. A possible explanation is that
the ASD and NT compositions for those misclassified cases were probably not
different enough to be detectable even for ML algorithms, suggesting that those cases could
not be associated with GMC and may fall within the approximately 30% of genetically
explainable cases. This hypothesis cannot be confirmed purely using ML models and must
be confirmed with clinical assessment.
The variable importance chart for the RF model is shown in Figure 5, and its confusion
matrix for the training, validation, and test partitions is presented in Figure 12. The average
performance for the three models remains at 92.3%, with only one case misclassified out of
13 cases not previously shown to the models. The test partitions are different among the three
models. The only error in the classification in the three models was an NT case classified as
ASD.
Figure 12. Confusion matrix of the RF model for Ding’s dataset. Unlike in the ANN model, the single
misclassification in the test partition cannot be considered a tendency of the model because such a
tendency is not shown in either the validation or the training partitions.
Table 4 lists a comparison of the five main predictors after fine-tuning the ANN and
RF models, with Ruminococcus torques and Anaerobutyricum being present in the three
classifiers. The results show that the SVM and ANN had a similar performance, while RF
had a slightly lower accuracy. However, the ANN tends to misclassify some ASD cases
as NT, and although this may have a clinical explanation, with the developed tools, such
an explanation can only be hypothesized. The RF model’s performance for Zou’s dataset
reached an acceptable 75% performance with fewer features, considering that most of the
models are related to Lachnospira. However, a choice was made to keep all the predictors
as future simulations may help to understand the actual contribution of the metabolites
produced by those bacteria and their interaction within their environment to modulate the
brain function. The main benefit of training and analyzing an ML model based on the GMC
of the subjects lies in eliminating the need to formulate any prior hypotheses about
which bacteria are linked to ASD. The results give insights for further research, and our
efforts to diagnose ASD through ML and the GMC approach complement widely accepted
tests. This approach has the potential to expedite the diagnostic process.
Table 4. Main predictors for each of the ML classifiers in descending importance for Ding’s dataset.
SVM                    ANN                    RF
Ruminococcus torques   Anaerobutyricum        Anaerobutyricum
Anaerobutyricum        Bacteroides            Faecalibacterium
Dorea                  Ruminococcus torques   Clostridium sensu stricto
Subdoligranulum        Dorea                  Ruminococcus torques
Bacteroides            Subdoligranulum        Agathobacter
4. General Discussion
This study’s primary focus was on developing ML models for classifying ASD sub-
jects based on their gut microbiota composition using publicly available datasets. The
first dataset originated from the USA, while the second dataset was sourced from China.
Two distinct ML models (ANN- and SVM-based) were developed to differentiate between
individuals diagnosed with ASD and those without. Notice that these models are neither
interchangeable nor generalizable; they are specific to the GMCs used for training. Additionally,
the models are suitable for subjects who closely resemble the sample subjects in
terms of age and BMI, and who have not taken antibiotics in the past month [23,24]. The
classification capabilities of the proposed models (once confirmed through clinical trials)
can streamline the diagnostic process for ASD while maintaining the established standard
of care.
The model’s development steps involving data splitting, hyperparameter tuning,
model training, and testing were iterated five times to ensure a robust performance and
mitigate the impact of variations in splits.
A two-by-two confusion matrix was generated for each dataset partition, providing
the counts for true positives, false positives, false negatives, and true negatives [60].
For ASD detection, key metrics such as accuracy, sensitivity, and specificity were calculated
based on the values in the confusion matrix and presented in Figures 10, 12, and 13 for the
SVM, RF, and ANN models, respectively. According to the confusion matrix analysis, the
varying prevalence of ASD could influence the predictive capabilities of the models,
potentially limiting their generalizability.
Figure 13. Confusion matrix of the ANN model for Ding’s dataset. Despite achieving an overall
accuracy of 92.3%, the model exhibits a slight tendency to classify certain ASD subjects as NT, which
is evident in both the validation and test partitions.
Previous research has suggested that ML algorithms can effectively reveal intricate
relationships between microbiota and neurodevelopmental disorders [61]. Hence, a com-
plementary aim was to identify the bacterial genera predictors that best contribute to
early-stage ASD diagnosis and to provide interpretability to the results by understanding
how these main predictors might influence brain functionality.
The developed models offer advantages in identifying potential bacteria that can
impact the homeostasis of the gut microbiota, often overlooked due to the common practice
of disregarding variables that do not reach a pre-established p-value, typically set at 0.05 or
lower. Many of the crucial predictors used in our ML models do not exhibit statistically
significant differences between ASD and NT subjects. The variations in relative abundances
among the majority of the 20 predictors for ASD and NT subjects in both models did
not attain the aforementioned statistical significance. This outcome further emphasizes
our assertion that attempting to discern the health status of a subject solely based on
individual taxa, or even exclusively on those taxa showing statistical differences, is a highly
challenging task and may even be unfeasible [62]. The actual health status might arise from
intricate interactions among the metabolites produced by a diverse range of bacteria.
Evidence highlights that early ASD treatment can mitigate symptoms, underscoring
the importance of accurate and prompt diagnosis for this neurodevelopmental disorder [61].
However, the standalone application of the developed models is not yet practical for clinical
use. Further validation is required through clinical practice and testing in other cohorts
that share similar characteristics with the training samples used in this study. Real-world
decisions must be overseen by medical doctors, making the ML models most valuable for
aiding clinicians with diagnosis and treatment decisions [63].
The primary predictor in the RF- and ANN-based models for the first dataset is the
genus Lachnospira, ranking second in the support vector machine (SVM) model. Following
the statistical analysis, some studies employed an RF classifier for data processing [24,64].
However, in Zou’s and Ding’s datasets, the RF algorithm demonstrated the least favorable
performance among the ML models.
Although the family Lachnospiraceae does not exhibit statistical differences between ASD
and NT subjects in the analysis presented by Zou et al. [23], it is notable that three of the
top five predictors in the three models developed in this study for the Zou dataset belong to
this family. At the genus level, Zou et al. [23] report the following statistically distinct
abundances between ASD and NT children. Enriched in ASD subjects: Bacteroides, Prevotella,
Lachnospiraceae incertae sedis, and Megamonas. Diminished in ASD subjects (enriched in NT):
Clostridium clusters IV and XIVa, Eisenbergiella, Flavonifractor, Escherichia–Shigella,
Haemophilus, Akkermansia, and Dialister.
Within the five main predictors for the SVM and ANN models, only Bacteroides
(p = 2.4 × 10⁻³) is among those reported as statistically different between ASD and NT
subjects.
Among the remaining predictors, Escherichia–Shigella (p = 2.39 × 10⁻²), Akkermansia
(p = 2.51 × 10⁻²), and Dialister (p = 3.67 × 10⁻²) are statistically different. This brings the
total count of statistically different variables to 4 out of the 18 predictors.
The classification accuracy of 90% suggests that the ML algorithms were capable of
detecting a complex relationship between those 18 variables and the label ASD in the
dataset provided for training. Considering that only 4 of the variables used as predictors
in our study showed a statistical difference in the analysis by Zou et al. [23], the remaining
predictors could still be relevant, with interactions so complex that they are very hard to
find using the classical statistical approach alone.
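As a reminder of how the reported performance metrics relate to each other, the following minimal sketch computes accuracy, sensitivity, and specificity from the four cells of a binary confusion matrix. ASD is assumed to be the positive class; the function name `classification_metrics` and the counts are hypothetical and are not taken from either dataset.

```python
def classification_metrics(tp, fn, tn, fp):
    """Return accuracy, sensitivity, and specificity for a binary classifier.

    tp/fn: ASD subjects classified correctly/incorrectly (positive class).
    tn/fp: NT subjects classified correctly/incorrectly (negative class).
    """
    total = tp + fn + tn + fp
    if total == 0 or (tp + fn) == 0 or (tn + fp) == 0:
        raise ValueError("each class needs at least one subject")
    return {
        "accuracy": (tp + tn) / total,
        "sensitivity": tp / (tp + fn),  # share of ASD subjects correctly identified
        "specificity": tn / (tn + fp),  # share of NT subjects correctly identified
    }


# Hypothetical counts, chosen only for illustration.
metrics = classification_metrics(tp=18, fn=2, tn=17, fp=3)
for name, value in metrics.items():
    print(f"{name}: {value:.2%}")
```

Because accuracy pools both classes, it can hide an imbalance between sensitivity and specificity, which is why the three values are usually read together.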
The five most important variables in the RF model included Escherichia–Shigella, but not
Lachnoclostridium. An important fact to note is that the ANN/RF and the SVM share the top
five predictors, although each algorithm ranks them in a different order. The RF model had
a performance that was 10% lower.
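The comparison of top-ranked predictors across models can be sketched as follows. This is only an illustration: the helper names `top_k` and `shared_predictors`, the placeholder genus labels, and the importance scores are all hypothetical, and the scores of different algorithms are compared by rank only, since their scales are not directly comparable.

```python
def top_k(importances, k=5):
    """Return the k predictor names with the highest importance, best first."""
    ranked = sorted(importances.items(), key=lambda item: item[1], reverse=True)
    return [name for name, _ in ranked[:k]]


def shared_predictors(model_a, model_b, k=5):
    """Return the predictors that appear in the top k of both models."""
    return set(top_k(model_a, k)) & set(top_k(model_b, k))


# Hypothetical importance scores for placeholder genera.
svm = {"genus_A": 0.30, "genus_B": 0.25, "genus_C": 0.20,
       "genus_D": 0.15, "genus_E": 0.10, "genus_F": 0.05}
ann = {"genus_C": 0.28, "genus_A": 0.24, "genus_E": 0.18,
       "genus_B": 0.14, "genus_D": 0.11, "genus_F": 0.02}
rf = {"genus_A": 0.27, "genus_F": 0.22, "genus_B": 0.19,
      "genus_C": 0.12, "genus_D": 0.09, "genus_E": 0.08}

print("SVM top 5:", top_k(svm))
print("ANN top 5:", top_k(ann))
print("Shared by SVM and ANN:", sorted(shared_predictors(svm, ann)))
print("Shared by SVM and RF:", sorted(shared_predictors(svm, rf)))
```

In this toy case two models select the same five predictors in a different order, while the third replaces one of them, which mirrors the kind of situation described above.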
The statistical analysis identifies the two main predictors; however, it may not account
for other significant variables that could also influence brain function:
• The Blautia genus, through its metabolites, is capable of mitigating inflammatory and
metabolic conditions and of combating certain microorganisms through antibacterial
actions [65]; strong inflammatory conditions, in turn, are linked to ASD [66]. The
Eubacterium eligens group produces interleukin 10 (IL-10), an anti-inflammatory
cytokine that exerts its activity in the epithelial cells [67].
• Akkermansia is considered a novel probiotic candidate that directly influences the gut–
brain axis by modulating the permeability of the gut [68]. Akkermansia is associated
with Subdoligranulum and it has been found that when an Akkermansia probiotic is
consumed, there is also an increase in Subdoligranulum [69].
• Lachnoclostridium is a producer of trimethylamine [70], a metabolite that has previously
been associated with neurodevelopmental disorders and specifically with the presence
of ASD [71,72].
• Some species of Faecalibacterium, such as Faecalibacterium prausnitzii, produce short-
chain fatty acids (SCFAs) as by-products of their metabolism. SCFAs contribute to the
strength of the intestinal epithelial layer and can reach the brain [73].
Table 5. Variables that show statistical difference according to Ding et al. [24].
The main predictors obtained in the species-level RF model generated by Ding et al. are
presented in Table 6. However, their study offers no explanation for why the main
predictors belong to genera other than those showing the largest statistical differences
between ASD and NT subjects, and no further analysis was presented.
In our RF model corresponding to this dataset, the five main predictors are Anaerobutyricum
(ninth in Ding’s model), Faecalibacterium, Clostridium sensu stricto 1, the Ruminococcus
torques group (which includes (Ruminococcus) torques ATCC 27756, (Ruminococcus) torques
IX–70, (Ruminococcus) torques L2–14, and (Ruminococcus) torques VIII–23), and finally
Agathobacter. Two important points to notice for comparison between our RF model and Ding’s
are as follows:
• Some of the predictors are species. Because the model trained in the original article
used species, we kept them, as this was how the RF model performed best with fewer
predictors. The percentage of these species, once grouped into their genus, was low,
reinforcing our hypothesis that small changes in low-abundance bacteria may have a
bigger effect than relatively larger changes in more abundant bacteria.
• The second point is that the only predictor in common with the RF developed by Ding
et al. is Anaerobutyricum, which is our main predictor (in the RF model), but in their
model is ninth. For the other models, the main five predictors are shown in Table 4.
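Grouping species-level relative abundances into their genus, as mentioned in the first point, can be sketched as below. This is a simplified illustration under stated assumptions: the genus is taken to be the first word of each taxon name, and the function name `aggregate_to_genus`, the taxon labels, and the abundance values are hypothetical rather than taken from Ding's data.

```python
from collections import defaultdict


def aggregate_to_genus(species_abundance):
    """Sum relative abundances of all species that share a genus.

    Assumes each key is written as "<Genus> <species epithet ...>", so the
    genus is the first whitespace-separated token of the name.
    """
    genus_totals = defaultdict(float)
    for species, abundance in species_abundance.items():
        genus = species.split()[0]
        genus_totals[genus] += abundance
    return dict(genus_totals)


# Hypothetical relative abundances (fractions of one sample).
sample = {
    "GenusA species_1": 0.002,
    "GenusA species_2": 0.001,
    "GenusB species_1": 0.400,
    "GenusB species_2": 0.150,
}

for genus, total in aggregate_to_genus(sample).items():
    print(f"{genus}: {total:.3f}")
```

Real taxonomy tables often contain unclassified or bracketed names, so a production pipeline would map species to genera through the taxonomy assignment itself rather than by splitting strings.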
Table 6. Predictors from the species level RF model reported by Ding et al. [24].
From the SVM model, which had the best prediction accuracy, the genera Anaerobutyricum,
Anaerostipes, Agathobacter, and Dorea are capable of producing SCFAs [77], and in
particular, Anaerostipes and Agathobacter are mostly butyrate producers [78,79], possibly
influencing the presence of ASD through the previously described effects of SCFAs.
As shown for the two datasets, most of the predictors for the trained models are
involved in the production of metabolites that are related to inflammation, the permeability
of the gut, and SCFA production.
The comparison of the Bacteroidota/Bacillota ratio between ASD and NT subjects differed in
the two studies. While Zou et al. found a Bacteroidota/Bacillota ratio of 0.74 in ASD subjects
and 0.31 in the NT group, Ding et al. found no difference (p = 0.130). Choosing these two
datasets, with their different geographic sampling and results, provided a better testing
scenario for the utility of ML algorithms in approaching this topic.
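For readers unfamiliar with this kind of phylum-level ratio, the sketch below shows one way to compute a Bacteroidota/Bacillota ratio per sample and average it per group. It is an assumption that the ratio is averaged over per-sample ratios; the cited studies may have computed it differently (for example, as a ratio of group means). The function names and all abundance values are hypothetical.

```python
from statistics import mean


def phylum_ratio(sample, numerator="Bacteroidota", denominator="Bacillota"):
    """Return the numerator/denominator relative-abundance ratio for one sample."""
    if sample[denominator] == 0:
        raise ValueError("denominator phylum has zero abundance")
    return sample[numerator] / sample[denominator]


def group_mean_ratio(samples):
    """Average the per-sample ratios within one group of subjects."""
    return mean(phylum_ratio(s) for s in samples)


# Hypothetical phylum-level relative abundances for two small groups.
asd_samples = [
    {"Bacteroidota": 0.40, "Bacillota": 0.50},
    {"Bacteroidota": 0.30, "Bacillota": 0.60},
]
nt_samples = [
    {"Bacteroidota": 0.20, "Bacillota": 0.80},
    {"Bacteroidota": 0.25, "Bacillota": 0.50},
]

print(f"ASD mean ratio: {group_mean_ratio(asd_samples):.3f}")
print(f"NT mean ratio: {group_mean_ratio(nt_samples):.3f}")
```

Whether a difference between the two group means is meaningful would still need a statistical test, as in the p-value reported by Ding et al.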
5. Conclusions
The presented approach prioritizes the early identification of ASD in children, aiming
to enhance their quality of life by enabling swift diagnosis and timely treatment. The
effectiveness of the trained models in classifying ASD cases is evaluated through perfor-
mance metrics. It is evident that the utilization of ML models to detect ASD based on gut
microbiota composition yields promising results. Moreover, the performance of these ML
models can be enhanced by expanding the dataset to include a larger number of training
samples. The results obtained from the assessed classifiers and the relative importance
of features suggest that significant changes in the concentration of bacteria with higher
prevalence within the gut might be better tolerated without triggering ASD symptoms.
Among the various ML tools evaluated, the ANN and SVM emerged as the most effective
approaches for establishing a connection between human GMC and the presence of ASD
in the subjects under scrutiny. Furthermore, marginal fluctuations in the proportions of
less abundant bacteria could potentially contribute to the manifestation of ASD symp-
toms. Many of the predictors employed for model training deviate from the statistically
significant features typically identified. Nonetheless, the metabolites generated by these
bacteria, even if not deemed statistically distinct, play a role in inflammatory processes
and the production of SCFAs, both of which have been associated with ASD. Addressing
this intricate issue may necessitate an interdisciplinary team approach, involving experts
in artificial intelligence, particularly ML, to leverage the insights provided by the models.
This collaborative effort can help reconcile the conflicting outcomes reported across various
research studies concerning the relationship between ASD and GMC.
Abbreviations
The following abbreviations are used in this manuscript:
ML Machine learning
ASD Autism spectrum disorder
SVM Support vector machines
ANN Artificial neural networks
RF Random forest
NT Neurotypical
GMC Gut microbiome composition
T1D Type 1 diabetes
ASV Amplicon sequence variants
WP Weight parameters
BP Bias parameters
SCFAs Short-chain fatty acids
IL-10 Interleukin 10
BMI Body mass index
LDA Linear discriminant analysis
References
1. Fatima, M.; Pasha, M. Survey of Machine Learning Algorithms for Disease Diagnostic. J. Intell. Learn. Syst. Appl. 2017, 9, 16.
[CrossRef]
2. Richens, J.G.; Lee, C.M.; Johri, S. Improving the accuracy of medical diagnosis with causal machine learning. Nat. Commun. 2020,
11, 3923. [CrossRef] [PubMed]
3. Stock, P.; Cissé, M. ConvNets and ImageNet Beyond Accuracy: Explanations, Bias Detection, Adversarial Examples and Model
Criticism. arXiv 2017, arXiv:1711.11443. https://doi.org/10.48550/arXiv.1711.11443.
4. Fu, S.C.; Lee, C.H.; Wang, H. Exploring the Association of Autism Spectrum Disorders and Constipation through Analysis of the
Gut Microbiome. Int. J. Environ. Res. Public Health 2021, 18, 667. [CrossRef] [PubMed]
5. Namkung, J. Machine learning methods for microbiome studies. J. Microbiol. 2020, 58, 206–216. [CrossRef]
6. Sabit, H.; Tombuloglu, H.; Rehman, S.; Almandil, N.B.; Cevik, E.; Abdel-Ghany, S.; Rashwan, S.; Abasiyanik, M.F.; Yee Waye,
M.M. Gut microbiota metabolites in autistic children: An epigenetic perspective. Heliyon 2021, 7, e06105. [CrossRef]
7. Chervonsky, A. Innate receptors and microbes in induction of autoimmunity. Curr. Opin. Immunol. 2009, 21, 641–647. [CrossRef]
8. Brenchley, J.M.; Douek, D.C. Microbial translocation across the GI tract. Annu. Rev. Immunol. 2012, 30, 149–173. [CrossRef]
9. Finegold, S.M. State of the art; microbiology in health and disease. Intestinal bacterial flora in autism. Anaerobe 2011, 17, 367–368.
[CrossRef]
10. Mulle, J.G.; Sharp, W.G.; Cubells, J.F. The gut microbiome: A new frontier in autism research. Curr. Psychiatry Rep. 2013, 15, 337.
[CrossRef]
11. Giongo, A.; Gano, K.A.; Crabb, D.B.; Mukherjee, N.; Novelo, L.L.; Casella, G.; Drew, J.C.; Ilonen, J.; Knip, M.; Hyöty, H.; et al.
Toward defining the autoimmune microbiome for type 1 diabetes. ISME J. 2011, 5, 82–91. [CrossRef] [PubMed]
12. Yatsunenko, T.; Rey, F.E.; Manary, M.J.; Trehan, I.; Dominguez-Bello, M.G.; Contreras, M.; Magris, M.; Hidalgo, G.; Baldassano,
R.N.; Anokhin, A.P.; et al. Human gut microbiome viewed across age and geography. Nature 2012, 486, 222–227. [CrossRef]
[PubMed]
13. Reddy, B.L.; Saier, M.H. Autism and Our Intestinal Microbiota. J. Mol. Microbiol. Biotechnol. 2015, 25, 51–55. [CrossRef]
14. Rajkomar, A.; Dean, J.; Kohane, I. Machine Learning in Medicine. N. Engl. J. Med. 2019, 380, 1347–1358. [CrossRef]
15. Esteva, A.; Kuprel, B.; Novoa, R.A.; Ko, J.; Swetter, S.M.; Blau, H.M.; Thrun, S. Dermatologist-level classification of skin cancer
with deep neural networks. Nature 2017, 542, 115–118. [CrossRef] [PubMed]
16. De Fauw, J.; Ledsam, J.R.; Romera-Paredes, B.; Nikolov, S.; Tomasev, N.; Blackwell, S.; Askham, H.; Glorot, X.; O’Donoghue, B.;
Visentin, D.; et al. Clinically applicable deep learning for diagnosis and referral in retinal disease. Nat. Med. 2018, 24, 1342–1350.
[CrossRef] [PubMed]
17. American Psychiatric Association. Diagnostic and Statistical Manual of Mental Disorders DSM-5, 5th ed.; American Psychiatric
Association: Washington, DC, USA, 2013. [CrossRef]
18. Kim, S.H.; Lord, C. Autism Diagnostic Interview, Revised. In Encyclopedia of Clinical Neuropsychology; Kreutzer, J.S., DeLuca, J.,
Caplan, B., Eds.; Springer: New York, NY, USA, 2011; pp. 313–315. [CrossRef]
19. Luyster, R.; Gotham, K.; Guthrie, W.; Coffing, M.; Petrak, R.; Pierce, K.; Bishop, S.; Esler, A.; Hus, V.; Oti, R.; et al. The Autism
Diagnostic Observation Schedule–Toddler Module: A New Module of a Standardized Diagnostic Measure for Autism Spectrum
Disorders. J. Autism Dev. Disord. 2009, 39, 1305–1320. [CrossRef]
20. Schopler, E.; Van Bourgondien, M.E.; Wellman, G.J.; Love, S.R. Childhood Autism Rating Scale, 2nd ed.; Western Psychological
Services: Los Angeles, CA, USA, 2010.
21. Heffler, K.F.; Oestreicher, L.M. Causation model of autism: Audiovisual brain specialization in infancy competes with social
brain networks. Med. Hypotheses 2016, 91, 114–122. [CrossRef]
22. Amaral, D.G. Examining the Causes of Autism. In Cerebrum: The Dana Forum on Brain Science; Dana Foundation: New York, NY,
USA, 2017; Volume 2017.
23. Zou, R.; Xu, F.; Wang, Y.; Duan, M.; Guo, M.; Zhang, Q.; Zhao, H.; Zheng, H. Changes in the Gut Microbiota of Children with
Autism Spectrum Disorder. Autism Res. 2020, 13, 1614–1625. [CrossRef]
24. Ding, X.; Xu, Y.; Zhang, X.; Zhang, L.; Duan, G.; Song, C.; Li, Z.; Yang, Y.; Wang, Y.; Wang, X.; et al. Gut microbiota changes in
patients with autism spectrum disorders. J. Psychiatr. Res. 2020, 129, 149–159. [CrossRef]
25. Son, J.S.; Zheng, L.J.; Rowehl, L.M.; Tian, X.; Zhang, Y.; Zhu, W.; Litcher-Kelly, L.; Gadow, K.D.; Gathungu, G.; Robertson, C.E.;
et al. Comparison of Fecal Microbiota in Children with Autism Spectrum Disorders and Neurotypical Siblings in the Simons
Simplex Collection. PLoS ONE 2015, 10, e0137725. [CrossRef]
26. Kang, D.W.; Ilhan, Z.E.; Isern, N.G.; Hoyt, D.W.; Howsmon, D.P.; Shaffer, M.; Lozupone, C.A.; Hahn, J.; Adams, J.B.; Krajmalnik-
Brown, R. Differences in fecal microbial metabolites and microbiota of children with autism spectrum disorders. Anaerobe 2018,
49, 121–131. [CrossRef] [PubMed]
27. Luna, R.A.; Oezguen, N.; Balderas, M.; Venkatachalam, A.; Runge, J.K.; Versalovic, J.; Veenstra-VanderWeele, J.; Anderson, G.M.;
Savidge, T.; Williams, K.C. Distinct Microbiome-Neuroimmune Signatures Correlate With Functional Abdominal Pain in Children
with Autism Spectrum Disorder. Cell. Mol. Gastroenterol. Hepatol. 2017, 3, 218–230. [CrossRef] [PubMed]
28. Strati, F.; Cavalieri, D.; Albanese, D.; De Felice, C.; Donati, C.; Hayek, J.; Jousson, O.; Leoncini, S.; Renzi, D.; Calabrò, A.; et al.
New evidences on the altered gut microbiota in autism spectrum disorders. Microbiome 2017, 5, 24. [CrossRef] [PubMed]
29. Kang, D.W.; Adams, J.B.; Gregory, A.C.; Borody, T.; Chittick, L.; Fasano, A.; Khoruts, A.; Geis, E.; Maldonado, J.; McDonough-
Means, S.; et al. Microbiota Transfer Therapy alters gut ecosystem and improves gastrointestinal and autism symptoms: An
open-label study. Microbiome 2017, 5, 10. [CrossRef] [PubMed]
30. Inoue, R.; Sakaue, Y.; Sawai, C.; Sawai, T.; Ozeki, M.; Romero-Pérez, G.A.; Tsukahara, T. A preliminary investigation on the
relationship between gut microbiota and gene expressions in peripheral mononuclear cells of infants with autism spectrum
disorders. Biosci. Biotechnol. Biochem. 2016, 80, 2450–2458. [CrossRef]
31. Hughes, H.K.; Rose, D.; Ashwood, P. The Gut Microbiota and Dysbiosis in Autism Spectrum Disorders. Curr. Neurol. Neurosci.
Rep. 2018, 18, 81. [CrossRef]
32. Parracho, H.M.; Bingham, M.O.; Gibson, G.R.; McCartney, A.L. Differences between the gut microflora of children with autistic
spectrum disorders and that of healthy children. J. Med. Microbiol. 2005, 54, 987–991. [CrossRef]
33. Sharon, G.; Cruz, N.J.; Kang, D.W.; Gandal, M.J.; Wang, B.; Kim, Y.M.; Zink, E.M.; Casey, C.P.; Taylor, B.C.; Lane, C.J.; et al.
Human Gut Microbiota from Autism Spectrum Disorder Promote Behavioral Symptoms in Mice. Cell 2019, 177, 1600–1618.e17.
[CrossRef]
34. Kang, D.W.; Adams, J.B.; Coleman, D.M.; Pollard, E.L.; Maldonado, J.; McDonough-Means, S.; Caporaso, J.G.; Krajmalnik-Brown,
R. Long-term benefit of Microbiota Transfer Therapy on autism symptoms and gut microbiota. Sci. Rep. 2019, 9, 5821. [CrossRef]
35. Jennifer, F.; Nancy, M.H.; Jody, D.; Cody, G.; Dae-Wook, K.; Juan, M.; Jones Rachel, A.; Kimberly, J.; Adams James, B.; Rosa,
K.B.; et al. The Gut Microbiome in Autism: Study-Site Effects and Longitudinal Analysis of Behavior Change. mSystems 2021,
6, e00848-20. [CrossRef]
36. Williams, B.L.; Hornig, M.; Buie, T.; Bauman, M.L.; Cho Paik, M.; Wick, I.; Bennett, A.; Jabado, O.; Hirschberg, D.L.; Lipkin,
W.I. Impaired carbohydrate digestion and transport and mucosal dysbiosis in the intestines of children with autism and
gastrointestinal disturbances. PLoS ONE 2011, 6, e24585. [CrossRef] [PubMed]
37. Tomova, A.; Husarova, V.; Lakatosova, S.; Bakos, J.; Vlkova, B.; Babinska, K.; Ostatnikova, D. Gastrointestinal microbiota in
children with autism in Slovakia. Physiol. Behav. 2015, 138, 179–187. [CrossRef] [PubMed]
38. De Angelis, M.; Francavilla, R.; Piccolo, M.; De Giacomo, A.; Gobbetti, M. Autism spectrum disorders and intestinal microbiota.
Gut Microbes 2015, 6, 207–213. [CrossRef]
39. Finegold, S.M.; Dowd, S.E.; Gontcharova, V.; Liu, C.; Henley, K.E.; Wolcott, R.D.; Youn, E.; Summanen, P.H.; Granpeesheh, D.;
Dixon, D.; et al. Pyrosequencing study of fecal microflora of autistic and control children. Anaerobe 2010, 16, 444–453. [CrossRef]
40. Gondalia, S.V.; Palombo, E.A.; Knowles, S.R.; Cox, S.B.; Meyer, D.; Austin, D.W. Molecular characterisation of gastrointestinal
microbiota of children with autism (with and without gastrointestinal dysfunction) and their neurotypical siblings. Autism Res.
Off. J. Int. Soc. Autism Res. 2012, 5, 419–427. [CrossRef]
41. Wang, L.; Christophersen, C.T.; Sorich, M.J.; Gerber, J.P.; Angley, M.T.; Conlon, M.A. Increased abundance of Sutterella spp. and
Ruminococcus torques in feces of children with autism spectrum disorder. Mol. Autism 2013, 4, 42. [CrossRef]
42. Wang, L.; Christophersen, C.T.; Sorich, M.J.; Gerber, J.P.; Angley, M.T.; Conlon, M.A. Low relative abundances of the mucolytic
bacterium Akkermansia muciniphila and Bifidobacterium spp. in feces of children with autism. Appl. Environ. Microbiol. 2011,
77, 6718–6721. [CrossRef]
43. Loomes, R.; Hull, L.; Mandy, W.P.L. What Is the Male-to-Female Ratio in Autism Spectrum Disorder? A Systematic Review and
Meta-Analysis. J. Am. Acad. Child Adolesc. Psychiatry 2017, 56, 466–474. [CrossRef]
44. Williams, B.L.; Hornig, M.; Parekh, T.; Lipkin, W.I. Application of novel PCR-based methods for detection, quantitation, and
phylogenetic characterization of Sutterella species in intestinal biopsy samples from children with autism and gastrointestinal
disturbances. mBio 2012, 3, e00261-11. [CrossRef]
45. Kim, Y.S.; Unno, T.; Kim, B.Y.; Park, M.S. Sex Differences in Gut Microbiota. World J. Men’s Health 2020, 38, 48–60. [CrossRef]
46. Nogay, N.H.; Nahikian-Nelms, M. Can we reduce autism-related gastrointestinal and behavior problems by gut microbiota based
dietary modulation? A review. Nutr. Neurosci. 2021, 24, 327–338. [CrossRef] [PubMed]
47. Shaban-Nejad, A.; Kamaleswaran, R.; Shin, E.K.; Akbilgic, O. Chapter Six - Health intelligence. In Biomedical Information Technology,
2nd ed.; Feng, D.D., Ed.; Biomedical Engineering; Academic Press: Cambridge, MA, USA, 2020; pp. 197–215. [CrossRef]
48. Daniel, W.W.; Cross, C.L. Biostatistics: A Foundation for Analysis in the Health Sciences, 11th ed.; Wiley: Hoboken, NJ, USA, 2018.
49. Romano, R.; Gambale, E. Statistics and medicine: The indispensable know-how of the researcher. Transl. Med. UniSa 2013,
5, 28–31. [PubMed]
50. Banerjee, A.; Jadhav, S.L.; Bhawalkar, J.S. Probability, clinical decision making and hypothesis testing. Ind. Psychiatry J. 2009, 18,
64. [CrossRef] [PubMed]
51. Balint, L.; Socaciu, C.; Socaciu, A.I.; Vlad, A.; Gadalean, F.; Bob, F.; Milas, O.; Cretu, O.M.; Suteanu-Simulescu, A.; Glavan, M.; et al.
Quantitative, Targeted Analysis of Gut Microbiota Derived Metabolites Provides Novel Biomarkers of Early Diabetic Kidney
Disease in Type 2 Diabetes Mellitus Patients. Biomolecules 2023, 13, 1086. [CrossRef] [PubMed]
52. Azuma, K.; Uchiyama, I.; Tanigawa, M.; Bamba, I.; Azuma, M.; Takano, H.; Yoshikawa, T.; Sakabe, K. Chemical intolerance:
involvement of brain function and networks after exposure to extrinsic stimuli perceived as hazardous. Environ. Health Prev. Med.
2019, 24, 61. [CrossRef] [PubMed]
53. Ahmed, H.; Leyrolle, Q.; Koistinen, V.; Kärkkäinen, O.; Layé, S.; Delzenne, N.; Hanhineva, K. Microbiota-derived metabolites as
drivers of gut-brain communication. Gut Microbes 2022, 14, 2102878. [CrossRef]
54. Walker, A.M.; Cliff, A.; Romero, J.; Shah, M.B.; Jones, P.; Felipe Machado Gazolla, J.G.; Jacobson, D.A.; Kainer, D. Evaluating the
performance of random forest and iterative random forest based methods when applied to gene expression data. Comput. Struct.
Biotechnol. J. 2022, 20, 3372–3386. [CrossRef]
55. Nedaie, A.; Najafi, A.A. Support vector machine with Dirichlet feature mapping. Neural Netw. 2018, 98, 87–101. [CrossRef]
56. Kubat, M. An Introduction to Machine Learning, 1st ed.; Springer: Cham, Switzerland, 2015. [CrossRef]
57. Bhattacharjee, J.; Santra, S.; Deyasi, A. Chapter 10—Novel detection of cancerous cells through an image segmentation approach
using principal component analysis. In Recent Trends in Computational Intelligence Enabled Research; Bhattacharyya, S., Dutta, P.,
Samanta, D., Mukherjee, A., Pan, I., Eds.; Academic Press: Cambridge, MA, USA, 2021; pp. 171–195. [CrossRef]
58. Vaibhaw; Sarraf, J.; Pattnaik, P. Chapter 2—Brain–computer interfaces and their applications. In An Industrial IoT Approach for
Pharmaceutical Industry Growth; Balas, V.E., Solanki, V.K., Kumar, R., Eds.; Academic Press: Cambridge, MA, USA, 2020; pp. 31–54.
[CrossRef]
59. Agatonovic-Kustrin, S.; Beresford, R. Basic concepts of artificial neural network (ANN) modeling and its application in
pharmaceutical research. J. Pharm. Biomed. Anal. 2000, 22, 717–727. [CrossRef]
60. Berrar, D. Performance Measures for Binary Classification. In Encyclopedia of Bioinformatics and Computational Biology; Ranganathan,
S., Gribskov, M., Nakai, K., Schönbach, C., Eds.; Academic Press: Oxford, UK, 2019; pp. 546–560. [CrossRef]
61. Abdolzadegan, D.; Moattar, M.H.; Ghoshuni, M. A robust method for early diagnosis of autism spectrum disorder from EEG
signals based on feature selection and DBSCAN method. Biocybern. Biomed. Eng. 2020, 40, 482–493. [CrossRef]
62. Topçuoğlu, B.D.; Lesniak, N.A.; Ruffin, M.T.; Wiens, J.; Schloss, P.D. A Framework for Effective Application of Machine Learning
to Microbiome-Based Classification Problems. mBio 2020, 11. [CrossRef]
63. Peng, S.; Liu, Y.; Lv, W.; Liu, L.; Zhou, Q.; Yang, H.; Ren, J.; Liu, G.; Wang, X.; Zhang, X.; et al. Deep learning-based artificial
intelligence model to assist thyroid nodule diagnosis and management: A multicentre diagnostic study. Lancet Digit. Health 2021,
3, e250–e259. [CrossRef]
64. Pulikkan, J.; Maji, A.; Dhakan, D.B.; Saxena, R.; Mohan, B.; Anto, M.M.; Agarwal, N.; Grace, T.; Sharma, V.K. Gut Microbial
Dysbiosis in Indian Children with Autism Spectrum Disorders. Microb. Ecol. 2018, 76, 1102–1114. [CrossRef]
65. Liu, X.; Mao, B.; Gu, J.; Wu, J.; Cui, S.; Wang, G.; Zhao, J.; Zhang, H.; Chen, W. Blautia—A new functional genus with potential
probiotic properties? Gut Microbes 2021, 13, 1875796. [CrossRef]
66. Siniscalco, D.; Schultz, S.; Brigida, A.L.; Antonucci, N. Inflammation and Neuro-Immune Dysregulations in Autism Spectrum
Disorders. Pharmaceuticals 2018, 11, 56. [CrossRef]
67. Chung, W.S.F.; Meijerink, M.; Zeuner, B.; Holck, J.; Louis, P.; Meyer, A.S.; Wells, J.M.; Flint, H.J.; Duncan, S.H. Prebiotic potential
of pectin and pectic oligosaccharides to promote anti-inflammatory commensal bacteria in the human colon. FEMS Microbiol.
Ecol. 2017, 93, fix127. [CrossRef]
68. Hul, M.V.; Roy, T.L.; Prifti, E.; Dao, M.C.; Paquot, A.; Zucker, J.D.; Delzenne, N.M.; Muccioli, G.G.; Clément, K.; Cani, P.D. From
correlation to causality: The case of Subdoligranulum. Gut Microbes 2020, 12, 1849998. [CrossRef]
69. Everard, A.; Lazarevic, V.; Derrien, M.; Girard, M.; Muccioli, G.G.; Neyrinck, A.M.; Possemiers, S.; Van Holle, A.; François,
P.; de Vos, W.M.; et al. Responses of Gut Microbiota and Glucose and Lipid Metabolism to Prebiotics in Genetic Obese and
Diet-Induced Leptin-Resistant Mice. Diabetes 2011, 60, 2775–2786. [CrossRef]
70. Cai, Y.Y.; Huang, F.Q.; Lao, X.; Lu, Y.; Gao, X.; Alolga, R.N.; Yin, K.; Zhou, X.; Wang, Y.; Liu, B.; et al. Integrated metagenomics
identifies a crucial role for trimethylamine-producing Lachnoclostridium in promoting atherosclerosis. npj Biofilms Microbiomes
2022, 8, 11. [CrossRef]
71. Zarbock, K.R.; Han, J.H.; Singh, A.P.; Thomas, S.P.; Bendlin, B.B.; Denu, J.M.; Yu, J.P.J.; Rey, F.E.; Ulland, T.K. Trimethylamine
N-Oxide Reduces Neurite Density and Plaque Intensity in a Murine Model of Alzheimer’s Disease. J. Alzheimer’s Dis. JAD 2022,
90, 585–597. [CrossRef] [PubMed]
72. Quan, L.; Yi, J.; Zhao, Y.; Zhang, F.; Shi, X.T.; Feng, Z.; Miller, H.L. Plasma trimethylamine N-oxide, a gut microbe–generated
phosphatidylcholine metabolite, is associated with autism spectrum disorders. NeuroToxicology 2020, 76, 93–98. [CrossRef]
[PubMed]
73. Rothenberg, S.E.; Chen, Q.; Shen, J.; Nong, Y.; Nong, H.; Trinh, E.P.; Biasini, F.J.; Liu, J.; Zeng, X.; Zou, Y.; et al. Neurodevelopment
correlates with gut microbiota in a cross-sectional analysis of children at 3 years of age in rural China. Sci. Rep. 2021, 11, 7384.
[CrossRef] [PubMed]
74. Braniste, V.; Al-Asmakh, M.; Kowal, C.; Anuar, F.; Abbaspour, A.; Tóth, M.; Korecka, A.; Bakocevic, N.; Ng, L.G.; Kundu, P.; et al.
The gut microbiota influences blood-brain barrier permeability in mice. Sci. Transl. Med. 2014, 6, 263ra158. [CrossRef] [PubMed]
75. Hua, X.; Zhu, J.; Yang, T.; Guo, M.; Li, Q.; Chen, J.; Li, T. The Gut Microbiota and Associated Metabolites Are Altered in Sleep
Disorder of Children With Autism Spectrum Disorders. Front. Psychiatry 2020, 11, 855. [CrossRef]
76. Xiao, R.; Chen, H.; Han, H.; Luo, G.; Lin, Y. The in vitro fermentation of compound oral liquid by human colonic microbiota
altered the abundance of probiotics and short-chain fatty acid production. RSC Adv. 2022, 12, 30076–30084. [CrossRef]
77. Engels, C.; Ruscheweyh, H.J.; Beerenwinkel, N.; Lacroix, C.; Schwab, C. The Common Gut Microbe Eubacterium hallii also
Contributes to Intestinal Propionate Formation. Front. Microbiol. 2016, 7, 713. [CrossRef]
78. Chia, L.W.; Mank, M.; Blijenberg, B.; Bongers, R.S.; van Limpt, K.; Wopereis, H.; Tims, S.; Stahl, B.; Belzer, C.; Knol, J. Cross-feeding
between Bifidobacterium infantis and Anaerostipes caccae on lactose and human milk oligosaccharides. Benef. Microbes 2021,
12, 69–83. [CrossRef]
79. Kircher, B.; Woltemate, S.; Gutzki, F.; Schlüter, D.; Geffers, R.; Bähre, H.; Vital, M. Predicting butyrate- and propionate-forming
bacteria of gut microbiota from sequencing data. Gut Microbes 2022, 14, 2149019. [CrossRef]