Faecal Microbiome AI for Disease Diagnosis
1038/s41467-022-34405-3
Qi Su 1,2,3,4,6, Qin Liu 1,2,3,4,6, Raphaela Iris Lau 1,2,3, Jingwan Zhang 1,2,3,4, Zhilu Xu 1,2,3,4, Yun Kit Yeoh 1, Thomas W. H. Leung 2, Whitney Tang 1,2,3, Lin Zhang 1,2,3,4, Jessie Q. Y. Liang 2,3,4, Yuk Kam Yau 1,2,3, Jiaying Zheng 1,2,3, Chengyu Liu 1,2,3, Mengjing Zhang 1,2,3, Chun Pan Cheung 1,2,4, Jessica Y. L. Ching 1,2,3, Hein M. Tun 1,3,5, Jun Yu 2,3,4, Francis K. L. Chan 1,2,3,4 & Siew C. Ng 1,2,3,4
Received: 2 September 2022
Accepted: 21 October 2022
1 Microbiota I-Center (MagIC), Hong Kong SAR, China. 2 Department of Medicine and Therapeutics, The Chinese University of Hong Kong, Hong Kong SAR, China. 3 Li Ka Shing Institute of Health Sciences, State Key Laboratory of Digestive Disease, Institute of Digestive Disease, The Chinese University of Hong Kong, Hong Kong SAR, China. 4 Center for Gut Microbiota Research, Faculty of Medicine, The Chinese University of Hong Kong, Hong Kong SAR, China. 5 JC School of Public Health and Primary Care, The Chinese University of Hong Kong, Hong Kong SAR, China. 6 These authors contributed equally: Qi Su, Qin Liu.
e-mail: siewchienng@cuhk.edu.hk
Recent studies have shown that imbalanced intestinal microbiota, termed “dysbiosis”, contributes to various human diseases1. The current development of microbial markers has mostly used binary classifiers2–5. Emerging evidence, however, suggests that most health conditions exhibit overlapping gut microbiome signatures6, thus single-disease diagnostic models are likely to be confounded by unrelated diseases and may lead to misclassification. Although an attempt has been made to develop a multi-class diagnostic model, heterogeneity, technical bias and batch effects involved in the previous work relying on public datasets for analyses would limit
accuracy7. Here, we develop the largest single-site dataset to date covering multiple diseases, adopt a machine learning multi-class model to predict different diseases using species-level faecal microbiome profiling, and validate the findings using public metagenome datasets across different populations.
Results
We performed metagenomic sequencing of faecal samples from 2320 Hong Kong Chinese (mean age 54.9, 48.7% female, Source Data, Supplementary Fig. 1a, see the “Methods” section) consisting of 9 well-characterised disease phenotypes: colorectal cancer (CRC, n = 174), colorectal adenomas (CA, n = 168), Crohn’s disease (CD, n = 200), ulcerative colitis (UC, n = 147), irritable bowel syndrome (diarrhoea subtype, IBS-D, n = 145), obesity (n = 148), cardiovascular disease (CVD, n = 143), post-acute COVID-19 syndrome (PACS, n = 302) and healthy controls (n = 893). In total, we obtained 14.3 terabytes of the sequence at an average depth of 6.15 gigabases for each metagenome and identified 1208 bacterial species. Amongst them, 325 bacterial species had a relative abundance higher than 0.15% and these species were present in over 5% of the subjects (Source Data).
Shared microbiome signatures across different phenotypes
We observed differences in bacterial diversity (Shannon) and richness (number of species) in different diseases, and we found that both indices vary across phenotypes (Source Data, Supplementary Fig. 1b, c). These results are consistent with a recent meta-analysis8, indicating that ecological indices may not be robust indicators of health or disease. Then, we explored associations of microbial composition at the species level with disease phenotypes using a linear model of MaAsLin2 after adjusting for biological and technical confounders (see the “Methods” section). We found a total of 1061 significant associations between these nine phenotypes and 215 bacterial taxa at the species level (FDR < 0.05). Amongst the 215 species, more than 94% were significantly associated with two or more diseases, which is consistent with previous works that numerous signals are shared among different diseases6,9 (Source Data, Supplementary Fig. 1d). For instance, Klebsiella pneumoniae, a well-characterised opportunistic pathogen10, was positively associated with CD, CRC, IBS-D, Obesity, PACS and UC in our cohort, whilst Roseburia intestinalis, a promising probiotic with butyrate-producing properties11, negatively correlated with these six disease phenotypes (Source Data, Supplementary Fig. 1d). Next, we found that both PCoA analysis based on beta-diversity and random forest (RF) binary classifier could significantly separate all disease phenotypes (Source Data, Supplementary Table 1, Supplementary Fig. 2a–c, all p < 0.001). Whilst common microbial signatures were shared across diseases, these findings pointed to the presence of disease-specific microbial composition. However, it is unknown whether binary classifiers can capture these disease-specific signatures. Therefore, we tested the specificity of our trained binary models in unrelated diseases, and the results showed a high misdiagnosis rate (average 0.52, IQR 0.41–0.65, Source Data, Supplementary Fig. 2d). These results suggested that the binary classifier failed to capture real disease-specific features based solely on single disease versus control samples.
Development of faecal microbiome-based multi-class diagnosis model
Classification tasks in machine learning involving more than two classes are known as “multi-class classification”, which can effectively account for confounding effects of unrelated classes12. Based on our cohort of 2320 Hong Kong Chinese, we trained five machine learning multi-class classifiers (RF, K-nearest neighbours (KNN), multi-layer perceptron (MLP), support vector machine (SVM), and graph convolutional neural network (GCN)) to classify different diseases using species-level data from the training set (70% samples with the same class proportions as the cohort) and presented their final performance from the withheld test set (30% samples, Fig. 1a, see the “Methods” section). All these models achieved a mean AUROC of 0.67–0.99 (Interquartile range, IQR 0.81–0.92), suggesting that multi-class disease classification based on the faecal microbiome was feasible (Source Data, Supplementary Fig. 3). Amongst them, the RF multi-class model achieved a mean AUROC of 0.90–0.99 (IQR 0.91–0.94, one versus all others, Fig. 1b) for different disease phenotypes in the test set. The performance of the RF model in the test set significantly outperformed all other models (Source Data, Supplementary Fig. 3b) and was similar to that of the training set (calculated by 5-fold cross-validation, Source Data, Supplementary Fig. 3c), suggesting high integrity of this classifier. Therefore, the RF multi-class model was used for further analyses. At a threshold based on the highest Youden’s Index, the sensitivities of our RF multi-class classifier ranged from 0.81 to 0.95 (IQR 0.87–0.93) at specificities of 0.76 to 0.98 (IQR 0.83–0.95) for different diseases with accuracy from 0.77 to 0.98 (IQR 0.82–0.92, one versus all others, Fig. 1c), highlighting good diagnostic performance. For example, our classifier achieved a mean AUROC of 0.94 for CRC with a sensitivity of 0.88 at a specificity of 0.85 (accuracy 0.85, one versus all others, Fig. 1b, c); this performance was superior to that of our trained binary classifier (CRC versus health, mean AUROC 0.91, Source Data, Supplementary Fig. 2c) and a previously published CRC diagnostic model2. Further assessment using predicted probabilities in the test set showed that the trained classifier achieved a mean AUROC of 0.94 for all one versus one classifications (IQR 0.92–0.98, Source Data, Supplementary Fig. 4a) with high sensitivities (IQR 0.88–0.95) and specificities (IQR 0.83–0.94, Source Data, Supplementary Fig. 4b), which supported a superior performance of multi-class model analyses over binary models (Source Data, Supplementary Fig. 2c).
To fully characterise the RF multi-class model, we compared its performance under different split ratios and achieved similar results, suggesting high stability and good predictive power without risk of overfitting (Source Data, Supplementary Fig. 3d). Given that subjects with CRC or colorectal adenomas were older than other subjects (Source Data, Supplementary Fig. 1a), we assessed our model stratified by age and found consistent performance (Source Data, Supplementary Fig. 5). In addition, our model achieved a mean AUROC of 0.87 in distinguishing CRC and colorectal adenomas (Source Data, Supplementary Fig. 4), which supported that effect of age on the model was likely to be negligible. To rule out the possibility that uneven cohort sizes across different diseases may influence the classification performance, we trained a separate RF multi-class classifier by randomly pooling 143 subjects from each disease phenotype (a total of 1287 subjects, 70% training, 30% testing) and found an AUROC of 0.83–0.99 (one versus all others, IQR 0.89–0.96; one versus one, IQR 0.89–0.97; Source Data, Supplementary Fig. 6) which was comparable to the AUROC of 0.90–0.99 in the 2320 individuals (one versus all others, IQR 0.91–0.94; one versus one, IQR 0.92–0.98; Fig. 1b, Source Data, Supplementary Fig. 4). Importantly, the AUROC values of the model increased with the increasing number of features which suggested again that overfitting based on the 325 selected features was unlikely (Source Data, Supplementary Fig. 7).
Validation of multi-class model on independent datasets
Then, we integrated 1597 shotgun faecal metagenome data from 12 public datasets from Asia, Europe and North America (Source Data, Supplementary Table 2, Supplementary Fig. 8a). Our RF multi-class classifier showed a mean AUROC of 0.69–0.91 (IQR 0.79–0.87, Source Data, Supplementary Table 3) in classifying different diseases, and generally outperformed all other models (Source Data, Supplementary Fig. 8b). Such performance from an independent validation cohort further confirmed the robustness and generalisability of our model across different populations and geographical locations. To further validate the accuracy of our model, we selected 60 patients who had a
Fig. 1, labels recovered from the panels:
a Fecal microbiome, n = 2320 (Health, n = 893; CA, n = 168; CD, n = 200; CRC, n = 174; CVD, n = 143; IBS-D, n = 145; Obesity, n = 148; PACS, n = 302; UC, n = 147). Microbiome profiling, 325 bacterial species. 70% training, 30% test, repeats (n = 20). Classifiers: SVM, KNN, RF, MLP, GCN. The optimal trained multi-class classifier outputs probabilities for each phenotype on the 30% test set.
b ROC curves (x-axis: 1 - Specificity). PACS (AUC = 0.98, 95% CI 0.97–0.99); Health (AUC = 0.91, 95% CI 0.90–0.92); CA (AUC = 0.90, 95% CI 0.89–0.92); CD (AUC = 0.93, 95% CI 0.91–0.94); CRC (AUC = 0.94, 95% CI 0.93–0.96); CVD (AUC = 0.91, 95% CI 0.89–0.93); IBS-D (AUC = 0.99, 95% CI 0.98–0.99); Obesity (AUC = 0.92, 95% CI 0.89–0.96); UC (AUC = 0.93, 95% CI 0.91–0.94).
c Rows recovered: CVD 0.06786 0.87 0.77 0.78 0.64; IBS-D 0.12423 0.94 0.98 0.98 0.93; Obesity 0.10805 0.88 0.95 0.94 0.82; PACS 0.15348 0.95 0.92 0.92 0.87; UC 0.09891 0.86 0.86 0.86 0.72.
Fig. 1 | Faecal microbiome-based machine learning for multi-class disease diagnosis. a Framework for dataset partition, model training and independent validation. b Area under the receiver operating characteristic curve (AUROC, centre for the error bands is median). c Performance metric details of the trained random forest multi-class classifier for classifying one phenotype from all others using species-level faecal microbiome data in the independent test set. SVM support vector machine, KNN K-nearest neighbours, RF random forests, MLP multi-layer perceptron, GCN graph convolutional neural network, CA colorectal adenomas, CD Crohn’s disease, CRC colorectal cancer, CVD cardiovascular disease, IBS-D diarrhoea-dominant irritable bowel syndrome, PACS post-acute COVID-19 syndrome, UC ulcerative colitis. Source data are provided as a Source Data file.
complete recovery from COVID-19 infection. Our trained model showed an accuracy of 83.3% (50/60) in classifying these subjects as healthy (Source Data, Supplementary Fig. 8c). These data verified that fully recovered COVID-19 survivors (without PACS) shared similar gut microbiome profiles with healthy people13. Additionally, we tested our trained RF model on diseases not included in our training dataset, including liver cirrhosis and constipation-dominant IBS datasets (n = 60, see the “Methods” section). For most of these subjects (48/60), no prediction could be made because the predicted probabilities failed the corresponding thresholds (Source Data, Supplementary Fig. 8d), so these subjects might be categorised as undetermined. The misclassification rate for each phenotype ranged from 0% (0/60, CA, CVD, IBS-D, Obesity) to 5% (3/60, CD, CRC, PACS, Source Data, Supplementary Fig. 9d), suggesting that our model has a high specificity and accuracy for the nine phenotypes within our cohort with a low risk of misclassification for unrelated diseases.
Associations between bacterial features and phenotypes
Next, we correlated the top 50 bacterial species contributing to the model (Source Data, Supplementary Table 4) with different disease phenotypes to identify clues to model interpretability. These top 50 bacterial species achieved a mean AUROC of 0.88 to 0.99 (IQR 0.90–0.93, Source Data, Supplementary Fig. 9a) for different diseases in our test set, and a mean AUROC of 0.67 to 0.90 (IQR 0.78–0.86, Source Data, Supplementary Fig. 9b) in the public dataset. A total of 363 significant associations were found between these 50 species and different disease phenotypes (Hong Kong cohort, FDR < 0.05, Fig. 2). Compared with healthy controls, almost all disease states were associated with a significantly decreased abundance of microbiota from the bacterial phyla Firmicutes or Actinobacteria (FDR < 0.05) and a significant increase in Bacteroidetes (FDR < 0.05). Imbalance in Firmicutes/Bacteroidetes ratio had previously been reported primarily in patients with obesity and IBD14, but its associations with other diseases have not been reported. Nonetheless, such shared microbial signatures may serve as a basis for distinguishing health and disease. Then, we identified specific microbial signatures that can classify different diseases (Fig. 2). Specifically, the abundance of several bacterial species in Bacteroidetes differed significantly between patients with PACS, UC and CD. Subjects with PACS showed a significant increase in abundance of Bacteroides vulgatus and Bacteroides xylanisolvens, while those with UC were enriched in Bacteroides ovatus, and subjects with CD showed significant decreases in Bacteroides uniformis, Bacteroides vulgatus and Bacteroides xylanisolvens, compared with healthy
Fig. 2 | Microbial species associated with health status or different disease phenotypes. The top 50 microbial species contributing to the random forest multi-class classifier were clustered by taxonomy, and different phenotypes were clustered using hierarchical clustering. Associations were coloured by direction of effect (red, positive; blue, negative; p < 0.05), with associations significant at FDR < 0.05 marked with a plus (positive correlations) or minus (negative correlations), respectively. The nominal significance (p-value) of associations was calculated by MaAsLin 2, and the false discovery rate (FDR) was computed by Benjamini–Hochberg correction. CA colorectal adenomas, CD Crohn’s disease, CRC colorectal cancer, CVD cardiovascular disease, IBS-D diarrhoea-dominant irritable bowel syndrome, PACS post-acute COVID-19 syndrome, UC ulcerative colitis. Source data are provided as a Source Data file.
controls. Although patients with CRC and colorectal adenomas shared relatively similar gut bacteria composition, the abundance of Parvimonas micra was significantly higher in patients with CRC but not colorectal adenomas, compared to healthy controls, which was consistent with previous findings showing that Parvimonas micra can be used as a marker to distinguish CRC from colorectal adenomas15,16. For other diseases, microbiome differences were mainly driven by Actinobacteria. Subjects with obesity showed increases in Actinomyces naeslundii, Actinomyces odontolyticus and Actinomyces oris, and subjects with IBS-D showed increases in Collinsella aerofaciens and Collinsella stercoris. We further correlated bacteria and phenotypes in the assembled public dataset, and found that many disease-specific biomarkers are stable across datasets, such as Bacteroides for UC, Parvimonas micra for CRC and Actinomyces for obesity (Source Data, Supplementary Fig. 9c). Overall, these results suggest that our model can capture various disease-specific microbial signatures, which may explain the robust diagnostic performance of this multi-class classifier.
Discussion
Overall, our data showed that the faecal microbiome-based multi-class model for disease diagnosis is feasible. The novelty lies in the high-quality dataset, and superior and reproducible machine-learning methods which are of high clinical relevance. We believe this multi-class model of classifying diseases has potential clinical applications and can serve as a non-invasive way of screening various diseases in clinical practice or for disease risk assessment. Our results also have implications for the potential development of biomarkers for predicting drug response and common treatment strategies using the
identified shared or specific markers for multiple diseases. This work has some limitations. Firstly, the disease spectrum of this study is still limited, and the inclusion of more phenotypes can further enhance the value of this multi-class diagnostic tool. Secondly, biological evidence to support the identified microbiome–phenotype associations is limited and future work to delineate the mechanisms of these associations is needed to facilitate our understanding of the role of the shared and disease-specific microbiome in disease pathogenesis. Also, the pooled public dataset did not specify co-morbidities and antibiotic use, thus model performance may vary upon the exclusion of these subjects. Since our model predicts probabilities for multiple diseases simultaneously, it may also apply to multi-disease diagnosis in a single patient. Though we could not validate it at this moment, this hypothesis should be tested in the future.
To our knowledge, we present the largest faecal microbiome datasets with different disease phenotypes and developed a machine learning multi-class model that achieved high performance for disease classification. This non-invasive microbiome-based model could potentially be applied clinically to complement disease diagnostics and treatment response monitoring.
Methods
Ethics statement
The study was approved by The Joint Chinese University of Hong Kong – New Territories East Cluster Clinical Research Ethics Committee (The Joint CUHK-NTEC CREC). All subjects provided written informed consent.
Study population
All participants were recruited and diagnosed at the Prince of Wales Hospital in Hong Kong from January 2017 to March 2022. Subjects with CRC and CA were diagnosed by colonoscopy and confirmed on histology examinations; subjects with CD and UC were diagnosed based on standard criteria of endoscopy, radiology, and histological examinations. Subjects with IBS were diagnosed according to the ROME III criteria, and endoscopy and enteroscopy were performed to exclude other GI disorders such as IBD, coeliac disease, parasite infestations, or other organic disorders. Obesity was defined as subjects with a body mass index (BMI) of over 28 and with no other medical co-morbidities. Subjects with cardiovascular disease (CVD) were recruited from the public as part of a survey of cardiovascular health in the Hong Kong general population. Subjects underwent carotid ultrasounds to measure intima-media thickness (IMT) of the common, internal, and external carotid arteries (CCA, ICA and ECA, respectively) and carotid bulbs, and subjects that had ≥50% stenosis in a single or multiple vessels were regarded as having the risk of CVD. Subjects with post-acute COVID-19 syndrome (PACS) were defined as those with at least one persistent symptom or long-term complications of SARS-CoV-2 infection beyond 4 weeks from the viral clearance which could not be explained by an alternative diagnosis, and we assessed the presence of the 30 most commonly reported symptoms post-COVID after illness onset13,17 (Source Data, Supplementary Table 5). All subjects with other diseases (apart from the obesity group) had a normal range of BMI of 18.5–22.9. All subjects are on a stable traditional Chinese style diet and are of Han Chinese ethnicity. Patients were excluded if they had the following: age under 18 or over 80; self-reported comorbidities of other diseases; infection with an enteric pathogen; acquired immunodeficiency syndrome; known history of organ dysfunction or failure and abdominal surgery; active malignancy or undergoing radio-chemotherapy; short bowel syndrome; taking drugs commonly known to affect the gut microbiome including proton pump inhibitors, oral anti-diabetics, non-steroidal anti-inflammatory drugs, corticosteroids, laxatives or selective serotonin reuptake inhibitors and antibiotics or probiotics use within three months of sample collection; pregnant or breastfeeding; on special diets such as vegetarians.
Healthy controls were recruited during the same recruitment period from the community through advertisement and from the endoscopy centre at the Prince of Wales Hospital and included subjects who had a normal colonoscopy (faecal samples collected before bowel preparation). The exclusion criteria for healthy controls were known complex infections or sepsis; known history of severe organ failure (including decompensated cirrhosis, malignant disease, kidney failure, epilepsy, active serious infection, acquired immunodeficiency syndrome); bowel surgery in the last 6 months (excluding colonoscopy/procedure related to perianal disease); the presence of an ileostomy/stoma; current pregnancy; any long term drugs for chronic diseases; the use of antibiotics in the last 3 months; the use of laxatives or anti-diarrhoeal drugs in the last 3 months or recent dietary changes (e.g., becoming vegetarian/vegan). Finally, a total of 2320 subjects were recruited. Clinical metadata and dietary data were collected during clinical interviews. Besides, an additional 60 subjects (mean age 53.5, 48.3% female) were prospectively followed up for up to two years after the COVID-19 infection and were confirmed to have fully recovered from the initial infection without any symptoms of PACS. These subjects served as an independent validation cohort and provided serial faecal samples after SARS-CoV-2 clearance.
Faecal samples
Faecal samples were collected at home by all subjects using tubes prepared by investigators containing preservative media (cat. 63700, Norgen Biotek Corp, Ontario Canada). The Norgen preservative can preserve and allow safe transportation of microbial DNA & RNA at ambient temperature, eliminating sample variability. The stool sample was sent to the hospital within 24 h of collection and stored in −80 °C freezers until further processing. We have previously shown that data on gut microbiota composition generated from faecal samples collected using this preservative medium was comparable to data obtained from fresh samples that were immediately stored at −80 °C18.
Faecal DNA extraction and sequencing
After removing the preservative media, microbial DNA was isolated with the Qiagen (Hilden, Germany) QIAamp DNA Stool Mini Kit, according to the manufacturer’s instructions. After the quality control procedures by Qubit 2.0, agarose gel electrophoresis, and Agilent 2100, extracted DNA was subject to DNA libraries construction, completed through the processes of end repairing, adding A to tails, purification and PCR amplification, using Nextera DNA Flex Library Preparation kit (Illumina, San Diego, CA). Libraries were subsequently sequenced on our in-house sequencer Illumina NextSeq 550 (150 base pairs paired-end) at the Center for Microbiota Research, The Chinese University of Hong Kong. All samples were in random order for DNA extraction, library construction and sequencing. ZymoBIOMICS Spike-in Control I (High Microbial Load, Cat: D6320-10, ZYMO Research, USA) and ZymoBIOMICS Microbial Community DNA Standard (Cat: D6306-A) were used as positive controls during DNA extraction, library construction, sequencing and quality assessment.
Microbiome profiling
Raw sequence data were quality filtered using Trimmomatic V.39 to remove the adaptor, low-quality sequences (quality score < 20), and reads shorter than 50 base pairs. Contaminating human reads were filtered using Kneaddata (V.0.10.0, Reference database: GRCh38 p12) with default parameters. Following this, microbiota composition profiles were inferred from quality-filtered forward reads using MetaPhlAn3 version 3.0.14. GNU parallel (v2018) was used for parallel analysis jobs to accelerate data processing. Species whose average abundance and prevalence were below 0.15% and 5%, respectively, were filtered out. Alpha diversity metrics (Shannon diversity, Chao1 richness) were calculated by using the phyloseq package (v1.26.0).
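The species filter and the Shannon index described under Microbiome profiling can be illustrated with a minimal sketch in plain Python. This is not the authors' pipeline, which used MetaPhlAn3 and the phyloseq R package. The function names, the toy abundance table and the reading that a species is kept only when it meets both the 0.15% mean abundance and the 5% prevalence cut-off are assumptions made for illustration, as is the use of the natural logarithm.

```python
import math


def filter_species(profiles, min_mean_abundance=0.15, min_prevalence=0.05):
    """Keep species that pass both the abundance and the prevalence cut-off.

    profiles maps a species name to its relative abundance (in percent)
    in every sample. Requiring BOTH cut-offs is an assumption of this sketch.
    """
    kept = {}
    for species, values in profiles.items():
        mean_abundance = sum(values) / len(values)
        # Prevalence: fraction of samples in which the species is detected.
        prevalence = sum(1 for v in values if v > 0) / len(values)
        if mean_abundance >= min_mean_abundance and prevalence >= min_prevalence:
            kept[species] = values
    return kept


def shannon_diversity(abundances):
    """Shannon index H = -sum(p_i * ln p_i) over species with p_i > 0."""
    total = sum(abundances)
    if total == 0:
        return 0.0
    proportions = [a / total for a in abundances if a > 0]
    return -sum(p * math.log(p) for p in proportions)


# Hypothetical toy table: 40 samples, abundances in percent.
toy_profiles = {
    "species_common": [5.0] * 40,              # abundant and present everywhere
    "species_low": [0.1] * 40,                 # present everywhere, mean below 0.15%
    "species_sporadic": [10.0] + [0.0] * 39,   # mean 0.25%, but prevalence 2.5%
}
kept_species = sorted(filter_species(toy_profiles))
```

In this toy table only `species_common` survives: `species_low` fails the abundance cut-off and `species_sporadic` fails the prevalence cut-off. A sample with two equally abundant species has a Shannon index of ln 2.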
Microbiome analysis
All statistical analyses were done using R version 4.0.3. The ggpubr package (https://github.com/kassambara/ggpubr) performed non-parametric statistical testing between groups and accounted for multiple hypothesis testing corrections when necessary. Principal coordinates analysis (PCoA) based on beta-diversity (Bray–Curtis distance matrix calculated using relative abundances of microbial species) was used to visualise the clustering of samples based on their species-level compositional profiles. The microbiome composition differences between different phenotypes were calculated by permutational multivariate analysis of variance (PERMANOVA) on distance matrices using the adonis function of the vegan R package V.2.5–7 with 999 permutations. Associations of specific microbial species with phenotypes were identified using the multivariate analysis by linear models (MaAsLin2) statistical framework implemented in the Huttenhower Lab Galaxy instance (http://huttenhower.sph.harvard.edu/galaxy/) with healthy controls as reference. The linear model also included age, sex and technical factors (library DNA concentration, sequencing read depth, sequencing batch) to further correct for potential batch effects and confounders. BMI was not included because, apart from the obese group, all subjects from other disease groups were required to have a normal BMI (18.5–22.9) and there was no difference in the BMI across different phenotypes. Benjamini–Hochberg correction was used to control for multiple testing, and results were considered significant at false discovery rate (FDR) < 0.05.
proportions across folds). The optimal models selected based on cross-validated results were evaluated in the withheld evaluation dataset as the final performance for predicting different diseases. This process was repeated 20 times to obtain a distribution of random forest prediction evaluations on the validation set, and the mean AUROC and AUPR values were calculated accordingly for the visualisation of results. The highly ranked and frequently selected microbial features were considered predictive signatures for further interpretation. We retrieved prediction performance using the same training datasets.
Model evaluation
We included AUROC to characterise the model performance as our models initially provided outputs of probabilities for each disease phenotype, and these predicted probabilities were then used to estimate the risk of disease occurrence or absence, which formed a binary status that was analysed to provide an AUROC value. The AUROC is a widely applied metric that considers the trade-offs between sensitivity and specificity at all possible thresholds for comparing the performance across various classifiers with a baseline value of 0.5 for a random classifier. AUPR was provided as a complementary assessment, which considers the trade-offs between precision (or positive predictive value) and recall (or sensitivity) with a baseline that equals the proportion of positive disease cases in all samples.
Reporting summary
Further information on research design is available in the Nature Research Reporting Summary linked to this article.
Data availability
The raw metagenomes generated in this study have been deposited in the NCBI Sequence Read Archive database under accession code PRJNA841786. The publicly available raw sequencing data were downloaded through the NCBI Sequence Read Archive using the retrieved accession numbers from cited papers, including DRA006684, DRA008156, ERP008729, ERP005534, ERP023788, ERP021923, PRJEB36140, PRJEB37924, PRJEB33500, PRJNA400072, PRJEB1220, PRJNA429990, PRJEB15371, and PRJEB6337. The reference database GRCh38.p12 was downloaded from https://www.ncbi.nlm.nih.gov/assembly/GCF_000001405.38. Source data are provided with this paper.
Code availability
Codes and scripts developed in this study are all available at the GitHub repository (https://github.com/qsu123/multi_class_diagnosis, ref. 34).
References
1. Lynch, S. V. & Pedersen, O. The human intestinal microbiome in health and disease. N. Engl. J. Med. 375, 2369–2379 (2016).
2. Liang, J. Q. et al. A novel faecal Lachnoclostridium marker for the non-invasive diagnosis of colorectal adenoma and cancer. Gut 69, 1248–1257 (2020).
3. Vila, A. V. et al. Gut microbiota composition and functional changes in inflammatory bowel disease and irritable bowel syndrome. Sci. Transl. Med. 10, eaap8914 (2018).
4. Shaukat, A. & Levin, T. R. Current and future colorectal cancer screening strategies. Nat. Rev. Gastroenterol. Hepatol. 19, 521–531 (2022).
5. Pasolli, E., Truong, D. T., Malik, F., Waldron, L. & Segata, N. Machine learning meta-analysis of large metagenomic datasets: tools and biological insights. PLoS Comput. Biol. 12, e1004977 (2016).
6. Gacesa, R. et al. Environmental factors shaping the gut microbiome in a Dutch population. Nature 604, 732–739 (2022).
7. Saad Khan, L. K. Multiclass disease classification from microbial whole-community metagenomes. Pac. Symp. Biocomput. 25, 55–66 (2020).
8. Gupta, V. K. et al. A predictive index for health status using species-level gut microbiome profiling. Nat. Commun. 11, 4635 (2020).
9. Duvallet, C., Gibbons, S. M., Gurry, T., Irizarry, R. A. & Alm, E. J. Meta-analysis of gut microbiome studies identifies disease-specific and shared responses. Nat. Commun. 8, 1784 (2017).
10. Wyres, K. L., Lam, M. M. C. & Holt, K. E. Population genomics of Klebsiella pneumoniae. Nat. Rev. Microbiol. 18, 344–359 (2020).
11. Nie, K. et al. Roseburia intestinalis: a beneficial gut organism from the discoveries in genus and species. Front. Cell. Infect. Microbiol. 11, 757718 (2021).
12. Grandini, M., Bagli, E. & Visani, G. Metrics for multi-class classification: an overview. Preprint at arXiv.2008.05756 (2020).
13. Liu, Q. et al. Gut microbiota dynamics in a prospective cohort of patients with post-acute COVID-19 syndrome. Gut 71, 544–552 (2022).
14. Stojanov, S., Berlec, A. & Štrukelj, B. The influence of probiotics on the Firmicutes/Bacteroidetes ratio in the treatment of obesity and inflammatory bowel disease. Microorganisms 8, 1715 (2020).
15. Xu, J. et al. Alteration of the abundance of Parvimonas micra in the gut along the adenoma-carcinoma sequence. Oncol. Lett. 20, 106 (2020).
16. Lowenmark, T. et al. Parvimonas micra as a putative non-invasive faecal biomarker for colorectal cancer. Sci. Rep. 10, 15250 (2020).
17. Nalbandian, A. et al. Post-acute COVID-19 syndrome. Nat. Med. 27, 601–615 (2021).
18. Chen, Z. et al. Impact of preservation method and 16S rRNA hypervariable region on gut microbiota profiling. mSystems 4, e00271–00218 (2019).
19. Chen, C. et al. Removing batch effects in analysis of expression microarray data: an evaluation of six batch adjustment methods. PLoS ONE 6, e17238 (2011).
20. Pedregosa, F. et al. Scikit-learn: machine learning in Python. J. Mach. Learn. Res. 12, 2825–2830 (2011).
21. Franzosa, E. A. et al. Gut microbiome structure and metabolic activity in inflammatory bowel disease. Nat. Microbiol. 4, 293–305 (2019).
22. Nielsen, H. B. et al. Identification and assembly of genomes and genetic elements in complex metagenomic samples without using reference genomes. Nat. Biotechnol. 32, 822–828 (2014).
23. Weng, Y. J. et al. Correlation of diet, microbiota and metabolite networks in inflammatory bowel disease. J. Dig. Dis. 20, 447–459 (2019).
24. He, Q. et al. Two distinct metacommunities characterize the gut microbiota in Crohn’s disease patients. Gigascience 6, 1–11 (2017).
25. Jie, Z. et al. The gut microbiome in atherosclerotic cardiovascular disease. Nat. Commun. 8, 845 (2017).
26. Yachida, S. et al. Metagenomic and metabolomic analyses reveal distinct stage-specific phenotypes of the gut microbiota in colorectal cancer. Nat. Med. 25, 968–976 (2019).
27. Feng, Q. et al. Gut microbiome development along the colorectal adenoma-carcinoma sequence. Nat. Commun. 6, 6528 (2015).
28. Zeller, G. et al. Potential of fecal microbiota for early-stage detection of colorectal cancer. Mol. Syst. Biol. 10, 766 (2014).
29. Vervier, K. et al. Two microbiota subtypes identified in irritable bowel syndrome with distinct responses to the low FODMAP diet. Gut 71, 1821–1830 (2022).
30. Goll, R. et al. Effects of fecal microbiota transplantation in subjects with irritable bowel syndrome are mirrored by changes in gut microbiome. Gut Microbes 12, 1794263 (2020).
31. Mars, R. A. T. et al. Longitudinal multi-omics reveals subset-specific mechanisms underlying irritable bowel syndrome. Cell 182, 1460–1473 e1417 (2020).
32. Meslier, V. et al. Mediterranean diet intervention in overweight and obese subjects lowers plasma cholesterol and causes changes in the gut microbiome and metabolome independently of energy intake. Gut 69, 1258–1268 (2020).
33. Qin, N. et al. Alterations of the human gut microbiome in liver cirrhosis. Nature 513, 59–64 (2014).
34. Su, Q. Faecal microbiome-based machine learning for multi-class disease diagnosis. Github https://doi.org/10.5281/zenodo.7193183 (2022).
Acknowledgements
We thank Gabriel Lee for manuscript proofreading. We thank Anki Miu, Bonaventure YM Ip, Joyce Wing Yan Mak, Paul KS Chan and other clinical research staff/students for their technical contribution to this study, including clinical data and sample collection, inventory and processing. This research has been conducted using the CU-Med Biobank Resource under Request ID ‘R20221008’. Q.S., Q.L., J.Z., Z.X., Y.K.Y., W.T., L.Z., Y.K.Y., C.L., M.Z., C.P.C., H.M.T., F.K.L.C. and S.C.N. are partially or fully supported by InnoHK, The Government of Hong Kong, Special Administrative Region of the People’s Republic of China. S.C.N. is also supported by the Croucher Senior Medical Research Fellowship. R.I.L. received additional support from the Hong Kong Ph.D. Fellowship Scheme (HKPFS).