Mining probiotics from the primary sequences of metagenomic bins using a language model
If you have any questions, please do not hesitate to contact me: fangzc@smu.edu.cn
- 2025.03.02: The download link of the virtual machine was updated.
- 2024.07.19: The README.md was updated with minor modifications.
- 2024.04.15: The manual was updated with minor modifications.
metaProbiotics is designed to identify probiotic bins in metagenomic data. The tool takes one or more FASTA files as input, where each FASTA file contains the sequences of a single bin, and identifies the bins that originate from probiotic bacteria. metaProbiotics can be run in a virtual machine, via Docker, or through the MATLAB interface. For users without a computational background, we recommend running the virtual machine version of metaProbiotics on a local PC; this way, no dependency packages need to be installed.
- metaProbiotics 1.0 (Tested on Ubuntu 16.04)
Please refer to the manual for detailed usage of metaProbiotics.
If you do not have a computational background or are unfamiliar with the Linux operating system, we recommend using the metaProbiotics virtual machine. This way, you do not need strong computer skills, in particular the ability to use the command line. Installing the virtual machine is easy; please refer to the manual for a step-by-step guide with screenshots. Once you have installed the virtual machine and set up the shared folder, open the metaProbiotics_v_1_0 folder on the desktop and double-click the “metaProbiotics.sh” file.
Wait a few seconds, select your input file, and click the "Open" button; the program will then start running.
We also provide a Docker version of metaProbiotics. The Docker image is fully built, so users can download the metaProbiotics image and start several mutually independent metaProbiotics containers from it.
If you have MATLAB installed and are familiar with it, we recommend running metaProbiotics through the MATLAB script. You can start a MATLAB GUI and directly select the file or folder to be predicted.
The output of metaProbiotics consists of three columns:
| Input_file | Score | Is_probiotics |
|---|---|---|
A bin with a higher score is more likely to be probiotic. By default, bins with a score higher than 0.5 are predicted as probiotic.
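If you want to rank bins or try a stricter cutoff than the default 0.5, the results table can be post-processed. The Python sketch below is not part of metaProbiotics. It assumes the results are saved as a tab-separated file with the header `Input_file`, `Score`, `Is_probiotics`, which may not match the actual output layout. The function name `call_probiotic_bins` and the sample bin names and scores are hypothetical and for illustration only.

```python
import csv
import io
from typing import List, Tuple

# Default cutoff described in this README: score > 0.5 => predicted probiotic.
DEFAULT_THRESHOLD = 0.5


def call_probiotic_bins(table_text: str,
                        threshold: float = DEFAULT_THRESHOLD) -> List[Tuple[str, float, bool]]:
    """Hypothetical helper: parse a results table and re-apply a score cutoff.

    ASSUMPTION: the table is tab-separated with the header
    Input_file, Score, Is_probiotics. The real output file may differ.
    Returns (input_file, score, is_probiotic) tuples sorted by score,
    highest first, so the most likely probiotic bins come first.
    """
    reader = csv.DictReader(io.StringIO(table_text), delimiter="\t")
    calls = []
    for row in reader:
        score = float(row["Score"])
        # Strictly greater than the cutoff, matching "higher than 0.5".
        calls.append((row["Input_file"], score, score > threshold))
    return sorted(calls, key=lambda call: call[1], reverse=True)


if __name__ == "__main__":
    # Hypothetical table; file names and scores are illustrative, not real results.
    sample = (
        "Input_file\tScore\tIs_probiotics\n"
        "bin_1.fasta\t0.91\tYes\n"
        "bin_2.fasta\t0.12\tNo\n"
        "bin_3.fasta\t0.50\tNo\n"
    )
    for name, score, is_probiotic in call_probiotic_bins(sample, threshold=0.8):
        print(f"{name}\t{score:.2f}\t{is_probiotic}")
```

Raising `threshold` trades sensitivity for precision; the default of 0.5 reproduces the tool's own decision rule.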
- Wu S, Feng T, Tang W, Qi C, Gao J, He X, Wang J, Zhou H, Fang Z. metaProbiotics: a tool for mining probiotic from metagenomic binning data based on a language model. Brief Bioinform. 2024 Jan 22;25(2):bbae085. doi: 10.1093/bib/bbae085. PMID: 38487846; PMCID: PMC10940841.