lncMachine (2021): Machine learning tool for genome-wide lncRNA discovery and annotation

📄 Publication: lncMachine: A machine learning-based approach for genome-wide identification of long noncoding RNAs, Functional & Integrative Genomics, 2021
👤 Role: First author and primary developer — implemented ML pipeline and prediction framework
🎯 Impact: Enables supervised genome-wide lncRNA discovery and annotation with multiple classifiers
Tech: Python 3 • scikit-learn • BioPython • NumPy • pandas

Overview

lncMachine identifies and annotates lncRNAs genome-wide using supervised machine learning. Users can train new models from coding and noncoding sequences or apply prebuilt models for prediction. lncMachine supports multiple classifiers, and the repository provides the reference code matching the publication so that the published results can be reproduced.

Use cases: lncRNA annotation in new genomes • Comparative genomics • Model evaluation and classifier benchmarking

Requirements

Python 3
scikit-learn version 0.22
BioPython
NumPy
pandas

Installation

Clone the repository and ensure dependencies are installed:

git clone https://github.com/hbusra/lncMAchine.git
cd lncMAchine
conda env create -f environment.yml

Note: Prebuilt models require scikit-learn 0.22 for compatibility.
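
Serialized scikit-learn models are generally not portable across library versions, so it can help to check the installed version before loading a prebuilt model. The following is a minimal sketch of such a check; `matches_required_version` is a hypothetical helper and is not part of lncMachine.

```python
def matches_required_version(installed: str, required: str = "0.22") -> bool:
    """Return True if `installed` is `required` or a patch release of it."""
    installed_parts = installed.split(".")
    required_parts = required.split(".")
    # Compare only as many components as the requirement specifies,
    # so "0.22" and "0.22.2.post1" both satisfy "0.22".
    return installed_parts[:len(required_parts)] == required_parts


if __name__ == "__main__":
    try:
        import sklearn
    except ImportError:
        print("scikit-learn is not installed")
    else:
        if matches_required_version(sklearn.__version__):
            print("scikit-learn", sklearn.__version__, "matches the 0.22 series")
        else:
            print("scikit-learn", sklearn.__version__,
                  "differs from 0.22; prebuilt models may fail to load")
```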

Example Usage

Train a Random Forest prediction model from coding and noncoding FASTA files:

python3 lncMachine.py -c coding.fasta -n noncoding.fasta --train
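
To illustrate how two FASTA files become labeled training examples for a supervised classifier, here is a small sketch using only the standard library. The two features shown (sequence length and GC content) and the label convention (1 for coding, 0 for noncoding) are assumptions chosen for illustration; they are not lncMachine's actual feature set, and `parse_fasta`, `gc_content`, and `labeled_rows` are hypothetical helper names.

```python
from typing import Dict, Iterable, Iterator, List, Tuple


def parse_fasta(lines: Iterable[str]) -> Iterator[Tuple[str, str]]:
    """Yield (record_id, sequence) pairs from FASTA-formatted lines."""
    record_id, chunks = None, []
    for line in lines:
        line = line.strip()
        if not line:
            continue
        if line.startswith(">"):
            if record_id is not None:
                yield record_id, "".join(chunks)
            # The ID is the first whitespace-delimited token after ">".
            header = line[1:].split()
            record_id = header[0] if header else ""
            chunks = []
        elif record_id is not None:
            chunks.append(line.upper())
    if record_id is not None:
        yield record_id, "".join(chunks)


def gc_content(sequence: str) -> float:
    """Fraction of G and C bases; 0.0 for an empty sequence."""
    if not sequence:
        return 0.0
    return (sequence.count("G") + sequence.count("C")) / len(sequence)


def labeled_rows(lines: Iterable[str], label: int) -> List[Dict[str, object]]:
    """Build one feature row per FASTA record, tagged with a class label."""
    return [
        {"id": rid, "length": len(seq), "gc_content": gc_content(seq), "label": label}
        for rid, seq in parse_fasta(lines)
    ]


# In-memory stand-ins for coding.fasta and noncoding.fasta.
coding = [">tx1 demo transcript", "ATGGCC", "GGTAA"]
noncoding = [">lnc1", "ATATATAT"]
rows = labeled_rows(coding, 1) + labeled_rows(noncoding, 0)
for row in rows:
    print(row)
```

With real files, an open file handle can be passed in place of the lists, for example `labeled_rows(open("coding.fasta"), 1)`.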

Train prediction models with nine ML algorithms:

python3 lncMachine.py -c coding.fasta -n noncoding.fasta --train --all

Train from a CSV feature file:

python3 lncMachine.py -i features.csv --train

Predict coding probability from a FASTA file using a prebuilt model:

python3 lncMachine.py -c test.fasta --model prebuilt_model.sav -o test_predictions.csv
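
For readers who want to score feature rows from their own Python code, the sketch below shows one way a saved classifier could be loaded and queried. It assumes the `.sav` file is a pickle of a scikit-learn classifier (it may instead have been written with joblib), that the coding class is labeled 1, and that the feature rows use the same columns and order as at training time; none of this is confirmed by the repository. `load_model` and `positive_class_probabilities` are hypothetical names.

```python
import pickle
from typing import List, Sequence


def load_model(path: str):
    """Load a pickled model. Only unpickle files from a trusted source."""
    with open(path, "rb") as handle:
        return pickle.load(handle)


def positive_class_probabilities(
    model,
    feature_rows: Sequence[Sequence[float]],
    positive_label: int = 1,
) -> List[float]:
    """Return the predicted probability of `positive_label` for each row."""
    # scikit-learn classifiers order predict_proba columns as in classes_.
    column = list(model.classes_).index(positive_label)
    return [float(row[column]) for row in model.predict_proba(feature_rows)]
```

A call such as `positive_class_probabilities(load_model("prebuilt_model.sav"), feature_rows)` would then return one probability per input row, under the assumptions above.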

Citation

Cagirici et al., “lncMachine: a machine learning-based approach for genome-wide identification of long noncoding RNAs,” Functional & Integrative Genomics, 2021.

