JanusX is a high-performance toolkit for Genome-Wide Association Studies (GWAS) and genomic selection, built on mixed linear models (MLM). It provides significant performance improvements over tools like GEMMA, GCTA, and rMVP, especially in multi-threaded computation.
git clone https://github.com/MaizeMan-JxFU/JanusX.git
cd JanusX
sh ./install.shThe install script uses uv for dependency management and creates a virtual environment in .venv/.
For convenience, we also provide pre-compiled binaries that don't require building from source. The releases are available at Releases v1.0.0 for the following platforms:
- Linux AMD64: JanusX-v1.0.0-linux-AMD64.tgz
- Windows AMD64: JanusX-v1.0.0-windows-AMD64.tgz
Simply download and extract the archive, then run the executable directly.
Note: Windows installation is no longer supported. Please use Linux/macOS or Windows Subsystem for Linux (WSL). But there is a pre-build version for Windows.
./jx -h
./jx <module> [options]Note that running ./jx -h might take a while at first! This is because the Python interpreter is compiling source code into the pycache directory. Subsequent runs will use the pre-compiled code and load much faster!
gwas- Mixed linear model GWAS analysispostGWAS- Visualization and annotationtransanno- Annotation genome version migrationgformat- Genotype format conversiongrm- Genetic relationship matrix calculationpca- Principal component analysis
# GWAS with VCF input
jx gwas --vcf example/mouse_hs1940.vcf.gz --pheno example/mouse_hs1940.pheno --out test
# GWAS with PLINK format
jx gwas --bfile genotypes --pheno phenotypes.txt --out results --grm 1 --qcov 3 --thread 8
# Using kinship matrix and fast mode
jx gwas --vcf genotypes.vcf --pheno phenotypes.txt --out results --grm kinship_matrix.txt --qcov 10 --lm
# Visualize GWAS results
jx postGWAS --files test/*.mlm.tsv --threshold 1e-6 --thread 4
# Genomic selection with PLINK format
jx gs --bfile genotypes --pheno phenotypes.txt --out results --GBLUP --rrBLUP --RFTest data in example is from genetics-statistics/GEMMA, published in Parker et al, Nature Genetics, 2016
-
pyBLUP - Core statistical engine
gwas.py- GWAS class implementing mixed linear model with REML optimizationQK.py- Q matrix (population structure) and K matrix (kinship) calculation with memory-optimized chunking (deprecated)QK2.py- Alternative QK implementationQC.py- Quality control functions (MAF, missing rate filters)mlm.py- Mixed linear model utilitiespca.py- PCA computation (includes randomized SVD for large datasets)kfold.py- Cross-validation utilities
-
gfreader - Genotype file I/O
base.py- Readers for VCF, PLINK binary (.bed/.bim/.fam), HapMap, and numpy formats- Supports genotype conversion between formats
-
bioplotkit - Visualization
manhanden.py- Manhattan and QQ plotsLDBlock.py- LD block visualizationgffplot.py- GFF/annotation plottingpcshow.py- PCA visualization (uses Plotly)
Each module corresponds to a CLI command. The launcher script (jx.bat/jx) dispatches to module/<name>.py.
Mixed Linear Model: Uses eigen decomposition of the kinship matrix to simplify variance computation, with Brent's method for REML parameter optimization. Lambda (variance ratio) is the single parameter being optimized.
Kinship Methods: VanRanden (Centralization, default), Yang (Standardization).
PCA: Matrix block partitioning for computation.
Phenotype file: Tab-delimited, first column is sample ID, subsequent columns is phenotype
| samples | pheno_name |
|---|---|
| indv1 | value1 |
| indv2 | value2 |
Supported genotype formats: VCF (.vcf, .vcf.gz), PLINK binary (.bed/.bim/.fam), numpy archives (.npz/.snp/.idv)
Requires Python 3.8+
Example data in example/ directory from Parker et al, Nature Genetics, 2016 (via GEMMA project)