
JanusX is user-friendly, high-performance software for GWAS using ML, GL and FarmCPU models, and for genomic selection (GS) using GBLUP and ML.


MaizeMan-JxFU/JanusX


JanusX

English | Simplified Chinese (recommended)

Project Overview

JanusX is a high-performance toolkit for Genome-Wide Association Studies (GWAS) and genomic selection (GS), built on mixed linear models (MLM). It is designed to run faster than tools such as GEMMA, GCTA, and rMVP, with the largest gains in multi-threaded runs.

Development Setup

Installation

git clone https://github.com/MaizeMan-JxFU/JanusX.git
cd JanusX
sh ./install.sh

The install script uses uv for dependency management and creates a virtual environment in .venv/.

Pre-compiled Releases

For convenience, we also provide pre-compiled binaries that do not require building from source. They are available under Releases v1.0.0.

Simply download and extract the archive, then run the executable directly.

Note: Installing from source on Windows is no longer supported; please use Linux, macOS, or Windows Subsystem for Linux (WSL). A pre-built Windows binary is still available.

Running the CLI

./jx -h
./jx <module> [options]

Note that the first run of ./jx -h may take a while, because the Python interpreter compiles the source code into bytecode in __pycache__ directories. Subsequent runs reuse the cached bytecode and start much faster.
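To avoid the first-run delay (for example, before several users share one installation), the bytecode cache can be filled ahead of time with Python's standard compileall module. A minimal sketch, assuming the Python sources sit in the src/ and module/ directories of the checkout (the helper name precompile is hypothetical):

```python
import compileall
from pathlib import Path


def precompile(root, subdirs=("src", "module")):
    """Byte-compile every .py file under root/<subdir> into __pycache__.

    Returns True only if every existing directory compiled without errors.
    The default subdirectory names are an assumption about the checkout layout.
    """
    ok = True
    for sub in subdirs:
        path = Path(root) / sub
        if path.is_dir():
            # quiet=1 prints only errors; compile_dir returns a truthy value on success
            ok = bool(compileall.compile_dir(str(path), quiet=1)) and ok
    return ok
```

Call precompile("path/to/JanusX") once after installation, or run python -m compileall -q src module from the repository root. Use the same interpreter that jx runs with (presumably the one in .venv/), because the cache files are tagged by Python version.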

Available Modules

  • gwas - Mixed linear model GWAS analysis
  • postGWAS - Visualization and annotation of GWAS results
  • transanno - Migration of annotations between genome assembly versions
  • gformat - Genotype format conversion
  • grm - Genetic relationship matrix calculation
  • pca - Principal component analysis
  • gs - Genomic selection (e.g. GBLUP, rrBLUP, RF)

Example Commands

# GWAS with VCF input
jx gwas --vcf example/mouse_hs1940.vcf.gz --pheno example/mouse_hs1940.pheno --out test

# GWAS with PLINK format
jx gwas --bfile genotypes --pheno phenotypes.txt --out results --grm 1 --qcov 3 --thread 8

# Using a precomputed kinship matrix file and fast mode
jx gwas --vcf genotypes.vcf --pheno phenotypes.txt --out results --grm kinship_matrix.txt --qcov 10 --lm

# Visualize GWAS results
jx postGWAS --files test/*.mlm.tsv --threshold 1e-6 --thread 4

# Genomic selection with PLINK format
jx gs --bfile genotypes --pheno phenotypes.txt --out results --GBLUP --rrBLUP --RF

[Figure: Manhattan and QQ plots]

The test data in example/ come from genetics-statistics/GEMMA and were published in Parker et al., Nature Genetics, 2016.

Architecture

Core Libraries (src/)

  • pyBLUP - Core statistical engine

    • gwas.py - GWAS class implementing mixed linear model with REML optimization
    • QK.py - Q matrix (population structure) and K matrix (kinship) calculation with memory-optimized chunking (deprecated)
    • QK2.py - Alternative QK implementation
    • QC.py - Quality control functions (MAF, missing rate filters)
    • mlm.py - Mixed linear model utilities
    • pca.py - PCA computation (includes randomized SVD for large datasets)
    • kfold.py - Cross-validation utilities
  • gfreader - Genotype file I/O

    • base.py - Readers for VCF, PLINK binary (.bed/.bim/.fam), HapMap, and numpy formats
    • Supports genotype conversion between formats
  • bioplotkit - Visualization

    • manhanden.py - Manhattan and QQ plots
    • LDBlock.py - LD block visualization
    • gffplot.py - GFF/annotation plotting
    • pcshow.py - PCA visualization (uses Plotly)
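QC.py is described only as applying MAF and missing-rate filters. As a sketch of what such filters typically compute, assuming genotypes are an n × m array of 0/1/2 allele counts with NaN for missing calls (the function name and default thresholds are hypothetical, not JanusX's):

```python
import numpy as np


def qc_filter(M, maf_min=0.02, missing_max=0.05):
    """Return a boolean mask of SNPs (columns) that pass MAF and missing-rate filters.

    M is an n x m matrix of 0/1/2 allele counts with np.nan for missing calls.
    The default thresholds are illustrative only.
    """
    M = np.asarray(M, dtype=float)
    called = ~np.isnan(M)
    missing_rate = 1.0 - called.mean(axis=0)
    n_called = called.sum(axis=0)
    # Allele frequency from called genotypes only; all-missing SNPs get NaN and fail
    with np.errstate(invalid="ignore", divide="ignore"):
        p = np.nansum(M, axis=0) / (2.0 * n_called)
    maf = np.minimum(p, 1.0 - p)
    return (missing_rate <= missing_max) & (maf >= maf_min)
```

The mask can then index the genotype columns (M[:, mask]) before building kinship matrices or running association tests.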

CLI Entry Points (module/)

Each module corresponds to a CLI command. The launcher script (jx.bat/jx) dispatches to module/<name>.py.
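A launcher of this kind can be sketched in a few lines of Python. The sketch assumes, hypothetically, that every module/<name>.py exposes a main(argv) function returning an exit code; the real JanusX modules may use a different entry-point convention, and load_module and dispatch are illustrative names:

```python
import importlib.util
from pathlib import Path


def load_module(module_dir, name):
    """Import module_dir/<name>.py as a fresh module object."""
    path = Path(module_dir) / f"{name}.py"
    if not path.is_file():
        raise SystemExit(f"jx: unknown module '{name}' (no {path})")
    spec = importlib.util.spec_from_file_location(f"jx_{name}", str(path))
    mod = importlib.util.module_from_spec(spec)
    spec.loader.exec_module(mod)
    return mod


def dispatch(argv, module_dir="module"):
    """Route `jx <module> [options]` to module_dir/<module>.py.

    Assumes (hypothetically) that each module file defines main(argv) -> int.
    """
    if not argv or argv[0] in ("-h", "--help"):
        print("usage: jx <module> [options]")
        return 0
    name, rest = argv[0], argv[1:]
    return load_module(module_dir, name).main(rest)
```

Loading files by path rather than through a package import means a new command only needs a new <name>.py in module/.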

Key Algorithms

Mixed Linear Model: The kinship matrix is eigendecomposed once, which simplifies the variance computation, and Brent's method then optimizes the REML likelihood. Lambda (the variance ratio) is the only parameter being optimized.
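The following sketch illustrates the idea under explicit assumptions: the model is y = Xβ + g + e with Var(y) = σ_e²(λK + I), so λ = σ_g²/σ_e² (JanusX may define the ratio the other way round); the objective is the standard restricted log-likelihood with σ_e² profiled out and λ-independent constants dropped; and SciPy's minimize_scalar(method="bounded"), an implementation of Brent's bounded method, searches log10 λ in [-5, 5]. The function names are hypothetical and this is not the code in gwas.py.

```python
import numpy as np
from scipy.optimize import minimize_scalar


def reml_neg_loglik(log10_lam, s, Xt, yt):
    """Negative REML log-likelihood (constants dropped) as a function of lambda.

    Assumed model: Var(y) = se2 * H with H = lambda * K + I.
    With K = U diag(s) U^T, H^-1 = U diag(w) U^T and w = 1 / (lambda * s + 1),
    so every term is a weighted sum over the rotated data yt = U^T y, Xt = U^T X.
    -loglik = 0.5 * [log|H| + log|X^T H^-1 X| + (n - p) * log(y^T P y)]
    """
    lam = 10.0 ** log10_lam
    d = lam * s + 1.0                  # eigenvalues of H
    w = 1.0 / d
    n, p = Xt.shape
    XtWX = Xt.T @ (w[:, None] * Xt)    # X^T H^-1 X
    beta = np.linalg.solve(XtWX, Xt.T @ (w * yt))
    r = yt - Xt @ beta
    yPy = r @ (w * r)                  # y^T P y, the REML residual quadratic form
    _, logdet_xwx = np.linalg.slogdet(XtWX)
    return 0.5 * (np.log(d).sum() + logdet_xwx + (n - p) * np.log(yPy))


def fit_lambda(y, X, K, bounds=(-5.0, 5.0)):
    """Estimate lambda with Brent's bounded method over log10(lambda)."""
    s, U = np.linalg.eigh(K)           # one O(n^3) decomposition, reused for every lambda
    s = np.clip(s, 0.0, None)          # drop tiny negative eigenvalues from round-off
    yt, Xt = U.T @ y, U.T @ X
    res = minimize_scalar(reml_neg_loglik, bounds=bounds,
                          args=(s, Xt, yt), method="bounded")
    lam = 10.0 ** res.x
    w = 1.0 / (lam * s + 1.0)
    beta = np.linalg.solve(Xt.T @ (w[:, None] * Xt), Xt.T @ (w * yt))
    r = yt - Xt @ beta
    se2 = (r @ (w * r)) / (len(y) - X.shape[1])   # REML estimate of se2
    return lam, beta, se2
```

After the single eigendecomposition, each likelihood evaluation is a weighted sum over the rotated data costing O(np²), which is why optimizing one scalar is cheap.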

Kinship Methods: VanRaden (centering, default) and Yang (standardization).
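The two kinship definitions can be written compactly. A minimal sketch, assuming an n × m matrix of 0/1/2 allele counts with no missing values; the function name is hypothetical, and the Yang variant here uses the same standardized cross-product on the diagonal, whereas GCTA applies a separate diagonal correction:

```python
import numpy as np


def kinship(M, method="vanraden"):
    """Genomic relationship matrix from an n x m genotype matrix M coded 0/1/2.

    vanraden: center each SNP by 2p and scale by 2 * sum(p(1-p)) (VanRaden 2008, method 1)
    yang:     center by 2p, divide each SNP by sqrt(2p(1-p)), then average over SNPs
    Missing genotypes and any special diagonal treatment are not handled here.
    """
    M = np.asarray(M, dtype=float)
    p = M.mean(axis=0) / 2.0                 # frequency of the counted allele
    keep = (p > 0.0) & (p < 1.0)             # monomorphic SNPs carry no information
    M, p = M[:, keep], p[keep]
    Z = M - 2.0 * p                          # centering
    if method == "vanraden":
        return Z @ Z.T / (2.0 * np.sum(p * (1.0 - p)))
    if method == "yang":
        Z = Z / np.sqrt(2.0 * p * (1.0 - p)) # standardization
        return Z @ Z.T / Z.shape[1]
    raise ValueError(f"unknown method: {method}")
```

Both variants center each SNP by its sample mean 2p, so every row of the result sums to zero. They differ in SNP weighting: under VanRaden, SNPs with intermediate allele frequencies contribute more, while Yang gives every SNP unit variance and so upweights rare variants.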

PCA: Computed with block-partitioned matrix operations.
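One common way to partition the PCA computation, sketched here as an assumption rather than JanusX's exact scheme, is to stream SNPs in column blocks, accumulate the n × n product of standardized genotypes, and eigendecompose it once. Only one block of genotypes is held at a time, while the accumulator stays n × n. The function name and block size are hypothetical:

```python
import numpy as np


def blockwise_pca(M, n_components=3, block_size=1000):
    """Top principal components of an n x m 0/1/2 genotype matrix, built in SNP blocks.

    Returns (eigenvectors, eigenvalues) of the standardized GRM, largest first;
    the eigenvectors are the PCs. In a real pipeline each block would be read
    from disk instead of sliced from an in-memory array.
    """
    M = np.asarray(M, dtype=float)
    n, m = M.shape
    G = np.zeros((n, n))
    used = 0
    for start in range(0, m, block_size):
        B = M[:, start:start + block_size]
        p = B.mean(axis=0) / 2.0
        keep = (p > 0.0) & (p < 1.0)
        Z = (B[:, keep] - 2.0 * p[keep]) / np.sqrt(2.0 * p[keep] * (1.0 - p[keep]))
        G += Z @ Z.T                     # add this block's contribution
        used += int(keep.sum())
    G /= used
    vals, vecs = np.linalg.eigh(G)       # eigenvalues in ascending order
    order = np.argsort(vals)[::-1][:n_components]
    return vecs[:, order], vals[order]
```

Because each SNP is standardized independently, the block size changes memory use but not the result; for very large n, a randomized SVD (as mentioned for pca.py) avoids forming the full n × n matrix.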

File Formats

Phenotype file: Tab-delimited; the first column holds sample IDs and each subsequent column holds one phenotype.

samples pheno_name
indv1 value1
indv2 value2
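Reading this layout needs only the standard library. A minimal sketch; the function name read_pheno and the treatment of NA or empty cells as missing values are assumptions, since the format description does not say how missing phenotypes are encoded:

```python
import csv
import math


def read_pheno(handle, missing=("NA", "")):
    """Parse a tab-delimited phenotype table into {trait: {sample_id: value}}.

    First column: sample ID; every other column: one trait. Values listed in
    `missing` (an assumption; check what JanusX accepts) become NaN.
    """
    reader = csv.reader(handle, delimiter="\t")
    header = next(reader)
    traits = header[1:]
    table = {t: {} for t in traits}
    for row in reader:
        if not row:
            continue                      # skip blank lines
        sample, values = row[0], row[1:]
        if len(values) != len(traits):
            raise ValueError(f"{sample}: expected {len(traits)} values, got {len(values)}")
        for trait, v in zip(traits, values):
            table[trait][sample] = math.nan if v.strip() in missing else float(v)
    return table
```

For a file on disk: with open(path, newline="") as fh: table = read_pheno(fh).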

Supported genotype formats: VCF (.vcf, .vcf.gz), PLINK binary (.bed/.bim/.fam), numpy archives (.npz/.snp/.idv)
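Of these, the PLINK .bed layout is compact but easy to misread. The sketch below decodes a SNP-major .bed buffer following the PLINK 1 binary format (3-byte magic 0x6C 0x1B 0x01, then 2 bits per sample packed from the low-order bits of each byte, with each SNP padded to a whole byte). It is an illustration, not the gfreader reader, and decode_bed is a hypothetical name; it returns copies of the first allele listed in .bim, with None for missing calls:

```python
from typing import List, Optional

# 2-bit code -> copies of the first .bim allele (None = missing)
_CODE = {0b00: 2, 0b01: None, 0b10: 1, 0b11: 0}
_MAGIC = bytes([0x6C, 0x1B, 0x01])      # SNP-major PLINK 1 .bed header


def decode_bed(buf: bytes, n_samples: int) -> List[List[Optional[int]]]:
    """Decode a SNP-major .bed byte buffer into one list of genotypes per SNP."""
    if buf[:3] != _MAGIC:
        raise ValueError("not a SNP-major PLINK .bed file")
    per_snp = (n_samples + 3) // 4      # each SNP is padded to a whole byte
    body = buf[3:]
    if len(body) % per_snp:
        raise ValueError("file size does not match the sample count")
    snps = []
    for off in range(0, len(body), per_snp):
        chunk = body[off:off + per_snp]
        genos = []
        for i in range(n_samples):
            code = (chunk[i // 4] >> (2 * (i % 4))) & 0b11  # low bits = first sample
            genos.append(_CODE[code])
        snps.append(genos)
    return snps
```

The sample count comes from the number of lines in the matching .fam file.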

Python Version

Requires Python 3.8+

Test Data

Example data in the example/ directory come from Parker et al., Nature Genetics, 2016 (via the GEMMA project).
