Skip to content

JanusX is a high-performance, ALL-in-ONE suite for quantitative genetics that unifies genome-wide association studies (GWAS) and genomic selection (GS).

License

Notifications You must be signed in to change notification settings

MaizeMan-JxFU/JanusX

Folders and files

NameName
Last commit message
Last commit date

Latest commit

 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 

Repository files navigation

JanusX

English | 简体中文(推荐)

Project Overview

JanusX is a high-performance, ALL-in-ONE suite for quantitative genetics that unifies genome-wide association studies (GWAS) and genomic selection (GS). It incorporates well-established GWAS methods (MLM, GLM and FarmCPU) and a flexible GS toolkit including GBLUP and various machine learning models.It also combines routine genomic analyses, from data processing to publication-ready visualisation.

It provides significant performance improvements over tools like GEMMA, GCTA, and rMVP, especially in multi-threaded computation.

Development Setup

Installation

git clone https://github.com/MaizeMan-JxFU/JanusX.git
cd JanusX
sh ./install.sh

The install script uses uv for dependency management and creates a virtual environment in .venv/.

Pre-compiled Releases

For convenience, we also provide pre-compiled binaries that don't require building from source. The releases are available at Releases v1.0.0 for the following platforms:

Simply download and extract the archive, then run the executable directly.

Note: Windows installation is no longer supported. Please use Linux/macOS or Windows Subsystem for Linux (WSL). But there is a pre-build version for Windows.

Running the CLI

./jx -h
./jx <module> [options]

Note that running ./jx -h might take a while at first! This is because the Python interpreter is compiling source code into the pycache directory. Subsequent runs will use the pre-compiled code and load much faster!

Available Modules

  • gwas - Mixed linear model GWAS analysis
  • postGWAS - Visualization and annotation
  • transanno - Annotation genome version migration
  • gformat - Genotype format conversion
  • grm - Genetic relationship matrix calculation
  • pca - Principal component analysis

Example Commands

# GWAS with VCF input
jx gwas --vcf example/mouse_hs1940.vcf.gz --pheno example/mouse_hs1940.pheno --out test

# GWAS with PLINK format
jx gwas --bfile genotypes --pheno phenotypes.txt --out results --grm 1 --qcov 3 --thread 8

# Using kinship matrix and fast mode
jx gwas --vcf genotypes.vcf --pheno phenotypes.txt --out results --grm kinship_matrix.txt --qcov 10 --lm

# Visualize GWAS results
jx postGWAS --files test/*.mlm.tsv --threshold 1e-6 --thread 4

# Genomic selection with PLINK format
jx gs --bfile genotypes --pheno phenotypes.txt --out results --GBLUP --rrBLUP --RF

manhanden&qq

Test data in example is from genetics-statistics/GEMMA, published in Parker et al, Nature Genetics, 2016

Architecture

Core Libraries (src/)

  • pyBLUP - Core statistical engine

    • gwas.py - GWAS class implementing mixed linear model with REML optimization
    • QK.py - Q matrix (population structure) and K matrix (kinship) calculation with memory-optimized chunking (deprecated)
    • QK2.py - Alternative QK implementation
    • QC.py - Quality control functions (MAF, missing rate filters)
    • mlm.py - Mixed linear model utilities
    • pca.py - PCA computation (includes randomized SVD for large datasets)
    • kfold.py - Cross-validation utilities
  • gfreader - Genotype file I/O

    • base.py - Readers for VCF, PLINK binary (.bed/.bim/.fam), HapMap, and numpy formats
    • Supports genotype conversion between formats
  • bioplotkit - Visualization

    • manhanden.py - Manhattan and QQ plots
    • LDBlock.py - LD block visualization
    • gffplot.py - GFF/annotation plotting
    • pcshow.py - PCA visualization (uses Plotly)

CLI Entry Points (module/)

Each module corresponds to a CLI command. The launcher script (jx.bat/jx) dispatches to module/<name>.py.

Key Algorithms

Mixed Linear Model: Uses eigen decomposition of the kinship matrix to simplify variance computation, with Brent's method for REML parameter optimization. Lambda (variance ratio) is the single parameter being optimized.

Kinship Methods: VanRanden (Centralization, default), Yang (Standardization).

PCA: Matrix block partitioning for computation.

File Formats

Phenotype file: Tab-delimited, first column is sample ID, subsequent columns is phenotype

samples pheno_name
indv1 value1
indv2 value2

Supported genotype formats: VCF (.vcf, .vcf.gz), PLINK binary (.bed/.bim/.fam), numpy archives (.npz/.snp/.idv)

Python Version

Requires Python 3.8+

Test Data

Example data in example/ directory from Parker et al, Nature Genetics, 2016 (via GEMMA project)

About

JanusX is a high-performance, ALL-in-ONE suite for quantitative genetics that unifies genome-wide association studies (GWAS) and genomic selection (GS).

Topics

Resources

License

Stars

Watchers

Forks

Packages

No packages published