ATTILA - AutomaTed Tool For Immunoglobulin Analysis

Note

Documentation / Documentação:

For the Brazilian Portuguese version of this file, see README.PTBR.md.
To access the detailed user manual, see MANUAL.md.

Original Publication

This tool is an updated version of ATTILA, originally published in:

Discovering Selected Antibodies From Deep-Sequenced Phage-Display Antibody Library Using ATTILA

Andréa Queiroz Maranhão, Heidi Muniz Silva, Waldeyr Mendes Cordeiro da Silva, Renato Kaylan Alves França, Thais Canassa De Leo, Marcelo Dias-Baruffi, Rafael Trindade Burtet, Marcelo Macedo Brigido

Bioinformatics and Biology Insights, 2020. DOI: 10.1177/1177932220915240

Abstract

Phage display is a powerful technique to select high-affinity antibodies for different purposes, including biopharmaceuticals. Next-generation sequencing (NGS) presented itself as a robust solution, making it possible to assess billions of sequences of the variable domains from selected sublibraries. Handling this process, a central difficulty is to find the selected clones. Here, we present the AutomaTed Tool For Immunoglobulin Analysis (ATTILA), a new tool to analyze and find the enriched variable domains throughout a biopanning experiment. The ATTILA is a workflow that combines publicly available tools and in-house programs and scripts to find the fold-change frequency of deeply sequenced amplicons generated from selected VH and VL domains. We analyzed the same human Fab library NGS data using ATTILA in 5 different experiments, as well as on 2 biopanning experiments regarding performance, accuracy, and output. These analyses proved to be suitable to assess library variability and to list the more enriched variable domains, as ATTILA provides a report with the amino acid sequence of each identified domain, along with its complementarity-determining regions (CDRs), germline classification, and fold change. Finally, the methods employed here demonstrated a suitable manner to combine amplicon generation and NGS data analysis to discover new monoclonal antibodies (mAbs).

Project Summary

ATTILA (AutomaTed Tool For Immunoglobulin Analysis) is a bioinformatics pipeline that searches for and selects candidate immunoglobulin clones (VH and VL) from libraries generated by phage display experiments. It was originally built from a combination of Perl, R statistical scripts, and proprietary C binaries. The system has since been rewritten and consolidated into native Python 3 to provide cross-platform portability (Windows, Linux, macOS), robust step execution, and independence from unnecessary additional interpreters.

List of Features

Paired-End Reads Assembly (Join): Merging of forward and reverse sequences into a single contiguous fragment via fastq-join.
Filtering and Quality Control (Filter): Automatic removal of short or low-quality reads via prinseq-lite and generation of visual statistics via fastqc.
6-Frame Local Translation and ORF Detection (Translate): Native Python translation of nucleotide sequences to amino acids in all 6 reading frames, keeping only valid immunoglobulin ORFs that contain the conserved cysteines and the FR4 motif (WG.G for VH, FG.G for VL).
Relative Frequency and Enrichment Calculation (Frequency): Abundance counting of selected CDR3s and automatic Fold Change calculation of enrichment between the initial round (R0) and final round (RN).
Aligned Residue Numbering (Number): Automated Kabat numbering of antibody sequences via the UCL Abnum API.
Nucleotide Sequence Recovery (NT-Recovery): Retrieval of original nucleotide sequences corresponding to the selected clones by mapping alignment coordinates.
Statistical Proportion Tests (Stats): Native Python calculation of a one-tailed Z-test for the difference in proportions, with 95% confidence intervals for enrichment and Bonferroni correction.
Germline Classification (Germline): Alignment and germline gene assignment using local igblastp.
Consolidated Visual Report (Report): Compilation of complete CDR/FR region tables for candidate clones, read loss data, and charts into a standalone interactive HTML report.
Pipeline Verification and Demo (--example): Built-in synthetic dataset generation and pipeline validation that runs all compatible steps based on the host environment dependencies.
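
The Translate step described in the list above can be pictured with a minimal sketch. It is not ATTILA's actual code: the function names (`reverse_complement`, `translate`, `six_frame_translate`, `frames_with_fr4`) are hypothetical, the conserved-cysteine check is left out, and treating any frame with a stop codon as invalid is an assumption. Only the FR4 patterns (WG.G for VH, FG.G for VL) and the `1+` style of frame label come from this README.

```python
import re

# Standard genetic code, codons enumerated in TCAG order.
BASES = "TCAG"
AMINO = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {
    a + b + c: AMINO[16 * i + 4 * j + k]
    for i, a in enumerate(BASES)
    for j, b in enumerate(BASES)
    for k, c in enumerate(BASES)
}

# FR4 motifs quoted in the feature list ("." matches any residue).
FR4_MOTIF = {"vh": re.compile(r"WG.G"), "vl": re.compile(r"FG.G")}


def reverse_complement(seq):
    """Return the reverse complement of an upper-case DNA string."""
    return seq.translate(str.maketrans("ACGTN", "TGCAN"))[::-1]


def translate(seq):
    """Translate complete codons; codons with unknown bases become 'X'."""
    return "".join(
        CODON_TABLE.get(seq[i:i + 3], "X") for i in range(0, len(seq) - 2, 3)
    )


def six_frame_translate(seq):
    """Map frame labels ('1+', '2+', '3+', '1-', '2-', '3-') to proteins."""
    seq = seq.upper()
    frames = {}
    for strand, template in (("+", seq), ("-", reverse_complement(seq))):
        for offset in range(3):
            frames[f"{offset + 1}{strand}"] = translate(template[offset:])
    return frames


def frames_with_fr4(seq, chain):
    """Keep stop-free frames that contain the FR4 motif for 'vh' or 'vl'."""
    motif = FR4_MOTIF[chain]
    return {
        label: protein
        for label, protein in six_frame_translate(seq).items()
        if "*" not in protein and motif.search(protein)
    }
```

For example, `frames_with_fr4("TGGGGCCAGGGCACCCTGGTGACCGTGAGCAGC", "vh")` keeps frame `1+`, whose translation starts with the VH motif `WGQG`.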

Architecture

Database

The pipeline has no database server. It uses the filesystem itself, storing raw reads, intermediate results, and metrics in structured formats (FASTQ, FASTA, CSV, TXT). The diagram and data dictionary below describe these files as if they were tables.

Database Diagram

erDiagram
    FASTQ_Reads ||--o{ FASTA_Proteins : "translated to"
    FASTA_Proteins ||--o{ FASTA_Alignment : "Kabat aligned"
    FASTA_Alignment ||--o{ TXT_Statistics : "statistically tested"
    CSV_Counting }o--|| FASTQ_Reads : "quantifies reads in"

Data Dictionary

Table: FASTQ_Reads (Raw and filtered .fq/.fastq files)

Field	Type	Description
id	TEXT	Unique read identifier generated by the sequencer (header).
seq	TEXT	DNA nucleotide sequence (A, C, T, G, N bases).
qual	TEXT	Phred quality ASCII string corresponding to each nucleotide.
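
As an illustration of the three fields above, the following sketch reads FASTQ records into a small named tuple and decodes the quality string. It assumes plain four-line records and the Phred+33 encoding; `FastqRead`, `read_fastq`, and `phred_scores` are hypothetical names, not part of ATTILA.

```python
import io
from typing import Iterator, NamedTuple, TextIO


class FastqRead(NamedTuple):
    id: str    # header line without the leading "@"
    seq: str   # nucleotide sequence
    qual: str  # ASCII-encoded Phred qualities, one character per base


def read_fastq(handle: TextIO) -> Iterator[FastqRead]:
    """Yield records from a four-line-per-record FASTQ stream."""
    while True:
        header = handle.readline().rstrip("\n")
        if not header:  # end of file
            return
        seq = handle.readline().rstrip("\n")
        handle.readline()  # "+" separator line, ignored
        qual = handle.readline().rstrip("\n")
        if not header.startswith("@") or len(seq) != len(qual):
            raise ValueError(f"malformed FASTQ record: {header!r}")
        yield FastqRead(header[1:], seq, qual)


def phred_scores(qual: str, offset: int = 33) -> list[int]:
    """Convert a quality string to integer Phred scores (Phred+33 assumed)."""
    return [ord(ch) - offset for ch in qual]


example = io.StringIO("@read_1\nACGT\n+\nIIII\n")
for read in read_fastq(example):
    print(read.id, read.seq, phred_scores(read.qual))
```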

Table: FASTA_Proteins (Translated aa.fasta / nt.fasta files)

Field	Type	Description
id	TEXT	Read identifier associated with the frame (e.g., seq_id|FRAME:1+).
seq	TEXT	Full amino acid sequence of the translated variable domain.
cdr3_seq	TEXT	Isolated sequence of the identified CDR3 loop.
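
The `cdr3_seq` field is what the Frequency step counts. A minimal sketch of that idea follows, assuming fold change is the ratio of a CDR3's relative frequency in the final round (RN) to its relative frequency in the initial round (R0). The function names are hypothetical, and skipping CDR3s that are absent from either round is a simplifying assumption; ATTILA may handle them differently.

```python
from collections import Counter


def relative_frequencies(cdr3_list):
    """Return each CDR3's share of the reads in one library."""
    counts = Counter(cdr3_list)
    total = sum(counts.values())
    return {cdr3: n / total for cdr3, n in counts.items()}


def fold_changes(r0_cdr3s, rn_cdr3s):
    """Rank CDR3s seen in both rounds by frequency(RN) / frequency(R0)."""
    freq_r0 = relative_frequencies(r0_cdr3s)
    freq_rn = relative_frequencies(rn_cdr3s)
    shared = freq_r0.keys() & freq_rn.keys()
    ranked = ((cdr3, freq_rn[cdr3] / freq_r0[cdr3]) for cdr3 in shared)
    return sorted(ranked, key=lambda item: item[1], reverse=True)


# Toy data: "ARDY" goes from 1 of 4 reads in R0 to 3 of 4 reads in RN.
r0 = ["ARDY", "ARGG", "ARGG", "ARGG"]
rn = ["ARDY", "ARDY", "ARDY", "ARGG"]
for cdr3, fold in fold_changes(r0, rn):
    print(cdr3, round(fold, 2))
```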

Table: CSV_Counting (vhSequenceCounting.csv / vlSequenceCounting.csv)

Field	Type	Description
library	TEXT	Name of the corresponding library (R0, RN, or Selected).
reads	INTEGER	Number of reads/sequences remaining after processing.
step	TEXT	Corresponding pipeline step (raw, joining, filtering, translation, frequency, enrichment, numeration).
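
The counting table lends itself to a read-loss summary per library. The sketch below assumes a comma-separated file with a header row `library,reads,step` and rows listed in pipeline order, so that the first row of each library is its starting count; the real file layout may differ, and `read_loss` is a hypothetical helper.

```python
import csv
import io


def read_loss(handle):
    """Return, per library, (step, reads, percent of first step) tuples."""
    by_library = {}
    for row in csv.DictReader(handle):
        entry = (row["step"], int(row["reads"]))
        by_library.setdefault(row["library"], []).append(entry)

    report = {}
    for library, steps in by_library.items():
        first = steps[0][1]  # assumed to be the starting read count
        report[library] = [
            (step, reads, 100.0 * reads / first if first else 0.0)
            for step, reads in steps
        ]
    return report


example = io.StringIO(
    "library,reads,step\n"
    "R0,1000,raw\n"
    "R0,800,joining\n"
    "R0,600,filtering\n"
)
for step, reads, percent in read_loss(example)["R0"]:
    print(f"{step:10s} {reads:6d} {percent:5.1f}%")
```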

Table: TXT_Statistics (vhoutputRstats.txt / vloutputRstats.txt)

Field	Type	Description
id	TEXT	Candidate clone identifier.
pvalue	REAL	P-value of the one-tailed Z-test for enrichment of proportions.
infIC	REAL	Lower bound of the 95% confidence interval of the difference.
supIC	REAL	Upper bound of the 95% confidence interval of the difference.
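
The three statistics fields can be illustrated with a textbook two-proportion Z-test. This is a sketch under stated assumptions, not ATTILA's `calculate_z_test`: it uses a pooled standard error for the test, an unpooled standard error for a two-sided 95% interval on the difference, and a Bonferroni correction that multiplies the p-value by the number of clones tested. The function name and parameters are hypothetical.

```python
import math

Z_95 = 1.959964  # two-sided 95% quantile of the standard normal


def enrichment_z_test(x_rn, n_rn, x_r0, n_r0, n_tests=1):
    """One-tailed test that a clone's proportion is higher in RN than in R0.

    x_* are the clone's read counts, n_* the library totals, and n_tests
    is the number of clones tested (used for the Bonferroni correction).
    """
    p_rn, p_r0 = x_rn / n_rn, x_r0 / n_r0
    diff = p_rn - p_r0

    # Pooled standard error under the null hypothesis of equal proportions.
    pooled = (x_rn + x_r0) / (n_rn + n_r0)
    se_pooled = math.sqrt(pooled * (1 - pooled) * (1 / n_rn + 1 / n_r0))
    z = diff / se_pooled if se_pooled > 0 else 0.0

    # Upper-tail probability of the standard normal, then Bonferroni.
    p_one_tailed = 0.5 * math.erfc(z / math.sqrt(2))
    p_adjusted = min(1.0, p_one_tailed * n_tests)

    # Unpooled standard error for the interval on the difference.
    se_diff = math.sqrt(p_rn * (1 - p_rn) / n_rn + p_r0 * (1 - p_r0) / n_r0)
    half_width = Z_95 * se_diff
    return {
        "z": z,
        "pvalue": p_adjusted,
        "infIC": diff - half_width,
        "supIC": diff + half_width,
    }


print(enrichment_z_test(x_rn=300, n_rn=1000, x_r0=10, n_r0=1000, n_tests=50))
```

A clone whose interval lies entirely above zero and whose corrected p-value is small is enriched under these assumptions.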

Components

Component Diagram

graph TD
    run[run.sh: Bash Wrapper] -->|calls| attila[attila.py: Main Orchestrator]
    attila -->|1. Join| fjoin[fastq-join: PE reads merger]
    attila -->|2. Filter| fqc[fastqc: Quality control]
    attila -->|2. Filter| prinseq[prinseq-lite: Quality filtering]
    attila -->|3. Translate| trans[attila.py - translate_all: Frame & ORF translation]
    attila -->|4. Frequency| freq[attila.py - frequency_counter: Counts & Fold Change]
    attila -->|5. Number| num[attila.py - numberab: Kabat UCL numbering]
    attila -->|6. NT-Recovery| rec[attila.py - get_ntsequence: NT recovery]
    attila -->|7. Stats| stat[attila.py - calculate_z_test: Statistical Z-test]
    attila -->|8. Germline| igb[igblastp: Germline alignment]
    attila -->|9. Report| rep[html_creator.py: HTML Report Generator]

Technologies and Versions

Technology	Version	Description
Python	`3.11+`	Orchestration language and main biological processing logic.
Bash	`4.0+`	Friendly command-line wrapper script for setup and execution.
FastQC	`0.11+`	Visual quality control of biological sequences.
Prinseq-lite	`0.20+`	Quality filtering and trimming of reads.
Fastq-join	`1.01+`	Assembly/merging of paired-end forward and reverse reads.
IgBlast	`1.14.0+`	Alignment against human/mouse germline databases.
Matplotlib	`3.9+`	Native plotting of statistical charts in report (Optional - fallback to R).
NumPy	`2.0+`	Numerical operations required by Matplotlib.
R / Rscript	`4.0+`	Alternative interpreter for statistical charts (Optional - ggplot2, scales).
Bootstrap	`5.3.0`	CSS framework used for the modern design and responsiveness of the HTML Report.
Bootstrap Icons	`1.10.5`	Vector icon library used for collapsible panels in the report.
Google Fonts (Inter)	`N/A`	Modern font family used to enhance scientific report readability.

Features

Requirements

Feature: Run Pipeline with Configuration

Feature	Form Field / Argument	Database Field / Configuration	Applied Rules
Orchestration	`--config my_project.cfg`	Input `.cfg` file containing paths.	The configuration file must exist and contain valid paths within the filesystem.

Feature: Modular Control by Steps

Feature	Form Field / Argument	Database Field / Configuration	Applied Rules
Modular Execution	`--steps join,filter,translate`	`ATTILA_STEPS` env	The pipeline will run only the specified comma-separated sub-steps.
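
One way to picture the step selection is the sketch below. It assumes that the command-line value takes precedence over `ATTILA_STEPS`, that an empty selection means "run everything", and that steps always run in pipeline order. Only `join`, `filter`, and `translate` appear as step names in this README; the remaining names in `PIPELINE_ORDER` and the `resolve_steps` helper are hypothetical.

```python
import os

# Order follows the component diagram; names after "translate" are assumed.
PIPELINE_ORDER = [
    "join", "filter", "translate", "frequency", "number",
    "nt-recovery", "stats", "germline", "report",
]


def resolve_steps(cli_value=None, environ=None):
    """Return the steps to run, in pipeline order."""
    environ = os.environ if environ is None else environ
    raw = cli_value or environ.get("ATTILA_STEPS", "")
    if not raw.strip():
        return list(PIPELINE_ORDER)  # nothing requested: run all steps

    requested = {name.strip().lower() for name in raw.split(",") if name.strip()}
    unknown = requested - set(PIPELINE_ORDER)
    if unknown:
        raise ValueError(f"unknown step(s): {', '.join(sorted(unknown))}")
    return [step for step in PIPELINE_ORDER if step in requested]


print(resolve_steps("translate,join", environ={}))
```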

Feature: Choose VH/VL Libraries

Feature	Form Field / Argument	Database Field / Configuration	Applied Rules
Chain Definition	`--type vh`, `--type vl` or `--type both`	`libtype`	Selects the chain to analyze: VH (`libtype` 0), VL (`libtype` 1), or both, run one after the other automatically.

Feature: Run Demonstration Example

Feature	Form Field / Argument	Database Field / Configuration	Applied Rules
Demo Run	`--example`	Environment detection	Runs VH and VL pipelines sequentially on synthetic data. Automatically falls back to Python-only steps if CLI bioinformatics dependencies are missing.
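
The environment detection behind the demo run could look like the following sketch, which checks for each command-line tool with `shutil.which`. The executable names are assumptions (for instance, prinseq-lite is often installed as `prinseq-lite.pl`), and `detect_tools` and `choose_example_mode` are hypothetical helpers rather than functions from run.sh or attila.py.

```python
import shutil

# Executable names are assumed; adjust them to match your installation.
REQUIRED_TOOLS = ("fastq-join", "fastqc", "prinseq-lite")
OPTIONAL_TOOLS = ("igblastp",)


def detect_tools(which=shutil.which):
    """Map each tool name to True if it is found on the PATH."""
    return {name: which(name) is not None
            for name in REQUIRED_TOOLS + OPTIONAL_TOOLS}


def choose_example_mode(available):
    """Return 'full' if every required tool is present, else 'python-only'."""
    if all(available[name] for name in REQUIRED_TOOLS):
        return "full"
    return "python-only"


available = detect_tools()
print("mode:", choose_example_mode(available))
print("germline step:", "run" if available["igblastp"] else "skipped")
```

Passing `which` as a parameter keeps the check easy to test without touching the real PATH.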

Installation and Usage

Option 1: Local Installation

Prerequisites

Python 3.11+ (the version listed under Technologies and Versions)
CLI Dependencies: FastQC, Prinseq-lite, and Fastq-join are required; IgBlast is optional.

Automated Installation (Recommended)

ATTILA includes a cross-platform installer that detects your OS and installs all dependencies automatically:

git clone https://github.com/waldeyr/attila.git
cd attila
chmod +x install.sh run.sh
./install.sh

Supported platforms:

Platform	Package manager	Notes
macOS	Homebrew (`brew`)	`fastq-join` compiled from source (ea-utils)
Debian / Ubuntu	`apt`	All tools available as packages
RHEL / Fedora / CentOS / Rocky / AlmaLinux	`dnf` / `yum`	Enables EPEL automatically

IgBlast (germline step) is optional. The installer downloads and installs it automatically, but the pipeline runs without it — the germline classification step is simply skipped.

Manual Installation Steps

If you prefer to install dependencies yourself:

Clone the ATTILA repository:

git clone https://github.com/waldeyr/attila.git
cd attila

Install Python dependencies:

```
pip install -r requirements.txt
```

Install CLI tools:
- macOS: brew install fastqc (fastq-join must be compiled from source; see install.sh)
- Debian/Ubuntu: sudo apt install fastqc ea-utils perl
- RHEL/Fedora/Rocky: sudo dnf install epel-release && sudo dnf install fastqc ea-utils perl
- prinseq-lite (all platforms): download the Perl script from SourceForge and place it in a directory on your PATH.

Set execution permissions:

```
chmod +x run.sh
```

Execution

Run the full pipeline using a configuration file:

./run.sh --config my_project_VH.cfg --all

To run only specific steps:

./run.sh --config my_project_VH.cfg --steps filter,translate

To start the interactive configuration wizard:

./run.sh --interactive

To run the pipeline with synthetic demo data to verify the installation:

./run.sh --example

Note: If fastqc or prinseq-lite is missing from your PATH, the wrapper will automatically switch to a fallback mode, copying pre-filtered synthetic FASTA files and running all Python-only processing stages. This ensures you can verify and explore the pipeline output without installing additional command-line tools. Results are saved to example_output/example_project_results/.

Testing & Verification

run.sh automatically runs installation pre-flight checks before every execution. To run the checks manually:

python3 programs/test_attila.py TestInstallation -v

To run the full test suite (unit + integration):

python3 programs/test_attila.py -v

Test class	What it verifies
`TestInstallation`	Python version, pip packages, CLI tools in PATH, required data files
`TestAttilaPipeline`	DNA functions, 6-frame translation, Z-test statistics, full VH+VL pipeline end-to-end

Option 2: Running with Docker (Recommended for reproducible environments)

Docker containerizes all bioinformatics dependencies (FastQC, prinseq-lite, fastq-join, IgBlast, and Python requirements), removing the need for manual local installation. The image is based on python:3.13-slim-bookworm and the build validates all tools automatically via TestInstallation.

Building the Docker Image

From the repository root directory, run:

docker build -t attila:latest .

The build runs python3 programs/test_attila.py TestInstallation -v as a final step. If any required tool is missing from the image, the build fails with a clear error.

Running with Docker

Mount your working directory with -v so that input files and results are accessible on your host machine.

A. Verify installation with synthetic demo data:

docker run --rm -v "$(pwd):/app/shared" attila:latest ./run.sh --example

Results are saved inside the container at /app/example_output/ and mirrored to $(pwd)/example_output/ on your host.

B. Run the interactive configuration wizard:

docker run -it --rm -v "$(pwd):/app/shared" attila:latest ./run.sh --interactive

Save your project inside /app/shared so that results persist on your host.

C. Execute a configuration file:

docker run --rm -v "$(pwd):/app/shared" attila:latest \
    ./run.sh --config /app/shared/my_project_VH.cfg --type both

D. Run the full test suite inside the container:

docker run --rm attila:latest python3 programs/test_attila.py -v

E. Open an interactive shell inside the container:

docker run -it --rm -v "$(pwd):/app/shared" attila:latest bash
