(Formerly named ASTRA in the source code)
Naming Convention Note. In case of ambiguity:
- PolyGraph is the name of our method as presented in the academic paper (previously named ASTRA in the code).
- PolyGen is our data generation engine (referenced in the codebase as ASTRA-Gen 3.0).

Please note: while the conceptual descriptions below use the new terminology (PolyGraph/PolyGen), the source code, file structures, and command lines retain the original `astra` namespace to ensure reproducibility.
PolyGraph is an end-to-end framework for fault attribution in multi-agent systems. It combines Graph Neural Networks (GNNs) and Large Language Models (LLMs) to localize faults in complex multi-agent environments.
The system employs a two-stage coarse-to-fine strategy:
- Stage 1: A GNN rapidly identifies the top-K candidate agents.
- Stage 2: An LLM performs fine-grained, reasoning-based analysis of those candidates.
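The two stages can be sketched in a few lines. This is only an illustration of the coarse-to-fine idea, not the code in `astra/`: the score dictionary stands in for the GNN output, and `toy_analyzer` is a hypothetical placeholder for the LLM that simply looks for the word "error" in the candidates' logs.

```python
from typing import Callable, Dict, List, Tuple


def select_top_k(scores: Dict[str, float], k: int) -> List[str]:
    """Coarse stage: keep the k agents with the highest fault scores."""
    ranked = sorted(scores.items(), key=lambda item: item[1], reverse=True)
    return [agent for agent, _ in ranked[:k]]


def coarse_to_fine(
    scores: Dict[str, float],
    logs: Dict[str, List[str]],
    analyze: Callable[[Dict[str, List[str]]], Tuple[str, int]],
    k: int = 4,
) -> Tuple[str, int]:
    """Fine stage: pass only the candidates' logs to the reasoning function."""
    candidates = select_top_k(scores, k)
    candidate_logs = {agent: logs[agent] for agent in candidates}
    return analyze(candidate_logs)


def toy_analyzer(candidate_logs: Dict[str, List[str]]) -> Tuple[str, int]:
    """Hypothetical stand-in for the LLM: blame the first line mentioning 'error'."""
    for agent, lines in candidate_logs.items():
        for step, line in enumerate(lines):
            if "error" in line.lower():
                return agent, step
    # Fall back to the top-ranked candidate when nothing looks suspicious.
    return next(iter(candidate_logs)), 0


# Made-up scores and logs, for illustration only.
scores = {"planner": 0.10, "coder": 0.70, "tester": 0.15, "reviewer": 0.05}
logs = {
    "planner": ["split task into steps"],
    "coder": ["write function", "Error: undefined variable"],
    "tester": ["run unit tests"],
    "reviewer": ["approve change"],
}
result = coarse_to_fine(scores, logs, toy_analyzer, k=2)
print(result)
```

Restricting the fine stage to the top-K candidates is what keeps the amount of log text sent to the LLM small.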
| Feature | Description |
|---|---|
| Dynamic Causal Simulation | PolyGen (ASTRA-Gen 3.0) generates realistic multi-agent interaction traces. |
| DHCG | The Dynamic Heterogeneous Causal Graph captures temporal and causal relationships. |
| PolyGraph-MoE Model | STGAT-based GNN with a Mixture of Experts for coarse-grained fault attribution. |
| LLM Fine-tuning | Qwen 8B model fine-tuned for fine-grained fault analysis. |
| Coarse-to-Fine Eval | Two-stage evaluation pipeline: GNN candidate selection followed by LLM analysis. |
The codebase retains the astra package structure as follows:
ASTRA_Release/
├── astra/                          # Main source code package
│   ├── generation/                 # Stage 1: Data generation
│   │   └── generator.py            # PolyGen (ASTRA-Gen 3.0) generator
│   ├── parsing/                    # Stage 1: Graph parsing
│   │   └── dhcg_parser/            # DHCG parser implementation
│   ├── data/                       # Stage 2: Data adapter
│   │   ├── adapter.py              # GraphDataConverter
│   │   └── graph_data.py           # HeteroGraph data structure
│   ├── model/                      # Stage 3: Model architecture
│   │   ├── gnn.py                  # PolyGraph-MoE (ASTRA-MoE) model
│   │   ├── stgat.py                # STGAT implementation
│   │   └── loss.py                 # Loss functions
│   ├── training/                   # Stage 3 & 4: Training scripts
│   │   ├── train_gnn.py            # GNN training script
│   │   └── prep_llm_data.py        # LLM data preparation
│   └── evaluation/                 # Stage 5: Evaluation
│       ├── eval_pipeline.py        # Coarse-to-fine evaluation
│       └── eval_benchmark.py       # Benchmark evaluation
├── scripts/                        # Utility scripts
│   ├── parse_dataset.py            # Dataset parsing
│   └── preprocess_external.py      # External dataset preprocessing
├── examples/                       # Sample data
│   ├── golden_sample.json          # Golden trace example
│   ├── fatal_sample.json           # Fatal trace example
│   └── healed_sample.json          # Healed trace example
├── requirements.txt                # Python dependencies
└── README.md                       # This file
- Python >= 3.8
- CUDA >= 11.8 (for GPU acceleration)
- 16GB+ RAM recommended
- 20GB+ disk space for models and data
- Clone the repository:
  git clone <repository-url>
  cd ASTRA_Release
- Install dependencies:
  pip install -r requirements.txt
- Download pre-trained models (optional):
  - Qwen 8B base model: download from Hugging Face or ModelScope.
  - Place models in the `models/` directory.
Run the PolyGen (ASTRA-Gen 3.0) engine to create synthetic tasks:
python -m astra.generation.generator \
--num_tasks 100 \
--output_dir outputs/astra_v3 \
--api_base <your-llm-api-endpoint>

Convert the raw logs into graph structures:
python scripts/parse_dataset.py \
--input_dir outputs/astra_v3 \
--output_dir processed_graphs/astra_v3

The graph data is automatically converted during the training phase.
The GraphDataConverter handles:
- Node feature extraction and encoding
- Edge feature extraction
- HeteroGraph sequence construction
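To make "sequence construction" concrete, here is a minimal sketch of grouping timestamped interaction events into per-step graph snapshots. It is not the actual `GraphDataConverter`; the event schema (`step`, `src`, `relation`, `dst`) and the `Snapshot` class are assumptions made for illustration.

```python
from collections import defaultdict
from dataclasses import dataclass, field
from typing import Dict, List, Tuple


@dataclass
class Snapshot:
    """One time step: all nodes seen so far plus the edges created at this step."""
    step: int
    nodes: List[str] = field(default_factory=list)
    edges: List[Tuple[str, str, str]] = field(default_factory=list)  # (src, relation, dst)


def build_sequence(events: List[dict]) -> List[Snapshot]:
    """Group events by step and emit one snapshot per step, in step order."""
    by_step: Dict[int, List[dict]] = defaultdict(list)
    for event in events:
        by_step[event["step"]].append(event)

    seen: List[str] = []  # nodes accumulate across steps, in first-seen order
    sequence: List[Snapshot] = []
    for step in sorted(by_step):
        edges = []
        for event in by_step[step]:
            for node in (event["src"], event["dst"]):
                if node not in seen:
                    seen.append(node)
            edges.append((event["src"], event["relation"], event["dst"]))
        sequence.append(Snapshot(step=step, nodes=list(seen), edges=edges))
    return sequence


# Hypothetical events, for illustration only.
events = [
    {"step": 0, "src": "planner", "relation": "communicate", "dst": "coder"},
    {"step": 1, "src": "coder", "relation": "invoke", "dst": "python_tool"},
    {"step": 1, "src": "python_tool", "relation": "return", "dst": "coder"},
]
sequence = build_sequence(events)
for snapshot in sequence:
    print(snapshot.step, snapshot.nodes, snapshot.edges)
```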
Train the coarse-grained expert model:
python -m astra.training.train_gnn \
--data_dir processed_graphs/astra_v3 \
--output_dir checkpoints/astra_moe \
--epochs 50 \
--batch_size 8 \
--device cuda

Filter data using the GNN checkpoint to create focused samples for the LLM:
python -m astra.training.prep_llm_data \
--graph_dir processed_graphs/astra_v3 \
--gnn_checkpoint checkpoints/astra_moe/best_model.pt \
--output_dir training_data/llm \
--top_k 4

Use your preferred fine-tuning framework (e.g., PEFT) to train the PolyGraph reasoning module:
# Use your preferred LLM fine-tuning framework
# Example with PEFT:
python -m astra.training.finetune_llm \
--base_model Qwen/Qwen3-8B \
--data_dir training_data/llm \
--output_dir adapters/qwen8b_astra

Run the full PolyGraph pipeline:
python -m astra.evaluation.eval_pipeline \
--test_data_dir processed_graphs/test \
--gnn_checkpoint checkpoints/astra_moe/best_model.pt \
--llm_adapter adapters/qwen8b_astra \
--base_model_name Qwen/Qwen3-8B \
--top_k 4 \
--device cuda

Run the benchmark evaluation:

python -m astra.evaluation.eval_benchmark \
--test_data_dir processed_graphs/tracertraj \
--gnn_checkpoint checkpoints/astra_moe/best_model.pt \
--llm_adapter adapters/qwen8b_astra \
--base_model_name Qwen/Qwen3-8B \
--device cuda

The `examples/` directory contains sample data files for quick testing:
- sample_golden.json: A successful multi-agent interaction trace (no fault).
- sample_fatal.json: A trace with an injected fault.
You can use these to test the parsing and evaluation pipeline. See examples/README.md for more details.
The core GNN architecture consists of:
- MicroStateEncoder: Multi-modal node feature encoder.
- STGAT: Spatio-temporal graph attention network.
- TemporalReasoning: Causal temporal reasoning with RoPE.
- MoEHead: Uncertainty-aware expert routing.
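As a rough illustration of expert routing, the sketch below mixes per-agent scores from several experts using softmax gate weights. It is a generic mixture-of-experts combination in plain Python, not the `MoEHead` in `astra/model/gnn.py`. Using the entropy of the gate distribution as an uncertainty signal is an assumption here; the README does not say how uncertainty is computed.

```python
import math
from typing import List, Sequence


def softmax(logits: Sequence[float]) -> List[float]:
    """Numerically stable softmax."""
    peak = max(logits)
    exps = [math.exp(x - peak) for x in logits]
    total = sum(exps)
    return [e / total for e in exps]


def moe_combine(
    gate_logits: Sequence[float],
    expert_scores: Sequence[Sequence[float]],
) -> List[float]:
    """Weight each expert's per-agent scores by its gate probability and sum them."""
    weights = softmax(gate_logits)
    num_agents = len(expert_scores[0])
    return [
        sum(w * scores[i] for w, scores in zip(weights, expert_scores))
        for i in range(num_agents)
    ]


def gate_entropy(gate_logits: Sequence[float]) -> float:
    """Entropy of the gate distribution; higher means the router is less decided."""
    return -sum(p * math.log(p) for p in softmax(gate_logits) if p > 0)


# Two hypothetical experts scoring two agents; equal gate logits give equal weights.
gate_logits = [0.0, 0.0]
expert_scores = [[0.2, 0.8], [0.6, 0.4]]
mixed = moe_combine(gate_logits, expert_scores)
uncertainty = gate_entropy(gate_logits)
print(mixed, uncertainty)
```

With equal gate logits the entropy reaches its maximum for two experts, which is the case where a router would be least confident in any single expert.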
The parser extracts the Dynamic Heterogeneous Causal Graph:
- Nodes: Agents, Tools, Artifacts, Environment.
- Edges: Invoke, Return, Reference, Communicate, Affect.
- Features: Text embeddings, metadata features.
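The node and edge types listed above map naturally onto a small typed graph structure. The following is a sketch under assumptions, not the parser in `astra/parsing/dhcg_parser/`: the class names, the `step` field on edges, and the edge directions in the usage example are all hypothetical.

```python
from dataclasses import dataclass, field
from enum import Enum
from typing import Dict, List


class NodeType(Enum):
    AGENT = "agent"
    TOOL = "tool"
    ARTIFACT = "artifact"
    ENVIRONMENT = "environment"


class EdgeType(Enum):
    INVOKE = "invoke"
    RETURN = "return"
    REFERENCE = "reference"
    COMMUNICATE = "communicate"
    AFFECT = "affect"


@dataclass
class Node:
    node_id: str
    node_type: NodeType
    features: Dict[str, float] = field(default_factory=dict)  # e.g. metadata features


@dataclass
class Edge:
    src: str
    dst: str
    edge_type: EdgeType
    step: int  # time step at which the interaction happened (assumed field)


@dataclass
class CausalGraph:
    nodes: Dict[str, Node] = field(default_factory=dict)
    edges: List[Edge] = field(default_factory=list)

    def add_node(self, node_id: str, node_type: NodeType) -> None:
        self.nodes[node_id] = Node(node_id, node_type)

    def add_edge(self, src: str, dst: str, edge_type: EdgeType, step: int) -> None:
        # Reject edges that point at nodes the graph does not know about.
        for endpoint in (src, dst):
            if endpoint not in self.nodes:
                raise KeyError(f"unknown node: {endpoint}")
        self.edges.append(Edge(src, dst, edge_type, step))

    def successors(self, node_id: str, edge_type: EdgeType) -> List[str]:
        """Targets reachable from node_id over edges of the given type."""
        return [e.dst for e in self.edges if e.src == node_id and e.edge_type is edge_type]


graph = CausalGraph()
graph.add_node("coder", NodeType.AGENT)
graph.add_node("python_tool", NodeType.TOOL)
graph.add_node("main.py", NodeType.ARTIFACT)
graph.add_edge("coder", "python_tool", EdgeType.INVOKE, step=0)
graph.add_edge("python_tool", "coder", EdgeType.RETURN, step=1)
graph.add_edge("coder", "main.py", EdgeType.AFFECT, step=2)
print(graph.successors("coder", EdgeType.INVOKE))
```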
The coarse-to-fine pipeline runs in two stages:
- Coarse Stage (GNN): Predicts the top-K candidate agents.
- Fine Stage (LLM): Analyzes the candidates' logs to identify the exact faulty agent and step.
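The two stages can be scored separately. The sketch below shows one plausible way to do so, and is not taken from `eval_pipeline.py`: it assumes predictions and labels are `(agent, step)` pairs, that step accuracy compares step indices only, and that the coarse stage is judged by whether the true agent appears among its top-K candidates.

```python
from typing import List, Sequence, Tuple

Label = Tuple[str, int]  # (faulty agent, faulty step)


def agent_accuracy(preds: Sequence[Label], golds: Sequence[Label]) -> float:
    """Fraction of traces where the predicted agent matches the labeled agent."""
    return sum(p[0] == g[0] for p, g in zip(preds, golds)) / len(golds)


def step_accuracy(preds: Sequence[Label], golds: Sequence[Label]) -> float:
    """Fraction of traces where the predicted step index matches the labeled step."""
    return sum(p[1] == g[1] for p, g in zip(preds, golds)) / len(golds)


def top_k_hit_rate(candidates: Sequence[List[str]], golds: Sequence[Label]) -> float:
    """Fraction of traces where the labeled agent is among the coarse-stage candidates."""
    return sum(g[0] in c for c, g in zip(candidates, golds)) / len(golds)


# Made-up predictions and labels, for illustration only.
golds = [("coder", 3), ("coder", 4), ("planner", 2), ("coder", 2)]
preds = [("coder", 3), ("tester", 1), ("planner", 0), ("coder", 2)]
candidates = [
    ["coder", "tester"],
    ["tester", "planner"],
    ["planner", "coder"],
    ["coder", "reviewer"],
]
print(agent_accuracy(preds, golds))
print(step_accuracy(preds, golds))
print(top_k_hit_rate(candidates, golds))
```

The top-K hit rate bounds what the fine stage can achieve: if the coarse stage drops the true agent, the LLM never sees its logs.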
PolyGraph demonstrates state-of-the-art results:
- Agent Accuracy: ~67.39% on the Who&When benchmark and 77.95% on TracerTraj-Code.
- Step Accuracy: ~40.22% on the Who&When benchmark and 31.50% on TracerTraj-Code.
- Token Efficiency: Optimized prompt design significantly reduces LLM token usage.
MIT License - See LICENSE file for details.