BioCypher Open Targets Data (25.12) Adapter

This repository contains a BioCypher adapter for Open Targets data version 25.12. The project is currently under active development.

AI Usage Disclaimer: This project makes extensive use of AI. The author guarantees that every effort has been made to oversee the architecture and coding style and review the generated content for quality and consistency.

Overview

BioCypher's modular design enables the use of different adapters to consume various data sources and produce knowledge graphs. This adapter serves as a "secondary adapter" for Open Targets data, meaning it adapts a pre-harmonised composite of atomic resources via the Open Targets pipeline.

The adapter includes a comprehensive reference knowledge graph built from predefined node types (entities) and edge types (relationships), which this adapter calls presets of node and edge definitions. A script is provided to run BioCypher with the adapter and create a knowledge graph containing all predefined nodes and edges. On a consumer laptop, building the full graph typically takes 1-2 hours.

Key Features:

Includes a comprehensive knowledge graph definition designed to cover all data provided by the Open Targets Platform. See Reference Knowledge Graph for details.
Declarative syntax for graph schema construction
Powered by DuckDB for fast and memory-efficient processing
True streaming from datasets to BioCypher with minimal intermediate memory usage
Type-safe schema representation with Python classes

Prerequisites

uv for dependency management

Installation

Clone the repository:

git clone https://github.com/biocypher/open-targets.git
cd open-targets

Install dependencies:
```
uv sync
```

Activate the virtual environment (optional):

source .venv/bin/activate  # On Unix/macOS
.venv\Scripts\activate    # On Windows

Or run commands directly with uv run:

uv run python <script>

Usage

Runnable examples are provided in the example/ directory. Each example includes:

A Python script demonstrating usage
Configuration files
Data preparation instructions in datasets/README.md

Important

When running the example scripts, ensure your current working directory is the project root.

Quick Start

Follow the Installation steps
Navigate to an example directory:
```
cd example/full_graph
```
Follow the data preparation instructions in that directory's datasets/README.md

Important: Return to the project root before running the example script, since the scripts expect it as the current working directory:

cd /path/to/open-targets  # Navigate to project root
uv run python example/full_graph/full_graph.py

Available Examples

Full Graph (example/full_graph/): Builds the complete reference knowledge graph using all predefined definitions
Custom Subset (example/custom_subset/): Demonstrates selecting specific node/edge definitions

See example/README.md for details on all available examples and how to create your own.

Reference Knowledge Graph

The reference knowledge graph includes 40+ node types and 50+ edge types. Each definition file contains detailed docstrings explaining what it does, what data it uses, and how it works.

Finding Definition Files

Node definitions: open_targets/definition/reference_kg/node/
Edge definitions: open_targets/definition/reference_kg/edge/
Complete list: open_targets/definition/reference_kg/kg.py

Reading Docstrings

Each definition file (e.g., node_target.py, edge_molecule_has_adverse_reaction_adverse_reaction.py) starts with a module-level docstring that describes:

What entities/relationships the definition creates
Which Open Targets datasets it uses
How the data is transformed
What properties are included

Example:

# In open_targets/definition/reference_kg/node/node_target.py
"""Summary: Ensembl target gene nodes (symbol/name/biotype/functions).

Definition for TARGET nodes: scans the Targets parquet to emit Ensembl gene
targets with symbol, name, biotype, and function descriptions as the core
target entities used across drug, association, and annotation edges in the KG.
"""

To explore available definitions:

Browse the files in open_targets/definition/reference_kg/node/ and edge/
Read the docstring at the top of each file
Check kg.py to see how definitions are organized
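
The browsing steps above can also be scripted. The following is a minimal sketch, not part of the adapter: the helper name summarize_definitions is hypothetical, and it assumes each definition file is a plain Python module whose docstring begins with a one-line summary, as in the example above. It parses files with the standard library's ast module rather than importing them, so it should work even before dependencies are installed.

```python
import ast
from pathlib import Path


def summarize_definitions(directory: Path) -> dict[str, str]:
    """Map each Python file in `directory` to the first line of its module docstring."""
    summaries: dict[str, str] = {}
    for path in sorted(directory.glob("*.py")):
        # Parse without importing, so the adapter's dependencies are not needed.
        module = ast.parse(path.read_text(encoding="utf-8"))
        docstring = ast.get_docstring(module)
        if docstring:
            summaries[path.name] = docstring.splitlines()[0]
    return summaries


if __name__ == "__main__":
    # Paths are taken from "Finding Definition Files"; run from the project root.
    root = Path("open_targets/definition/reference_kg")
    for kind in ("node", "edge"):
        for name, summary in summarize_definitions(root / kind).items():
            print(f"{kind}/{name}: {summary}")
```

Files without a module-level docstring are skipped, so the output lists only documented definitions.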

For details on creating custom definitions, see the adapter layer documentation in open_targets/adapter/ and examine existing definitions as examples.

Open Targets Data Schema

The full schema of Open Targets data is represented as Python classes in open_targets/data/schema.py. This provides type checking for dataset and field references.

Naming Conventions:

Dataset classes: Dataset prefix (e.g., DatasetTargets, DatasetDiseases)
Field classes: Field prefix (e.g., FieldTargetsId, FieldTargetsApprovedSymbol)
Field names follow their structural location in datasets

The schema is generated from Open Targets metadata using code generation (see below). Schema classes are used throughout node/edge definitions for type-safe data access.
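
To illustrate the naming conventions, here is a small sketch of how a class name could be derived from a dataset name and a field's path within that dataset. The functions below are hypothetical and are not the project's generator; the lowercase dataset name "targets" and the field names "id" and "approvedSymbol" are assumptions chosen to reproduce the class names listed above.

```python
def _capitalize_first(part: str) -> str:
    """Uppercase only the first character, preserving camelCase in the rest."""
    return part[:1].upper() + part[1:]


def dataset_class_name(dataset: str) -> str:
    """Build a dataset class name using the Dataset prefix."""
    return "Dataset" + _capitalize_first(dataset)


def field_class_name(dataset: str, path: list[str]) -> str:
    """Build a field class name from the dataset and the field's path within it."""
    parts = "".join(_capitalize_first(p) for p in path)
    return "Field" + _capitalize_first(dataset) + parts


print(dataset_class_name("targets"))
print(field_class_name("targets", ["id"]))
print(field_class_name("targets", ["approvedSymbol"]))
```

Under these assumptions the three calls print DatasetTargets, FieldTargetsId, and FieldTargetsApprovedSymbol. Nested fields would contribute one path segment per level, which is one way a name can reflect a field's structural location; check open_targets/data/schema.py for the names actually generated.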

Code Generation

The Open Targets data schema is generated using Jinja templates.

Templates: open_targets/*.jinja
Generated code: open_targets/data/schema.py
Generation script: code_generation/generate.py

Important: Never edit generated files directly. Always modify templates and regenerate:

python code_generation/generate.py
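
The reason for this rule is easiest to see in a toy version of template-based generation. The sketch below is an illustration only: it substitutes the standard library's string.Template for Jinja so that it runs without extra dependencies, and the template text, the render_schema helper, and the name attribute are all hypothetical rather than taken from code_generation/generate.py.

```python
from string import Template

# Hypothetical template; the project's real templates are Jinja files.
CLASS_TEMPLATE = Template(
    "class $class_name:\n"
    '    """Generated from metadata. Do not edit by hand."""\n'
    "\n"
    '    name = "$field_name"\n'
)


def render_schema(fields: dict[str, str]) -> str:
    """Render one class per (class name, field name) pair into module source."""
    return "\n\n".join(
        CLASS_TEMPLATE.substitute(class_name=cls, field_name=name)
        for cls, name in fields.items()
    )


source = render_schema(
    {"FieldTargetsId": "id", "FieldTargetsApprovedSymbol": "approvedSymbol"}
)

# Execute the rendered text to confirm it is valid Python.
namespace = {}
exec(source, namespace)
print(namespace["FieldTargetsApprovedSymbol"].name)
```

Because the output is rebuilt from the template and the metadata on every run, any manual edit to the generated module would be overwritten the next time the generator executes.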

Contributing

Contributions are welcome! Please feel free to submit a Pull Request or create an Issue if you discover any problems.

License

This project is licensed under the MIT License - see the LICENSE file for details.
