Working with Open Targets evidence data 🧐
Where do evidence strings come from?
| Source | Source file | Generated by | Count | Sample |
| --- | --- | --- | --- | --- |
| phewas_catalog | phewas_catalog-2018-11-28.json.gz | phewascat | 56,014 | |
| crispr | crispr-2019-04-08.json.gz | CRISPR.py | 1,641 | |
| gene2phenotype | gene2phenotype-2019-05-30.json.gz | Gene2Phenotype.py | 1,589 | |
| genomics_england | genomics_england-2018-10-02.json.gz | GenomicsEnglandPanelApp.py | 10,533 | |
| intogen | intogen-2018-07-23.json.gz | IntOGen.py | 2,371 | |
| phenodigm | phenodigm-2019-05-31.json.gz | MouseModels.py | 500,462 | animal_model.json |
| progeny | progeny-2018-07-23.json.gz | PROGENY.py | 308 | |
| slapenrich | slapenrich-2018-11-29.json.gz | SLAPEnrich.py | 74,575 | |
| sysbio | sysbio-2019-01-31.json.gz | SystemsBiology.py | 408 | |
| expression_atlas | atlas-2019-06-04.json.gz | external | 381,141 | rna_expression.json |
| chembl | chembl-2019-03-25.json.gz | external | 384,783 | known_drug.json |
| cancer_gene_census | cosmic-2019-05-15.json.gz | external | 59,992 | somatic_mutation.json |
| europepmc | epmc-2019-05-24.json.gz | external | 5,438,280 | literature.json |
| eva; eva_somatic | eva-2019-06-03.json.gz | external | 89,636; 7,057 | |
| gwas_catalog | gwas-2019-05-29.json.gz | external | 157,008 | genetic_association.json |
| reactome | reactome-2019-05-29.json.gz | external | 10,083 | affected_pathway.json |
| uniprot; uniprot_literature; uniprot_somatic | uniprot-2019-05-20.json.gz | external (https://github.com/ebi-uniprot/open-targets-core-db) | 28,743; 4,567; 284 | |
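To check the Count column against a downloaded source file, a minimal sketch along these lines may help. It assumes each `.json.gz` file is newline-delimited JSON with one evidence string per line, which the table does not state; the function name `count_by_source` is hypothetical.

```python
import gzip
import json
from collections import Counter


def count_by_source(path):
    """Count evidence strings per sourceID in a gzipped file.

    Assumption: the file holds one JSON object per line (UTF-8).
    """
    counts = Counter()
    with gzip.open(path, "rt", encoding="utf-8") as handle:
        for line in handle:
            line = line.strip()
            if not line:
                continue  # tolerate blank lines
            evidence = json.loads(line)
            # Group under a placeholder when sourceID is absent.
            counts[evidence.get("sourceID", "<missing>")] += 1
    return counts
```

For a file that carries several sources, such as eva-2019-06-03.json.gz, the returned counter would hold one entry per `sourceID`, which can then be compared with the semicolon-separated counts in the table.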
| Key | Required? | Details |
| --- | --- | --- |
| sourceID | Y | |
| access_level | Y | |
| type | N | |
| validated_against_schema_version | N | |
| target | Y | required: id, target_type, activity; optional: activity, complex_id, target_name, target_class |
| disease | Y | required: id |
| unique_association_fields | Y | Composite key to hash for duplicate detection |
| evidence | Y | Where most of the variability lives |
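The required keys in the table above can be checked with a few lines of Python. This is only a sketch of a presence check, not a substitute for validating against the JSON schema; the names `REQUIRED_TOP_LEVEL`, `REQUIRED_NESTED`, and `missing_required_keys` are hypothetical.

```python
# Top-level keys marked "Y" in the table above.
REQUIRED_TOP_LEVEL = (
    "sourceID",
    "access_level",
    "target",
    "disease",
    "unique_association_fields",
    "evidence",
)

# Required sub-keys taken from the Details column.
REQUIRED_NESTED = {
    "target": ("id", "target_type", "activity"),
    "disease": ("id",),
}


def missing_required_keys(evidence):
    """Return a sorted list of required keys absent from one evidence string."""
    missing = [key for key in REQUIRED_TOP_LEVEL if key not in evidence]
    for parent, children in REQUIRED_NESTED.items():
        value = evidence.get(parent)
        # Only inspect sub-keys when the parent is present and is an object.
        if isinstance(value, dict):
            missing.extend(
                f"{parent}.{child}" for child in children if child not in value
            )
    return sorted(missing)
```

An empty list means every required key is present; nested gaps are reported with a dotted name such as `target.activity`.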
Unique association fields
sample_size
gwas_panel_resolution
pubmed_refs
target
object
variant
study_name
pvalue
confidence_interval
odd_ratio
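Since `unique_association_fields` is described as a composite key to hash for duplicate detection, one way to picture that step is the sketch below. The serialization (JSON with sorted keys) and the digest (SHA-256) are assumptions chosen for illustration, not necessarily what the pipeline uses; `association_hash` and `find_duplicates` are hypothetical names.

```python
import hashlib
import json


def association_hash(evidence):
    """Hash the composite key of one evidence string.

    Assumption: keys are sorted before hashing so that field order
    does not change the digest.
    """
    fields = evidence["unique_association_fields"]
    canonical = json.dumps(fields, sort_keys=True, separators=(",", ":"))
    return hashlib.sha256(canonical.encode("utf-8")).hexdigest()


def find_duplicates(evidence_strings):
    """Yield evidence strings whose composite key was already seen."""
    seen = set()
    for evidence in evidence_strings:
        digest = association_hash(evidence)
        if digest in seen:
            yield evidence
        else:
            seen.add(digest)
```

Two evidence strings that carry the same fields (for example the same study_name, variant, and pvalue) produce the same digest regardless of key order, so the second one is reported as a duplicate.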
Additional top-level keys
literature.references.[]
variant.{type, id}
variant2disease
  gwas_sample_size
  unique_experiment_reference
  gwas_panel_resolution
  provenance_type
  is_associated
  resource_score
  evidence_codes
  date_asserted
gene2variant
  functional_consequence
  provenance_type
  is_associated
  evidence_codes
  date_asserted