A comprehensive collection of standardized enumerations and value sets for data science, bioinformatics, materials science, and beyond.
Data standardization is hard. Every project reinvents the wheel with custom enums, inconsistent naming, and no semantic meaning.
Common Value Sets solves this by providing:
- Rich, standardized enumerations: Pre-defined value sets across multiple domains
- Semantic meaning: Every value is linked to ontology terms (when possible)
- Python-first convenience: Work with simple enums, get semantics for free
- Multi-language support: Generate JSON Schema, TypeScript, and more
- Interoperability: Built on LinkML standards for maximum compatibility
Different datasets often represent the same concept in incompatible ways:
- M / F
- male / female
- 1 / 2
They all mean the same thing, but they don't interoperate.
With Common Value Sets, you can instead use a shared enum:
from valuesets.enums.core import SexEnum
s = SexEnum.MALE
print(s.value) # "MALE"
print(s.get_meaning()) # "NCIT:C20197"
print(s.get_description()) # "Male sex"
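As a rough illustration of how the legacy encodings listed above could be funneled into one shared enum, the sketch below uses a stand-in `Sex` enum built on the standard library rather than the real `SexEnum`. The `RAW_TO_SEX` table, the `normalize_sex` helper, and the reading of `1`/`2` as male/female are assumptions made for this example.

```python
from enum import Enum


class Sex(Enum):
    """Stand-in for a shared sex enum (not the library's SexEnum)."""
    MALE = "MALE"
    FEMALE = "FEMALE"


# Hypothetical lookup table covering the three encodings listed above.
# Treating 1 as male and 2 as female is an assumption for this sketch.
RAW_TO_SEX = {
    "m": Sex.MALE, "male": Sex.MALE, "1": Sex.MALE,
    "f": Sex.FEMALE, "female": Sex.FEMALE, "2": Sex.FEMALE,
}


def normalize_sex(raw):
    """Map a raw dataset value (str or int) onto the shared enum."""
    key = str(raw).strip().lower()
    try:
        return RAW_TO_SEX[key]
    except KeyError:
        raise ValueError(f"Unrecognized sex value: {raw!r}") from None


records = ["M", "female", 1, " f "]
normalized = [normalize_sex(r) for r in records]
```

Failing loudly on unknown codes, instead of silently mapping them to a default, keeps bad source data visible during integration.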
from valuesets.enums.bio.structural_biology import StructuralBiologyTechnique
from valuesets.enums.spatial.spatial_qualifiers import AnatomicalSide
# Rich enums with metadata and ontology mappings
technique = StructuralBiologyTechnique.CRYO_EM
print(technique.value) # "CRYO_EM"
print(technique.get_description()) # "Cryo-electron microscopy"
print(technique.get_meaning()) # "CHMO:0002413" (Chemical Methods Ontology)
print(technique.get_annotations()) # {'resolution_range': '2-30 Å typical', ...}
# Spatial relationships with BSPO mappings
side = AnatomicalSide.LEFT
print(side.get_meaning()) # "BSPO:0000000" (Biological Spatial Ontology)
# Look up enums by their ontology terms
found = AnatomicalSide.from_meaning("BSPO:0000000") # Returns LEFT
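To make the `get_meaning()` / `from_meaning()` pairing concrete, here is a minimal sketch of how such a lookup could be layered on a plain `Enum`. It is not the library's generated code: `AnatomicalSideSketch` and the `_MEANINGS` side table are hypothetical, and only the `LEFT` term is taken from the example above.

```python
from enum import Enum


class AnatomicalSideSketch(Enum):
    """Toy enum; the real AnatomicalSide is generated from LinkML schemas."""
    LEFT = "LEFT"
    RIGHT = "RIGHT"

    def get_meaning(self):
        # Metadata lives in a side table keyed by member name.
        return _MEANINGS.get(self.name)

    @classmethod
    def from_meaning(cls, curie):
        # Reverse lookup: scan members for a matching ontology term.
        for member in cls:
            if member.get_meaning() == curie:
                return member
        return None


# LEFT's term is copied from the example above; RIGHT is deliberately
# left unmapped to show how a missing term behaves.
_MEANINGS = {"LEFT": "BSPO:0000000"}

found = AnatomicalSideSketch.from_meaning("BSPO:0000000")
```

Keeping the metadata outside the class body avoids it being treated as an enum member.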
from valuesets.enums.statistics import StatisticalTest, PValueThreshold
from valuesets.enums.data_science import DatasetSplitType, ModelType
# Standardized statistical tests with STATO ontology mappings
test = StatisticalTest.STUDENTS_T_TEST
print(test.get_meaning()) # "STATO:0000176"
print(test.get_description()) # "Student's t-test for comparing means"
# ML pipeline with standard splits
split = DatasetSplitType.TRAIN
model = ModelType.RANDOM_FOREST
# P-value thresholds with clear semantics
threshold = PValueThreshold.SIGNIFICANT
print(threshold.get_annotations()) # {'value': 0.05, 'symbol': '*'}
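One way the threshold annotations might be put to work is in report formatting. The sketch below copies the annotation dict shown above into a plain constant; the `annotate_p_value` helper and the choice of a strict `<` comparison are assumptions, not part of the library.

```python
# Annotations as printed above for PValueThreshold.SIGNIFICANT
SIGNIFICANT = {"value": 0.05, "symbol": "*"}


def annotate_p_value(p, threshold=SIGNIFICANT):
    """Format a p-value, appending the symbol when p is below the cutoff."""
    if not 0.0 <= p <= 1.0:
        raise ValueError(f"p-value out of range: {p}")
    marker = threshold["symbol"] if p < threshold["value"] else ""
    return f"p = {p:.3f}{marker}"


label = annotate_p_value(0.012)
```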
from valuesets.enums.bio.taxonomy import CommonOrganismTaxaEnum, BiologicalKingdom
from valuesets.enums.bio.cell_biology import CellCyclePhase, CellType
# Model organisms with NCBI Taxonomy IDs
human = CommonOrganismTaxaEnum.HUMAN
print(human.get_meaning()) # "NCBITaxon:9606"
print(human.get_description()) # "Homo sapiens (human)"
# Cell biology with CL and GO mappings
phase = CellCyclePhase.S_PHASE
print(phase.get_meaning()) # "GO:0000084"
neuron = CellType.NEURON
print(neuron.get_meaning()) # "CL:0000540"
# Get all organisms at a specific taxonomic level
mammals = [org for org in CommonOrganismTaxaEnum
           if 'MAMMALIA' in str(org)]
- Biology:
  - Structural Biology: Cryo-EM techniques, crystallization methods, detectors
  - Cell Biology: Cell types, cell cycle phases, organelles
  - Taxonomy: Model organisms (all with NCBI Taxonomy IDs)
- Spatial: Anatomical directions, planes, relationships (BSPO mapped)
- Statistics: Statistical tests (STATO mapped), p-value thresholds
- Data Science: ML model types, dataset splits, metrics
- Materials Science: Crystal structures, characterization methods
- Clinical/Medical: Blood types (SNOMED), vital status
- Environmental: Exposure routes, pollutants
- Energy: Sources, storage methods, efficiency ratings
- Geography: Country codes (ISO), time zones, coordinate systems
- Time: Temporal relationships, periods, frequencies
- Academic: Publication types, research roles, funding sources
- Industrial: Manufacturing processes, quality standards
Use the raw LinkML schemas for data modeling, validation, and documentation:
# Direct schema usage
Person:
  attributes:
    vital_status:
      range: VitalStatusEnum  # ALIVE, DECEASED, UNKNOWN
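To show what validating against such a schema amounts to, the following standard-library sketch checks a record against the enum's permissible values. The schema is written as the dict a YAML parser would produce, with `classes` and `enums` as top-level keys; the `validate` helper is hypothetical, and real projects would normally rely on LinkML's own validation tooling instead.

```python
# The schema fragment above, written as a parsed dict. The permissible
# values come from the inline comment in that fragment.
schema = {
    "enums": {
        "VitalStatusEnum": {
            "permissible_values": {"ALIVE": {}, "DECEASED": {}, "UNKNOWN": {}}
        }
    },
    "classes": {
        "Person": {"attributes": {"vital_status": {"range": "VitalStatusEnum"}}}
    },
}


def validate(record, class_name, schema):
    """Return a list of error strings for enum-ranged attributes."""
    errors = []
    attributes = schema["classes"][class_name]["attributes"]
    for name, spec in attributes.items():
        enum_def = schema["enums"].get(spec.get("range"))
        if enum_def is None or name not in record:
            continue  # not an enum-ranged attribute, or value absent
        if record[name] not in enum_def["permissible_values"]:
            errors.append(f"{name}: {record[name]!r} is not permitted")
    return errors


errors = validate({"vital_status": "alive"}, "Person", schema)
```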
Get Python enums with full IDE support, type checking, and semantic metadata:
# Type-safe enums with ontology mappings
status = VitalStatusEnum.ALIVE
print(status.meaning) # "NCIT:C37987"
Write simple code, get semantic meaning automatically:
# Example: Different systems use different names for the same concept
from valuesets.enums.medical import BloodTypeEnum
from external_system import PatientBloodType # Third-party enum
# Even though the enum values might be named differently:
# BloodTypeEnum.A_POSITIVE vs PatientBloodType.A_POS
# They map to the same SNOMED code: SNOMED:278149003
blood_type = BloodTypeEnum.A_POSITIVE
patient_blood = PatientBloodType.A_POS
if blood_type.get_meaning() == patient_blood.get_meaning():
    # Semantic interoperability - works across different naming conventions
    process_compatible_blood_type()
# Or use the utility function
if same_meaning_as(blood_type, patient_blood):
    process_compatible_blood_type()
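For readers wondering what a helper like `same_meaning_as` might do internally, here is a self-contained sketch. The two enum classes are stand-ins, and this implementation is an assumption about the behavior, not the library's code; only the SNOMED code is taken from the example above.

```python
from enum import Enum

SNOMED_A_POS = "SNOMED:278149003"  # code quoted in the example above


class BloodTypeSketch(Enum):
    """Stand-in for BloodTypeEnum."""
    A_POSITIVE = "A_POSITIVE"

    def get_meaning(self):
        return SNOMED_A_POS


class PatientBloodTypeSketch(Enum):
    """Stand-in for a third-party enum with different naming."""
    A_POS = "A+"

    def get_meaning(self):
        return SNOMED_A_POS


def same_meaning_as(a, b):
    """True when both values carry the same, non-empty ontology term."""
    meaning_a = getattr(a, "get_meaning", lambda: None)()
    meaning_b = getattr(b, "get_meaning", lambda: None)()
    return meaning_a is not None and meaning_a == meaning_b


match = same_meaning_as(BloodTypeSketch.A_POSITIVE, PatientBloodTypeSketch.A_POS)
```

Requiring a non-empty term on both sides prevents two unmapped values from being treated as equivalent.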
Generate schemas and types for any language:
# Generate JSON Schema for web apps
gen-json-schema schema.yaml
# Generate TypeScript definitions
gen-typescript schema.yaml
# Generate SQL DDL
gen-sqltables schema.yaml
- Excel/Google Sheets: Generate dropdown validation lists
- Web forms: Auto-generate select options with descriptions
- APIs: Standardized response codes and classifications
- Databases: Consistent foreign key constraints
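The spreadsheet and web-form cases above both reduce to rendering an enum's names and descriptions in another format. The sketch below assumes a name-to-description mapping of the kind `get_all_descriptions()` is described as returning; both helper functions are hypothetical.

```python
from html import escape

# Sample name -> description mapping (values as they appear in this README)
descriptions = {
    "QUANTIFOIL": "Quantifoil holey carbon grid",
    "C_FLAT": "C-flat holey carbon grid",
}


def to_select_options(descriptions):
    """Render {name: description} as HTML <option> elements."""
    lines = []
    for name, description in sorted(descriptions.items()):
        lines.append(
            f'<option value="{escape(name, quote=True)}">'
            f"{escape(description)}</option>"
        )
    return "\n".join(lines)


def to_dropdown_list(descriptions):
    """Comma-separated names, suitable for a spreadsheet validation list."""
    return ",".join(sorted(descriptions))


options_html = to_select_options(descriptions)
dropdown = to_dropdown_list(descriptions)
```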
# Some enums support hierarchical is_a relationships
from valuesets.enums import ViralGenomeTypeEnum
# Baltimore classification with hierarchy
positive_rna = ViralGenomeTypeEnum.SSRNA_POSITIVE # Group IV
# inherits from SSRNA (single-stranded RNA)
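A hierarchy like this can be queried with a simple parent-link walk. The sketch below is an illustration under assumptions: the `IS_A` table and both helpers are hypothetical, and only the `SSRNA_POSITIVE` to `SSRNA` edge comes from the example above.

```python
# Child -> parent links. Only the SSRNA_POSITIVE -> SSRNA edge comes from
# the example above; SSRNA_NEGATIVE is added purely for illustration.
IS_A = {
    "SSRNA_POSITIVE": "SSRNA",
    "SSRNA_NEGATIVE": "SSRNA",
}


def ancestors(name, is_a=IS_A):
    """Walk parent links upward, returning ancestors nearest-first."""
    chain = []
    seen = {name}
    while name in is_a:
        name = is_a[name]
        if name in seen:  # guard against accidental cycles
            break
        seen.add(name)
        chain.append(name)
    return chain


def is_subtype_of(name, candidate, is_a=IS_A):
    """True when candidate appears among the ancestors of name."""
    return candidate in ancestors(name, is_a)


parents = ancestors("SSRNA_POSITIVE")
```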
from valuesets.enums.bio.structural_biology import CryoEMGridType
grid = CryoEMGridType.QUANTIFOIL
metadata = grid.get_metadata()
print(metadata)
# {
# 'name': 'QUANTIFOIL',
# 'value': 'QUANTIFOIL',
# 'description': 'Quantifoil holey carbon grid',
# 'annotations': {
# 'hole_sizes': '1.2/1.3, 2/1, 2/2 μm common',
# 'manufacturer': 'Quantifoil'
# }
# }
# Get all grid types with their descriptions at once
all_grids = CryoEMGridType.get_all_descriptions()
# {'C_FLAT': 'C-flat holey carbon grid', 'QUANTIFOIL': ...}
from valuesets.enums.spatial import AnatomicalPlane
# Get all ontology mappings for an enum
mappings = AnatomicalPlane.get_all_meanings()
print(mappings)
# {'SAGITTAL': 'BSPO:0000417', 'CORONAL': 'BSPO:0000019', ...}
# List all metadata for every value in an enum
all_metadata = AnatomicalPlane.list_metadata()
for name, meta in all_metadata.items():
    print(f"{name}: {meta.get('description', 'No description')}")
# Find enum by ontology term (useful for data integration)
plane = AnatomicalPlane.from_meaning("BSPO:0000417") # Returns SAGITTAL
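For data integration, the same mappings can be inverted once and applied to a whole column of ontology terms. This sketch works on a plain dict shaped like the `get_all_meanings()` output above; the `recode` helper and the `EX:unknown` term are hypothetical.

```python
# Mapping in the shape shown above for get_all_meanings()
MEANINGS = {"SAGITTAL": "BSPO:0000417", "CORONAL": "BSPO:0000019"}


def recode(curies, meanings=MEANINGS):
    """Translate ontology terms to enum names; collect terms with no match."""
    by_term = {term: name for name, term in meanings.items()}
    names, unmatched = [], []
    for curie in curies:
        if curie in by_term:
            names.append(by_term[curie])
        else:
            names.append(None)
            unmatched.append(curie)
    return names, unmatched


names, unmatched = recode(["BSPO:0000417", "EX:unknown", "BSPO:0000019"])
```

Returning the unmatched terms separately makes it easy to report gaps in coverage instead of dropping rows.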
Some enums in this collection are dynamic enums that can be expanded at runtime by querying ontologies. This uses LinkML's Dynamic Enum feature.
# Example: A dynamic enum that pulls values from an ontology
CellTypeEnum:
  permissible_values:
    NEURON:
      meaning: CL:0000540
    ASTROCYTE:
      meaning: CL:0002585
  # Dynamic expansion from Cell Ontology
  reachable_from:
    source_ontology: obo:cl
    source_nodes:
      - CL:0000540  # neuron
    include_self: false
    relationship_types:
      - rdfs:subClassOf
Note: Runtime expansion support is coming soon! Currently, dynamic enums provide:
- Available: Static values with ontology mappings
- Available: Metadata and descriptions
- In progress: Runtime expansion from ontologies (coming in next release)
When runtime expansion is available, you'll be able to:
# Future: Dynamically expand enum with all neuron subtypes
cell_types = CellTypeEnum.expand_from_ontology()
# Would add: MOTOR_NEURON, SENSORY_NEURON, INTERNEURON, etc.
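Conceptually, a `reachable_from` expansion is a graph traversal: start at the source nodes and collect every term connected through the chosen relationship. The sketch below runs that traversal over a toy in-memory graph. The `EX:` identifiers and the `SUBCLASS_OF` table are made up for illustration, and a real expansion would query the Cell Ontology instead.

```python
from collections import deque

# Toy subclass edges (child -> parents). The identifiers are invented;
# a real expansion would query the Cell Ontology.
SUBCLASS_OF = {
    "EX:motor_neuron": ["EX:neuron"],
    "EX:sensory_neuron": ["EX:neuron"],
    "EX:interneuron": ["EX:neuron"],
    "EX:alpha_motor_neuron": ["EX:motor_neuron"],
    "EX:astrocyte": ["EX:glial_cell"],
}


def reachable_from(source_nodes, edges=SUBCLASS_OF, include_self=False):
    """Collect every term that reaches a source node via subclass edges."""
    # Invert the child -> parents table so we can walk downward.
    children = {}
    for child, parents in edges.items():
        for parent in parents:
            children.setdefault(parent, []).append(child)

    found = set(source_nodes) if include_self else set()
    visited = set(source_nodes)
    queue = deque(source_nodes)
    while queue:  # breadth-first traversal
        node = queue.popleft()
        for child in children.get(node, []):
            if child not in visited:
                visited.add(child)
                found.add(child)
                queue.append(child)
    return sorted(found)


neuron_subtypes = reachable_from(["EX:neuron"])
```

With `include_self=False`, as in the schema above, the source node itself is left out of the result while its direct and indirect subclasses are included.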
Full Documentation Website
TODO: The OWL artifact generated from these value sets will be available soon on:
We plan to add maturity level metadata to each enum to help users understand their readiness:
- Stable: Production-ready, well-tested, unlikely to change
- Beta: Usable but may have minor changes
- Draft: Under development, expect changes
# Future: Check maturity before use
if enum_def.maturity_level == MaturityLevel.STABLE:
    use_in_production()
Split the package into domain-specific modules for lighter installs:
# Future: Install only what you need
pip install valuesets-core # Core functionality
pip install valuesets-bio # Biological domains
pip install valuesets-materials # Materials science
pip install valuesets-clinical # Clinical/medical
- Domain Packages: Community-maintained domain-specific value sets
- Organization Standards: Company/institution-specific enums that extend base sets
- Mapping Tables: Cross-ontology and cross-standard mappings
- AI/LLM Integration: Semantic annotations optimized for language models
- Usage Analytics: Track which enums are most used, identify gaps
- Version Management: Handle enum evolution with deprecation warnings
- Multi-ontology Support: Map single values to multiple ontologies
- Fuzzy Matching: Find enums by approximate string matching
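Fuzzy matching is still a planned feature, but the idea can be approximated today with the standard library. The sketch below uses `difflib.get_close_matches`; the `fuzzy_find` helper, its normalization rules, and the 0.6 cutoff are assumptions, and the names are member names that appear in examples elsewhere in this README.

```python
from difflib import get_close_matches

# Member names that appear in examples elsewhere in this README
NAMES = ["CRYO_EM", "QUANTIFOIL", "C_FLAT", "SAGITTAL", "CORONAL"]


def fuzzy_find(query, names=NAMES, cutoff=0.6):
    """Return up to three candidate names ranked by similarity to the query."""
    # Normalize case and separators so "cryo-em" can match "CRYO_EM".
    normalized = query.strip().upper().replace("-", "_").replace(" ", "_")
    return get_close_matches(normalized, names, n=3, cutoff=cutoff)


candidates = fuzzy_find("quantifol")
```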
git clone https://github.com/linkml/valuesets
cd valuesets
uv sync
just --list # Show all available commands
just test # Run tests
just doctest # Run doctests
just lint # Run linting
just site # Build documentation site
We welcome contributions! Whether you're adding new domains, improving existing enums, or fixing bugs:
- Domain Experts: Contribute standardized value sets for your field
- Developers: Add utility functions, improve tooling, fix issues
- Users: Report missing enums, suggest improvements, share use cases
├── src/valuesets/
│   ├── schema/                  # LinkML YAML schemas (source of truth)
│   │   ├── bio/                 # Biological domains
│   │   │   ├── cell_biology.yaml
│   │   │   ├── structural_biology.yaml
│   │   │   └── taxonomy.yaml
│   │   ├── spatial/             # Spatial and anatomical
│   │   │   └── spatial_qualifiers.yaml
│   │   ├── statistics.yaml
│   │   └── core.yaml
│   ├── enums/                   # Generated Python enums
│   │   └── <auto-generated from schemas>
│   ├── generators/              # Rich enum generator
│   │   └── rich_enum.py
│   └── validators/              # Ontology validation
│       └── enum_evaluator.py
├── docs/                        # Documentation
└── tests/                       # Test cases
    ├── test_rich_enums.py       # Rich enum functionality
    └── validators/              # Ontology validation tests
Built with LinkML and the linkml-project-copier template.
Making data standardization simple, semantic, and scalable