Skip to content

linkml/valuesets

Folders and files

NameName
Last commit message
Last commit date

Latest commit

Β 

History

9 Commits
Β 
Β 
Β 
Β 
Β 
Β 
Β 
Β 
Β 
Β 
Β 
Β 
Β 
Β 
Β 
Β 
Β 
Β 
Β 
Β 
Β 
Β 
Β 
Β 
Β 
Β 
Β 
Β 
Β 
Β 
Β 
Β 
Β 
Β 
Β 
Β 
Β 
Β 
Β 
Β 
Β 
Β 
Β 
Β 
Β 
Β 
Β 
Β 
Β 
Β 
Β 
Β 
Β 
Β 
Β 
Β 
Β 
Β 
Β 
Β 
Β 
Β 
Β 
Β 
Β 
Β 
Β 
Β 
Β 
Β 
Β 
Β 
Β 
Β 

Copier Badge

Common Value Sets

A comprehensive collection of standardized enumerations and value sets for data science, bioinformatics, materials science, and beyond.

🎯 Why Common Value Sets?

Data standardization is hard. Every project reinvents the wheel with custom enums, inconsistent naming, and no semantic meaning.
Common Value Sets solves this by providing:

  • πŸ“š Rich, standardized enumerations – Pre-defined value sets across multiple domains
  • 🧬 Semantic meaning – Every value is linked to ontology terms (when possible)
  • 🐍 Python-first convenience – Work with simple enums, get semantics for free
  • 🌐 Multi-language support – Generate JSON Schema, TypeScript, and more
  • πŸ”— Interoperability – Built on LinkML standards for maximum compatibility

πŸ” A Simple Example

Different datasets often represent the same concept in incompatible ways:

  • M / F
  • male / female
  • 1 / 2

They all mean the same thing, but they don’t interoperate.
With Common Value Sets, you can instead use a shared enum:

from valuesets.enums.core import SexEnum

s = SexEnum.MALE
print(s.value)            # "MALE"
print(s.get_meaning())    # "NCIT:C20197"
print(s.get_description())# "Male sex"

⚑ Quick Start

For Python Developers

from valuesets.enums.bio.structural_biology import StructuralBiologyTechnique
from valuesets.enums.spatial.spatial_qualifiers import AnatomicalSide

# Rich enums with metadata and ontology mappings
technique = StructuralBiologyTechnique.CRYO_EM
print(technique.value)  # "CRYO_EM"
print(technique.get_description())  # "Cryo-electron microscopy"
print(technique.get_meaning())  # "CHMO:0002413" (Chemical Methods Ontology)
print(technique.get_annotations())  # {'resolution_range': '2-30 Γ… typical', ...}

# Spatial relationships with BSPO mappings
side = AnatomicalSide.LEFT
print(side.get_meaning())  # "BSPO:0000000" (Biological Spatial Ontology)

# Look up enums by their ontology terms
found = AnatomicalSide.from_meaning("BSPO:0000000")  # Returns LEFT

For Data Scientists

from valuesets.enums.statistics import StatisticalTest, PValueThreshold
from valuesets.enums.data_science import DatasetSplitType, ModelType

# Standardized statistical tests with STATO ontology mappings
test = StatisticalTest.STUDENTS_T_TEST
print(test.get_meaning())  # "STATO:0000176"
print(test.get_description())  # "Student's t-test for comparing means"

# ML pipeline with standard splits
split = DatasetSplitType.TRAIN
model = ModelType.RANDOM_FOREST

# P-value thresholds with clear semantics
threshold = PValueThreshold.SIGNIFICANT
print(threshold.get_annotations())  # {'value': 0.05, 'symbol': '*'}

For Bioinformaticians

from valuesets.enums.bio.taxonomy import CommonOrganismTaxaEnum, BiologicalKingdom
from valuesets.enums.bio.cell_biology import CellCyclePhase, CellType

# Model organisms with NCBI Taxonomy IDs
human = CommonOrganismTaxaEnum.HUMAN
print(human.get_meaning())  # "NCBITaxon:9606"
print(human.get_description())  # "Homo sapiens (human)"

# Cell biology with CL and GO mappings
phase = CellCyclePhase.S_PHASE
print(phase.get_meaning())  # "GO:0000084"

neuron = CellType.NEURON
print(neuron.get_meaning())  # "CL:0000540"

# Get all organisms at a specific taxonomic level
mammals = [org for org in CommonOrganismTaxaEnum
           if 'MAMMALIA' in str(org)]

πŸ—οΈ Available Domains

Core Domains (Most Mature)

  • 🧬 Biology:
    • Structural Biology: Cryo-EM techniques, crystallization methods, detectors
    • Cell Biology: Cell types, cell cycle phases, organelles
    • Taxonomy: Model organisms (all with NCBI Taxonomy IDs)
  • πŸ“ Spatial: Anatomical directions, planes, relationships (BSPO mapped)
  • πŸ“Š Statistics: Statistical tests (STATO mapped), p-value thresholds

Expanding Domains

  • πŸ§ͺ Data Science: ML model types, dataset splits, metrics
  • βš—οΈ Materials Science: Crystal structures, characterization methods
  • πŸ₯ Clinical/Medical: Blood types (SNOMED), vital status
  • 🌍 Environmental: Exposure routes, pollutants
  • ⚑ Energy: Sources, storage methods, efficiency ratings

Coming Soon

  • 🧭 Geography: Country codes (ISO), time zones, coordinate systems
  • ⏰ Time: Temporal relationships, periods, frequencies
  • πŸ’Ό Academic: Publication types, research roles, funding sources
  • 🏭 Industrial: Manufacturing processes, quality standards

πŸ”„ Multiple Use Cases

1. LinkML Standards (YAML schemas)

Use the raw LinkML schemas for data modeling, validation, and documentation:

# Direct schema usage
Person:
  attributes:
    vital_status:
      range: VitalStatusEnum  # ALIVE, DECEASED, UNKNOWN

2. Python Programming (Rich Enums)

Get Python enums with full IDE support, type checking, and semantic metadata:

# Type-safe enums with ontology mappings
status = VitalStatusEnum.ALIVE  
print(status.meaning)  # "NCIT:C37987"

3. "Stealth Semantics"

Write simple code, get semantic meaning automatically:

# Example: Different systems use different names for the same concept
from valuesets.enums.medical import BloodTypeEnum
from external_system import PatientBloodType  # Third-party enum

# Even though the enum values might be named differently:
# BloodTypeEnum.A_POSITIVE vs PatientBloodType.A_POS
# They map to the same SNOMED code: SNOMED:278149003

if blood_type.get_meaning() == patient_blood.get_meaning():
    # Semantic interoperability - works across different naming conventions
    process_compatible_blood_type()

# Or use the utility function
if same_meaning_as(blood_type, patient_blood):
    process_compatible_blood_type()

4. Multi-language Interoperability

Generate schemas and types for any language:

# Generate JSON Schema for web apps
linkml-convert schema.yaml -t json-schema

# Generate TypeScript definitions  
linkml-convert schema.yaml -t typescript

# Generate SQL DDL
linkml-convert schema.yaml -t sql

5. Integration & Tooling

  • Excel/Google Sheets: Generate dropdown validation lists
  • Web forms: Auto-generate select options with descriptions
  • APIs: Standardized response codes and classifications
  • Databases: Consistent foreign key constraints

πŸ› οΈ Advanced Features

Hierarchical Relationships

# Some enums support hierarchical is_a relationships
from valuesets.enums import ViralGenomeTypeEnum

# Baltimore classification with hierarchy
positive_rna = ViralGenomeTypeEnum.SSRNA_POSITIVE  # Group IV
# inherits from SSRNA (single-stranded RNA)

Rich Metadata

from valuesets.enums.bio.structural_biology import CryoEMGridType

grid = CryoEMGridType.QUANTIFOIL
metadata = grid.get_metadata()
print(metadata)
# {
#   'name': 'QUANTIFOIL',
#   'value': 'QUANTIFOIL',
#   'description': 'Quantifoil holey carbon grid',
#   'annotations': {
#     'hole_sizes': '1.2/1.3, 2/1, 2/2 ΞΌm common',
#     'manufacturer': 'Quantifoil'
#   }
# }

# Get all grid types with their descriptions at once
all_grids = CryoEMGridType.get_all_descriptions()
# {'C_FLAT': 'C-flat holey carbon grid', 'QUANTIFOIL': ...}

Utility Functions

from valuesets.enums.spatial import AnatomicalPlane

# Get all ontology mappings for an enum
mappings = AnatomicalPlane.get_all_meanings()
print(mappings)
# {'SAGITTAL': 'BSPO:0000417', 'CORONAL': 'BSPO:0000019', ...}

# List all metadata for every value in an enum
all_metadata = AnatomicalPlane.list_metadata()
for name, meta in all_metadata.items():
    print(f"{name}: {meta.get('description', 'No description')}")

# Find enum by ontology term (useful for data integration)
plane = AnatomicalPlane.from_meaning("BSPO:0000417")  # Returns SAGITTAL

Dynamic Enums

Some enums in this collection are dynamic enums that can be expanded at runtime by querying ontologies. This uses LinkML's Dynamic Enum feature.

# Example: A dynamic enum that pulls values from an ontology
CellTypeEnum:
  permissible_values:
    NEURON:
      meaning: CL:0000540
    ASTROCYTE:
      meaning: CL:0002585
  # Dynamic expansion from Cell Ontology
  reachable_from:
    source_ontology: obo:cl
    source_nodes:
      - CL:0000540  # neuron
    include_self: false
    relationship_types:
      - rdfs:subClassOf

Note: Runtime expansion support is coming soon! Currently, dynamic enums provide:

  • βœ… Static values with ontology mappings
  • βœ… Metadata and descriptions
  • 🚧 Runtime expansion from ontologies (coming in next release)

When runtime expansion is available, you'll be able to:

# Future: Dynamically expand enum with all neuron subtypes
cell_types = CellTypeEnum.expand_from_ontology()
# Would add: MOTOR_NEURON, SENSORY_NEURON, INTERNEURON, etc.

πŸ“– Documentation

Full Documentation Website β†’

OWL Ontology

TODO: The OWL artifact generated from these value sets will be available soon on:

πŸš€ Future Directions

Maturity Levels

We plan to add maturity level metadata to each enum to help users understand their readiness:

  • 🟒 Stable: Production-ready, well-tested, unlikely to change
  • 🟑 Beta: Usable but may have minor changes
  • πŸ”΄ Draft: Under development, expect changes
# Future: Check maturity before use
if enum_def.maturity_level == MaturityLevel.STABLE:
    use_in_production()

Modularization

Split the package into domain-specific modules for lighter installs:

# Future: Install only what you need
pip install valuesets-core        # Core functionality
pip install valuesets-bio         # Biological domains
pip install valuesets-materials   # Materials science
pip install valuesets-clinical    # Clinical/medical

Community Extensions

  • Domain Packages: Community-maintained domain-specific value sets
  • Organization Standards: Company/institution-specific enums that extend base sets
  • Mapping Tables: Cross-ontology and cross-standard mappings

Advanced Features

  • πŸ€– AI/LLM Integration: Semantic annotations optimized for language models
  • πŸ“Š Usage Analytics: Track which enums are most used, identify gaps
  • πŸ”„ Version Management: Handle enum evolution with deprecation warnings
  • 🌐 Multi-ontology Support: Map single values to multiple ontologies
  • πŸ” Fuzzy Matching: Find enums by approximate string matching

πŸ—οΈ Development

Installation

git clone https://github.com/linkml/valuesets
cd valuesets
uv install

Available Commands

just --list  # Show all available commands
just test    # Run tests  
just doctest # Run doctests
just lint    # Run linting
just site    # Build documentation site

🀝 Contributing

We welcome contributions! Whether you're adding new domains, improving existing enums, or fixing bugs:

  1. Domain Experts: Contribute standardized value sets for your field
  2. Developers: Add utility functions, improve tooling, fix issues
  3. Users: Report missing enums, suggest improvements, share use cases

πŸ“ Repository Structure

β”œβ”€β”€ src/valuesets/
β”‚   β”œβ”€β”€ schema/              # πŸ“ LinkML YAML schemas (source of truth)
β”‚   β”‚   β”œβ”€β”€ bio/            # Biological domains
β”‚   β”‚   β”‚   β”œβ”€β”€ cell_biology.yaml
β”‚   β”‚   β”‚   β”œβ”€β”€ structural_biology.yaml
β”‚   β”‚   β”‚   └── taxonomy.yaml
β”‚   β”‚   β”œβ”€β”€ spatial/        # Spatial and anatomical
β”‚   β”‚   β”‚   └── spatial_qualifiers.yaml
β”‚   β”‚   β”œβ”€β”€ statistics.yaml
β”‚   β”‚   └── core.yaml
β”‚   β”œβ”€β”€ enums/              # 🐍 Generated Python enums
β”‚   β”‚   └── <auto-generated from schemas>
β”‚   β”œβ”€β”€ generators/         # πŸ”§ Rich enum generator
β”‚   β”‚   └── rich_enum.py
β”‚   └── validators/         # βœ“ Ontology validation
β”‚       └── enum_evaluator.py
β”œβ”€β”€ docs/                   # πŸ“š Documentation
└── tests/                  # πŸ§ͺ Test cases
    β”œβ”€β”€ test_rich_enums.py  # Rich enum functionality
    └── validators/         # Ontology validation tests

πŸ“œ Credits

Built with LinkML and the linkml-project-copier template.


Making data standardization simple, semantic, and scalable πŸš€

About

Common value sets (enums) for science, biomedicine, computing, and other areas

Topics

Resources

License

Code of conduct

Contributing

Stars

Watchers

Forks

Packages

No packages published

Contributors 4

  •  
  •  
  •  
  •  

Languages