A systematic, test-driven approach to generating complete constructed languages from phonemic inventory through Swadesh lists to exportable vocabularies.
LangForge creates linguistically coherent constructed languages that produce:
- Phonemic inventories with realistic sound systems
- Syllable structures with proper phonotactic constraints
- Morpheme generation (roots & affixes) with language-specific patterns
- Word building using intelligent morphological strategies
- Core vocabulary mapped to 207 Swadesh concepts
- Professional export capabilities (CSV, JSON with metadata)
Inspired by VulgarLang, but built with a library-first, test-driven architecture.
LangForge now provides a complete pipeline from phonological generation to exportable vocabularies:
- Phase 1: Advanced syllable generation across 5 language families
- Phase 2: Morpheme generation (roots & affixes) with linguistic realism
- Phase 3: Word building, Swadesh list generation, and professional export
- Test Coverage: 59 passing tests ensuring production reliability
- Demo Ready: Professional demonstration scripts for interviews and showcases
Phonological Systems (5 Language Families + Random)
- Polynesian: Simple CV syllables - Hawaiian, Samoan, Maori
- Japanese: Strict CV(n) patterns - minimal clusters
- Germanic: Complex consonant clusters - English, German
- Romance: Flowing open syllables - Spanish, Italian
- Sinitic: Tonal-ready with limited finals - Mandarin
- Random: Truly random generation with 50+ phoneme combinations
Morphological Generation
- Root morpheme generation using syllable patterns
- Affix systems (prefixes, suffixes) with language-specific complexity
- Language-appropriate morpheme lengths and structures
Word Building & Vocabulary
- Multiple strategies: simple, complex, mixed morphological approaches
- Intelligent morpheme combination respecting phonotactic constraints
- Realistic word length distributions per language family
Swadesh List Generation
- Complete 207-concept Swadesh list support
- Configurable concept counts for targeted vocabulary
- Semantic category organization and prioritization
- Phonologically consistent word generation
Professional Export
- CSV export for spreadsheet analysis
- JSON export with complete language metadata
- Auto-format detection from file extensions
- Language metadata including phonology, morphology, and statistics
import langforge
# Create the language forge
forge = langforge.Forge()
# Generate a complete language with phonology
language = forge.generate("polynesian")
print(f"Consonants: {language.phonology['consonants']}")
print(f"Vowels: {language.phonology['vowels']}")
print(f"Sample syllables: {language.syllables[:8]}")
# Build morphemes from syllables
roots = forge.morphemes("polynesian", type="roots", count=10)
affixes = forge.morphemes("polynesian", type="affixes", count=5)
print(f"Roots: {roots}")
print(f"Affixes: {affixes}")
# Build words using different strategies
simple_words = forge.build_words("polynesian", count=8, strategy="simple")
complex_words = forge.build_words("germanic", count=8, strategy="complex")
print(f"Simple words: {simple_words}")
print(f"Complex words: {complex_words}")
# Generate Swadesh vocabulary
swadesh = forge.generate_swadesh("japanese", count=30)
print(f"Generated {len(swadesh)} concept mappings")
print(f"Sample: I→{swadesh.get_word('I')}, water→{swadesh.get_word('water')}")
# Export for external use
forge.export(swadesh, "my_language.csv", format="csv")
forge.export(swadesh, "my_language.json", format="json",
include_metadata=True, language=language)Polynesian - Simple and elegant:
Syllables: ['ma', 'ju', 'wa', 'pi', 'je', 'u', 'fi', 'e']
Words: ['pi', 'mawa', 'jue', 'wafu']
Swadesh: I→na, you→ju, water→mauhe
Germanic - Complex and robust:
Syllables: ['θpʌg', 'mŋot', 'gɛʃl', 'lir', 'θən']
Words: ['θpʌgmŋot', 'gɛʃl', 'θənlir']
Swadesh: I→mʌθæt, you→wud, water→ŋʌθrub
Japanese - Clean and structured:
Syllables: ['bo', 'ra', 'he', 'sa', 'mo', 'we']
Words: ['bora', 'hesa', 'mowebo']
Swadesh: I→boku, you→kimi, water→mizu
Run the comprehensive, beautifully formatted demonstration:
python demo_unified.pyPerfect for:
- Technical interviews - Shows systematic thinking
- Demo sessions - Engaging comprehensive overview
- Showing off to friends - Beautiful, impressive output
- Documentation - Complete feature reference
Includes:
- Complete pipeline demonstration
- Cross-family linguistic comparison
- Export capabilities showcase
- Technical achievements metrics
- Future development roadmap
# Original demo - Phase 1 & 2 focus
python demo.py
# Phase 3 specific features
python demo_phase3.py
# Complete unified demonstration
python demo_unified.pyLangForge achieves production reliability through comprehensive testing:
# Run all tests (59 passing!)
python -m pytest tests.py -v
# Test specific phases
python -m pytest tests.py -k "word_building" -v
python -m pytest tests.py -k "swadesh" -v
python -m pytest tests.py -k "export" -v
# Run foundation tests
python -m pytest tests.py::TestForge -v- Phase 1: Phonological generation across 5 families
- Phase 2: Morpheme generation with linguistic realism
- Phase 3: Word building, Swadesh lists, export functionality
- Quality: Linguistic realism and consistency validation
- API: Complete public interface testing
- Integration: End-to-end pipeline validation
- Multi-language family validation (5 language families)
- Morpheme generation testing (roots & affixes)
- Word building strategies (simple, complex, mixed)
- Swadesh list generation (concept mapping, coverage)
- Export functionality (CSV, JSON, metadata)
- Linguistic realism validation (statistical analysis)
# Language generation
forge = Forge()
language = forge.generate("polynesian") # or "japanese", "germanic", "romance", "sinitic", "random"
# Access generated components
language.phonology['consonants'] # List of consonant phonemes
language.phonology['vowels'] # List of vowel phonemes
language.syllables # Generated syllable inventory# Generate morphemes
roots = forge.morphemes("japanese", type="roots", count=20)
affixes = forge.morphemes("germanic", type="affixes", count=10)
# Language-specific patterns automatically applied
# Japanese: 2-3 syllable roots, simple affixes
# Germanic: Complex morphemes with consonant clusters# Multiple word building strategies
simple_words = forge.build_words("polynesian", count=15, strategy="simple")
complex_words = forge.build_words("germanic", count=15, strategy="complex")
mixed_words = forge.build_words("romance", count=15, strategy="mixed")
# Strategies:
# - simple: Mostly single morphemes (isolating languages)
# - complex: Multiple morpheme combinations (agglutinative)
# - mixed: Balanced approach (most natural languages)# Generate concept-to-word mappings
swadesh = forge.generate_swadesh("japanese", count=50)
# Access mappings
word = swadesh.get_word("water") # Get word for concept
concepts = swadesh.to_dict() # All mappings as dictionary
print(len(swadesh)) # Number of concepts
"water" in swadesh # Check if concept exists
# Semantic categories automatically prioritized
# Essential concepts like "I", "you", "water" appear first# Export to different formats
forge.export(swadesh, "language.csv") # Auto-detect CSV
forge.export(swadesh, "language.json") # Auto-detect JSON
forge.export(swadesh, "output.csv", format="csv") # Explicit format
forge.export(swadesh, "complete.json", format="json", # With metadata
include_metadata=True, language=language)
# Export includes:
# - CSV: Simple concept,word pairs
# - JSON: Complete language metadata, phonology, statisticslangGen/langforge/
├── __init__.py # Package exports (Forge, SwadeshList)
├── forge.py # Complete pipeline: phonology → vocabulary
├── demo_unified.py # Professional demonstration script
├── demo.py # Phase 1&2 focused demo
├── demo_phase3.py # Phase 3 specific demo
├── tests.py # 59 comprehensive tests
├── README_DEMO.md # Demo documentation
├── phones.py # Legacy phoneme data
├── *.csv # IPA phoneme reference data
└── debugging.ipynb # Development notebook
- Multi-language family support (Polynesian, Japanese, Germanic, Romance, Sinitic)
- Truly random generation with varied phoneme inventories
- Weighted syllable structure generation
- Position-sensitive phonotactic constraints
- Root morpheme generation using syllable patterns
- Affix systems with language-specific complexity
- Linguistic realism validation
- TDD methodology proven effective
- Intelligent word building strategies
- Complete Swadesh list generation (207 concepts)
- Professional CSV/JSON export capabilities
- Production-ready demonstration scripts
- Grammar Generation: Basic syntactic rules and sentence structure
- Orthography Systems: Writing system generation (alphabets, syllabaries)
- Phonological Change: Simulate historical language evolution
- Morphophonological Rules: Advanced sound changes in word formation
- Web Interface: Browser-based language generation tool
- Community Platform: Share and explore generated languages
- Game Development: Rich, believable fantasy worlds
- Creative Writing: Authentic fictional cultures
- Linguistic Education: Hands-on phonology learning
- Research Tool: Systematic language comparison
- Entertainment Industry: Film and TV language creation
- Test-Driven Development: 59 passing tests ensure reliability
- Modular Architecture: Clean separation of concerns
- Linguistic Accuracy: Based on real typological patterns
- Production Ready: Professional export and integration capabilities
- Documentation: Comprehensive demos and examples
- Test Coverage: 100% of public API tested
- Linguistic Realism: Statistical validation across language families
- Performance: Efficient generation suitable for real-time applications
- Extensibility: Easy to add new language families and features
- Reliability: Deterministic output for reproducible results
Built with for the constructed language community.
Ready for production use in creative, educational, and research applications!