This directory contains all analysis and utility scripts organized by function.
Active pipeline scripts for main analysis phases:
- Frequency analysis
- Morphology analysis
- Semantic analysis
- Clustering analysis
- Spatial analysis
- Character embeddings
- N-gram analysis
- Semantic composition
Data preprocessing and validation scripts:
- Create preprocessed table
- Backup analysis tables
- Audit logging and reporting
- Validation scripts
Analysis utilities and tools:
- LLM character labeling
- Embedding analysis
- Semantic co-occurrence
- Semantic network analysis
- Lexicon coverage
- Cluster analysis
Visualization and mapping scripts:
- Cluster visualization
- Embedding visualization
- Semantic network visualization
- Spatial maps generation
Report generation scripts:
- Comprehensive report generator
- Analysis results extraction
- Chinese showcase creation
Query utilities for database:
- Query results
- Query tendency
- Query spatial tendency
General utilities:
- Database verification
- Feature materialization
- Export results
- Compare runs
- Initialize tables
Experimental and alternative approaches:
- Village-level clustering (KMeans, DBSCAN, GMM)
- Hierarchical clustering
- Spatial tendency integration
Run scripts from the project root directory:
# Core pipeline
python scripts/core/run_frequency_analysis.py
# Preprocessing
python scripts/preprocessing/create_preprocessed_table.py
# Analysis
python scripts/analysis/llm_label_characters.py
# Reporting
python scripts/reporting/generate_comprehensive_report.py