A complete solution for accessing, processing, and browsing Moldova's customs tariff nomenclature data from trade.gov.md.
🌐 Live App - Browse 15,067 tariff items with instant search and multilingual support
This monorepo contains:
- Scraper - Python tools to fetch and process 15,067 tariff items from Moldova's API
- Tax Scraper - Fetches VAT, customs duties, and excise tax data for all NC codes
- Web App - React SPA for browsing the nomenclature with advanced search and tax information
graph TB
subgraph External
API[Moldova Trade API<br/>trade.gov.md/api/tarim-nomenclature]
end
subgraph "Data Collection Layer"
Scraper[scraper.py<br/>Sequential Scraper]
Raw[raw_responses/<br/>page_1.json ... page_302.json<br/>15,067 items]
end
subgraph "Data Processing Layer"
Processor[processor.py<br/>Tree Builder]
Flat[data/nomenclature_flat.json<br/>14.7MB - Flat array with paths]
Tree[data/nomenclature_tree.json<br/>9.5MB - Nested hierarchy]
TaxScraper[tax_scraper.py<br/>Tax Data Collector]
TaxData[tax_responses/<br/>12,661 tax files]
Merger[merge_tax_data.py<br/>Data Merger]
TreeWithTax[data/nomenclature_tree_with_tax.json<br/>Enhanced with tax info]
end
subgraph "Presentation Layer (React SPA)"
WebApp[web/src/pages/Home.jsx<br/>Main Component]
Display[Display Features]
Search[Search Engine]
WebApp --> Display
WebApp --> Search
Display --> |Visual Indentation|Render[Hierarchical Rendering<br/>15,067 items]
Display --> |Click to Copy|Clipboard[Clipboard Integration]
Search --> |Debounced 150ms|Filter[Filter Logic]
Filter --> |Diacritics-insensitive|Normalize[normalizeText()]
Filter --> |Wildcard *|Prefix[Prefix Matching]
end
subgraph "Deployment"
Build[npm run build]
Dist[dist/<br/>Production Build]
Netlify[Netlify CDN<br/>vama-md.netlify.app]
end
API -->|302 pages × 50 items<br/>Rate-limited: 2-3s delays| Scraper
Scraper -->|Save individually<br/>Resume-safe| Raw
Raw -->|Read all files<br/>Build lookups| Processor
Processor -->|Transform & Sort| Flat
Processor -->|Build hierarchy<br/>Sort by (nc=='', nc)| Tree
API -->|Tax data for 12,661 NC codes<br/>Rate-limited: 1-1.5s delays| TaxScraper
TaxScraper -->|Individual JSON files<br/>Resume-safe| TaxData
Tree -->|Base hierarchy| Merger
TaxData -->|Tax info lookup| Merger
Merger -->|Merge tax_info into items| TreeWithTax
TreeWithTax -->|Fetch on load<br/>Enhanced JSON| WebApp
WebApp -->|Flatten with levels<br/>Depth-first traversal| Display
WebApp --> Build
Build --> Dist
Dist --> Netlify
style API fill:#e1f5ff
style Scraper fill:#fff4e1
style Processor fill:#fff4e1
style TaxScraper fill:#fff4e1
style Merger fill:#fff4e1
style WebApp fill:#e8f5e9
style Netlify fill:#f3e5f5
style TreeWithTax fill:#ffebee
Data Flow Summary:
- Scraper fetches paginated API responses (302 pages, ~12-15 min)
- Processor transforms flat data into hierarchical tree structure
- Tax Scraper fetches VAT, customs, and excise data for all NC codes (~4.4 hours)
- Merger combines nomenclature tree with tax information
- Web App loads enhanced JSON, flattens it, and provides instant search with tax data
- Deployment builds static site and deploys to Netlify CDN
Key Design Decisions:
- Resume-safe scraping: Individual page files allow safe interruption
- Dual formats: Flat JSON for analysis, tree JSON for display
- Client-side flattening: Tree flattened on load for O(1) search
- No virtualization: Modern browsers efficiently handle 15k items
- Proper ordering: NC codes sorted first at each hierarchy level
cd web
npm install
npm run devOpen http://localhost:5173 to browse the nomenclature.
cd scraper
pip install -r requirements.txt
# Test scraper (dry run - first page only)
python scraper.py --dry-run
# Full scrape (~12-15 minutes)
python scraper.py
# Process raw data into structured formats
python processor.py
# Scrape tax information (VAT, customs, excise) for all NC codes
python tax_scraper.py --dry-run # Test with first 3 codes
python tax_scraper.py # Full scrape (~4.4 hours, resumable)
# Merge tax data with nomenclature
python merge_tax_data.py
# Copy enhanced data to web app
cp data/nomenclature_tree_with_tax.json ../web/public/data/├── scraper/ # Python data collection tools
│ ├── scraper.py # API scraper (sequential, rate-limited)
│ ├── processor.py # Data processor (builds tree/flat formats)
│ ├── tax_scraper.py # Tax data scraper (VAT, customs, excise)
│ ├── merge_tax_data.py # Merges tax data with nomenclature
│ ├── search.py # CLI keyword search utility
│ ├── stats.py # Statistics viewer
│ ├── requirements.txt # Python dependencies
│ ├── raw_responses/ # Raw API responses (gitignored)
│ ├── tax_responses/ # Tax data responses (gitignored)
│ ├── data/ # Processed data (gitignored)
│ ├── logs/ # Scraping logs (gitignored)
│ └── TAX_SCRAPER_README.md # Tax scraper documentation
│
├── web/ # React SPA for browsing
│ ├── src/
│ │ ├── pages/
│ │ │ ├── Home.jsx # Main list view with search and tax data
│ │ │ └── Category.jsx # Detail view (unused)
│ │ ├── config/
│ │ │ └── countries.js # Country list for customs rates
│ │ ├── App.jsx # Main app layout
│ │ └── App.css # Styling
│ ├── public/
│ │ └── data/ # Processed JSON (nomenclature_tree_with_tax.json)
│ ├── netlify.toml # Netlify deployment config
│ └── package.json
│
├── README.md # This file
└── CLAUDE.md # Claude Code instructions
- Hierarchical display: All 15,067 items with visual indentation
- Tax information: VAT rates, customs duties (by country), and excise taxes
- Country selector: Choose from 60+ countries to view specific customs rates
- Instant search: Debounced (150ms) with diacritics-insensitive matching
- Advanced search modes:
- Wildcard:
0101*for prefix matching on NC codes - Exact:
"horse"for exact phrase matching - Multi-word: All words must match in any name field
- Numeric: Automatic NC code search with space/dot tolerance
- Wildcard:
- Search highlighting: Matched terms highlighted in results
- Context-aware results: Shows matched items with their parents and children
- Clipboard copy: Click any NC code to copy to clipboard with toast notification
- LLM export: Download entire dataset as hierarchical markdown (~75MB)
- ChatGPT integration: Quick access to AI-assisted tariff classification
- Multilingual: Displays Romanian, Russian, and English names
- Responsive: Clean layout optimized for data tables
- Virtualized rendering: Efficient display of 15k+ items with react-window
- Resume capability: Skips already-downloaded files
- Rate limiting: 2-3 seconds between requests with exponential backoff
- Safety net: All raw responses saved individually
- Progress tracking: Detailed logging to console and file
- Error handling: Retries with exponential backoff on network errors
- Comprehensive coverage: Fetches tax data for 12,661 unique NC codes
- Resume capability: Skips already-fetched files, can resume from any index
- Rate limiting: 1-1.5 seconds between requests
- Robust error handling: Retries with exponential backoff for 404s, 429s, network errors
- Progress tracking: Real-time reports every 100 items, detailed logging
- Data merging: Combines tax information with nomenclature tree
- Tree structure: Builds proper hierarchical JSON with nested children
- Proper ordering: NC codes first, then empty values (category headers)
- Multilingual: Preserves all languages (EN, RO, RU)
- Regulatory info: Import/export/transit acts included
- Tax integration: Merges VAT, customs, and excise data into tree structure
- Statistics: Depth analysis, act counts, etc.
[
{
"id": 61032,
"nc": "0101",
"name_en": "Horses, asses, mules and hinnies, live",
"name_ro": "Cai, măgari, catâri şi bardoi, vii",
"name_ru": "Лошади, ослы, мулы и лошаки, живые",
"import_acts": [...],
"export_acts": [...],
"transit_acts": [...],
"tax_info": {
"vat": "20%",
"excise": "-",
"tax_values": [
{
"country": 1,
"tax_value": "0%"
},
{
"country": 2,
"tax_value": "5%"
}
]
},
"children": [
{
"id": 61033,
"nc": "",
"name_en": "– Horses:",
"children": [...]
}
]
}
]Structure notes:
- Properly ordered at each level: items with NC codes first (sorted), then category headers without codes
tax_infofield contains VAT rate, excise tax, and country-specific customs dutiestax_valuesarray provides customs rates for different countries (use country ID to look up)
The web app is configured for Netlify deployment:
cd web
npm run buildDeploy the dist/ folder to Netlify. The netlify.toml handles:
- SPA routing redirects
- Cache headers for static assets
- Optimal serving of the enhanced nomenclature_tree_with_tax.json
- Total items: 15,067
- Unique NC codes: 12,661 (with tax information)
- Root categories: 1,262 (no parent)
- Leaf items: 9,889 (most specific classifications)
- Hierarchy depth: Up to 9 levels
- With import acts: 3,520+ items
- Countries supported: 60+ (for customs rates)
- Data size: Enhanced tree JSON with tax data, 14.7MB (flat JSON)
- Base URL:
https://trade.gov.md/api/tarim-nomenclature/ - Total pages: 302 (50 items per page)
- Data structure: Flat paginated list with parent/child relationships
- Rate limits: None specified (scraper uses 2-3 sec delays)
cai→ matches "Cai, măgari, catâri şi bardoi, vii" (ignores diacritics, highlights matches)0101*→ shows all NC codes starting with 0101 (wildcard prefix search)"live animals"→ exact phrase match (wrapped in quotes)0101 21 000→ finds NC code (spaces ignored in numeric search)furniture wood→ both words must match (multi-word search)- Select a country from dropdown to view country-specific customs rates
python search.py "mobilă" --lang=ro --limit=10
python search.py "textile" --actsData sourced from Moldova's official trade.gov.md API. Check their terms of use for data usage rights.