An open-source repository of Arabic poetry, with database dumps, REST API, and web interface.
Website · API · X Bot · HuggingFace Dataset · Database Dumps
Qafiyah is an open-source corpus of classical Arabic poetry: 944,844 verses from 932 poets spanning 10 historical eras. It offers full-text search with Arabic diacritics normalization; faceted browsing by era, meter (44), rhyme pattern (47), and theme (27); a public REST API (Bun + Hono, Docker) with auto-generated OpenAPI docs; downloadable PostgreSQL dumps; and a Hugging Face dataset for ML/NLP research. An X/Twitter bot posts a random poem four times daily. The project is built for readers, researchers, and developers working with classical Arabic literature.
One request, no auth, returns a random classical Arabic poem as plain text:
curl https://api.qafiyah.com/poems/random

Full schema and interactive playground: api.qafiyah.com/v1/docs.
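The same request can be made from TypeScript. The following is a minimal sketch, not project code: only the endpoint URL is taken from the curl example, while `fetchRandomPoem` and the `FetchLike` type are illustrative names. The fetch implementation is passed in so the function can be exercised without network access.

```typescript
// Minimal sketch: fetch one random poem as plain text.
// Only RANDOM_POEM_URL comes from the README; the rest is an assumption.

type FetchLike = (url: string) => Promise<{
  ok: boolean;
  status: number;
  text(): Promise<string>;
}>;

const RANDOM_POEM_URL = "https://api.qafiyah.com/poems/random";

async function fetchRandomPoem(fetchImpl: FetchLike): Promise<string> {
  const response = await fetchImpl(RANDOM_POEM_URL);
  // Surface throttling or outages instead of returning an error page as a poem.
  if (!response.ok) {
    throw new Error(`Request failed with status ${response.status}`);
  }
  // Strip leading and trailing whitespace from the plain-text body.
  return (await response.text()).trim();
}
```

In Bun, Node 18+, or a browser, the global `fetch` can be passed directly: `await fetchRandomPoem(fetch)`.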
| Tool | Purpose |
|---|---|
| Bun | Package manager and JavaScript runtime |
| Turborepo | Monorepo task orchestration and build caching |
| TypeScript | Language across all packages |
| envin | Type-safe environment variable loading and parsing |
| ts-pattern | Exhaustive, type-safe pattern matching used across all apps |
| neverthrow | Typed Result for fallible logic at module boundaries |
| Tool | Purpose |
|---|---|
| Astro | SSR framework (output: 'server'); pages rendered per request |
| @astrojs/node | Standalone Node adapter; runs the SSR server under Bun |
| React | Interactive islands (search, nav, random poem) |
| TailwindCSS | Utility-first CSS |
| Radix Slot | Polymorphic-render primitive for component composition |
| TanStack Query | Server-state and data-fetching in React islands |
| nuqs | Type-safe URL search-param state for React islands |
| lucide-react | Icon set used throughout the UI |
| clsx + tailwind-merge | Conditional class composition with Tailwind conflict resolution |
| class-variance-authority | Typed variant API for component styling |
| Tool | Purpose |
|---|---|
| Hono | Lightweight HTTP framework running on a Bun server (Docker) |
| oRPC | Type-safe RPC with shared contracts |
| Valibot | Schema validation for all oRPC contract inputs and outputs |
| OpenAPI | API spec auto-generated from oRPC contracts via @orpc/openapi |
| Scalar | Interactive API documentation served at /v1/docs |
| Docker | Container runtime; API and web served via Docker Compose |
| Tool | Purpose |
|---|---|
| GitHub Actions | Cron scheduler and runtime for the bot |
| twitter-api-v2 | X/Twitter API client |
| Tool | Purpose |
|---|---|
| Drizzle ORM | SQL query builder and schema definitions in packages/db |
| postgres.js | Underlying Postgres client that Drizzle wraps |
| PostgreSQL | Primary database; full-text search via tsvector/GIN indexes |
| Docker | Local Postgres containers for development and testing |
| Tool | Purpose |
|---|---|
| Biome | Linting and formatting for all JS/TS files |
| Prettier | Formatting for non-JS assets |
| Vitest | Unit and integration tests across all workspaces |
| Husky | Git hooks |
| commitlint | Conventional commit enforcement |
| Knip | Detection of unused files, dependencies, and exports |
| Madge | Circular dependency detection |
| dependency-cruiser | Architectural import rules across apps/ and packages/ |
| Syncpack | Cross-workspace dependency version consistency |
Qafiyah is a Bun + Turborepo monorepo with three apps and four shared packages.
qafiyah/
├── apps/
│ ├── web/ Astro 6 on-demand SSR (Bun + @astrojs/node); renders each route per request by fetching the API via oRPC, behind nginx proxy_cache
│ ├── api/ Hono REST API — Bun server (Docker container)
│ └── bot/ X/Twitter bot; posts 4× daily via GitHub Actions cron
└── packages/
├── db/ Drizzle ORM schema, queries, Arabic-text utilities, and Postgres client factory
├── contracts/ Shared oRPC contract definitions
├── constants/ Shared brand, URLs, and dev-port constants
└── typescript/ Shared TypeScript configs (base, astro, bun)
Package dependencies, who imports whom at compile time:
graph TD
subgraph APPS
WEB["apps/web\nAstro · React islands"]
API["apps/api\nHono · Bun server (Docker)"]
BOT["apps/bot\nGitHub Actions cron"]
end
subgraph PACKAGES
DB["packages/db\nDrizzle ORM · queries"]
CONTRACTS["packages/contracts\noRPC · Valibot schemas"]
CONSTANTS["packages/constants"]
end
WEB --> CONTRACTS & CONSTANTS
API --> DB & CONTRACTS & CONSTANTS
BOT --> CONTRACTS & CONSTANTS
DB --> CONTRACTS & CONSTANTS
CONTRACTS --> CONSTANTS
packages/typescript isn't shown above: it ships shared tsconfig presets (base, astro, bun) that every workspace consumes through extends, not via code imports.
Two architectural constraints worth noting. packages/db is consumed exclusively by apps/api, with no Drizzle or Postgres imports anywhere under apps/web or apps/bot. And apps/web holds no DB access of its own: it renders each route on demand (SSR), querying the API per request through server-only oRPC accessors in src/lib/server/ (pointed at INTERNAL_API_URL), while browser islands fetch the public API via src/lib/api/ (rpc.ts, client.ts, orpc.ts) for interactive features.
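The server/browser split described above can be pictured as a small base-URL resolver. This is a hedged sketch, not the contents of src/lib/server/ or src/lib/api/: `resolveApiBaseUrl` and `RuntimeContext` are hypothetical names, and only `INTERNAL_API_URL` and the public origin api.qafiyah.com come from this README.

```typescript
// Sketch: SSR code talks to the API over the internal network,
// browser islands use the public origin. Names here are hypothetical.

const PUBLIC_API_URL = "https://api.qafiyah.com";

interface RuntimeContext {
  isServer: boolean;
  env: Record<string, string | undefined>;
}

function resolveApiBaseUrl({ isServer, env }: RuntimeContext): string {
  // Browser islands never see internal hostnames.
  if (!isServer) return PUBLIC_API_URL;

  const internal = env["INTERNAL_API_URL"];
  if (!internal) {
    throw new Error("INTERNAL_API_URL must be set for server-side rendering");
  }
  // Drop trailing slashes so callers can append paths consistently.
  return internal.replace(/\/+$/, "");
}
```

Failing fast when the variable is missing keeps a misconfigured SSR server from silently falling back to the public API.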
Runtime data flow, how requests move once deployed:
graph LR
BROWSER["Browser"]
BOT["apps/bot"]
GHA(["GitHub Actions\ncron 4×/day"])
HFPUB(["huggingface-publisher\nPython · run manually"])
TW(["X / Twitter"])
HF(["Hugging Face"])
subgraph VPS["Docker on VPS · docker compose"]
subgraph WEBIMG["web image"]
NGINX["nginx\nproxy_cache · static assets"]
WEB["Astro SSR · Bun\napps/web"]
end
API["apps/api\nHono · Bun"]
PG[("PostgreSQL")]
end
BROWSER -->|"page loads"| NGINX
NGINX -->|"proxy 127.0.0.1:4321"| WEB
BROWSER -->|"island data · search, random"| API
WEB -->|"per-request SSR oRPC · INTERNAL_API_URL"| API
API -->|"@qafiyah/db · SQL + FTS"| PG
GHA --> BOT
BOT -->|"GET /poems/random"| API
BOT -->|"post tweet"| TW
HFPUB -.->|"SQL read"| PG
HFPUB -.->|"push_to_hub"| HF
Everything inside Docker on VPS ships from docker-compose.yml (docker compose up -d --build): the web image bundles nginx (proxy_cache + static assets) in front of the Astro SSR server, which reaches the api service over the internal network (INTERNAL_API_URL), while browser islands call the public API directly. The bot runs on GitHub Actions, outside the VPS, and also hits the public API.
Dashed arrows (-.->) are out-of-band, non-request relationships: the Hugging Face dataset export is run manually via tools/huggingface-publisher, a Python script that reads PostgreSQL directly (SQLAlchemy) and pushes with push_to_hub; it never goes through the API.
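The search path relies on Arabic diacritics normalization, and packages/db is listed as holding Arabic-text utilities. The exact rules Qafiyah applies are not documented here, so the sketch below shows one common approach under stated assumptions: strip combining marks (tashkeel), drop tatweel, and fold hamza-carrying alef variants to a bare alef. `normalizeArabic` is a hypothetical name.

```typescript
// Illustrative only: a common normalization for Arabic search text.
// These rules are assumptions, not the project's actual implementation.

// Tanween, fatha, damma, kasra, shadda, sukun and further combining marks
// (U+064B to U+065F), plus superscript alef (U+0670).
const DIACRITICS = /[\u064B-\u065F\u0670]/g;
// Tatweel (kashida), a purely typographic elongation character.
const TATWEEL = /\u0640/g;
// Alef with madda, hamza above, or hamza below.
const ALEF_VARIANTS = /[\u0622\u0623\u0625]/g;

function normalizeArabic(input: string): string {
  return input
    .replace(DIACRITICS, "")
    .replace(TATWEEL, "")
    .replace(ALEF_VARIANTS, "\u0627") // fold to bare alef
    .replace(/\s+/g, " ")
    .trim();
}
```

With these rules a fully vocalized query and its bare form compare equal, for example `normalizeArabic("قِفَا نَبْكِ") === normalizeArabic("قفا نبك")`. Applying the same function to both the indexed text and the query is what makes the match work.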
Current statistics:
| Entity | Count |
|---|---|
| Verses | 944,844 |
| Poems | 85,342 |
| Poets | 932 |
| Eras | 10 |
| Meters | 44 |
| Rhyme patterns | 47 |
| Themes | 27 |
Counts above reflect the latest dump (0003_29_01_2026, January 2026). Stats refresh with each new dump in dumps/.
PostgreSQL custom-format dumps are published in dumps/ and refreshed periodically. They are provided for research and integration as an alternative to scraping the API. See the restore instructions; restoring requires PostgreSQL ≥ 17 and pg_restore.
- Bun ≥ 1.3.14 (pinned via the root packageManager field; engines requires bun >= 1)
- Docker (runs Postgres, and the API + web for the full stack)
- PostgreSQL ≥ 17 with pg_restore (only needed if you restore the bundled dumps directly; the Dockerized workflow handles this for you)
git clone https://github.com/alwalxed/qafiyah.git
cd qafiyah
bun install
bun run dev # seeded Postgres in Docker + hot-reloading web & API (Turbo)

dev brings up the database container (auto-seeded from the latest dump on first boot), writes the local .env files, then starts the dev servers. For a full containerized run, use bun run up.
Dev and build
| Script | Description |
|---|---|
| bun run dev | Seeded Postgres (Docker) + web & API with hot reload (Turbo) |
| bun run up | Build + run the full stack in Docker (DB self-seeds on first boot) |
| bun run down | Stop the Docker stack |
| bun run build | Build all workspaces |
| bun run db:up | Start just the Postgres container (auto-seeds on a fresh volume) |
| bun run db:reset | Wipe the DB volume and re-seed from the latest dump |
| bun run clean | Kill orphan Astro and API server processes from prior dev runs |
Quality
| Script | Description |
|---|---|
| bun run test | Run Vitest across all workspaces |
| bun run types | Type-check all workspaces with tsc --noEmit |
| bun run lint | Lint and auto-fix with Biome |
| bun run format | Format JS/TS with Biome and Markdown/MDX with Prettier |
| bun run knip | Detect unused files, dependencies, and exports |
| bun run madge | Detect circular imports across apps/ and packages/ |
| bun run depcruise | Run dependency-cruiser against the architectural rules in .dependency-cruiser.cjs |
Boundary checks
| Script | Description |
|---|---|
| bun run check:boundaries | Forbid cross-app imports (apps may not import from each other) |
| bun run check:naming | Enforce project-wide naming conventions on files and identifiers |
| bun run check:no-parent-imports | Forbid ../ imports anywhere; siblings or @/ aliases only |
| bun run check:api-db-isolation | Forbid Drizzle or postgres imports outside packages/db |
| bun run check:constants | Ensure brand strings, URLs, and ports live in packages/constants, not in app code |
| bun run check:syncpack | Verify dependency versions are consistent across all workspaces |
Aggregate and utilities
| Script | Description |
|---|---|
| bun run ci | Full pipeline: format and lint sequentially, then run types, test, knip, madge, all six boundary checks, depcruise, bun audit, and the API smoke test in parallel |
| bun run smoke | Spin up the API locally and hit each public endpoint to catch breakage in the request path |
| bun run deps:doctor | Diagnose and update workspace dependencies |
| bun run optimize:images | Convert raster images in the repo to sibling .webp files using Bun.Image |
| bun --filter @qafiyah/web run verify:seo | Crawl the running web server and assert SEO parity (canonical URLs, JSON-LD, metadata) across rendered pages |
Three GitHub Actions workflows live in .github/workflows/:
- ci.yml, runs on every push and pull request to main. Executes the same checks as bun run ci, plus a final gate that fails the build if any file changed during the run (catches uncommitted formatting fixes).
- post-poem.yml, cron-triggered, posts a random poem to X four times a day.
- gitleaks.yml, secret scanning on push and pull request.
The boundary checks enforce the architectural rules the project relies on: no cross-app imports, no ../ parent imports, Drizzle and the Postgres client confined to packages/db, brand strings and ports centralized in packages/constants, naming conventions across the tree, and consistent dependency versions across workspaces.
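To make two of these rules concrete, here is a rough sketch of how an import scan could flag ../ parent imports and Drizzle or Postgres client imports outside packages/db. It is not the project's check scripts: `checkImports`, the `Violation` shape, and the regex-based parsing are assumptions chosen for brevity. The npm package names `drizzle-orm` and `postgres` are the published names of Drizzle ORM and postgres.js.

```typescript
// Hypothetical sketch of two boundary rules; the real checks may work differently.

interface Violation {
  file: string;
  specifier: string;
  rule: "no-parent-imports" | "api-db-isolation";
}

// Packages that only packages/db may import.
const DB_ONLY_PACKAGES = ["drizzle-orm", "postgres"];

function checkImports(file: string, source: string): Violation[] {
  // Captures the specifier in `from "x"`, `import "x"`, `import("x")`, `require("x")`.
  const importRe = /(?:from\s*|import\s*\(?\s*|require\s*\(\s*)["']([^"']+)["']/g;
  const violations: Violation[] = [];
  let match: RegExpExecArray | null;

  while ((match = importRe.exec(source)) !== null) {
    const specifier = match[1] ?? "";

    if (specifier === ".." || specifier.startsWith("../")) {
      violations.push({ file, specifier, rule: "no-parent-imports" });
    }

    const isDbPackage = DB_ONLY_PACKAGES.some(
      (pkg) => specifier === pkg || specifier.startsWith(pkg + "/"),
    );
    if (isDbPackage && !file.startsWith("packages/db/")) {
      violations.push({ file, specifier, rule: "api-db-isolation" });
    }
  }
  return violations;
}
```

A regex scan is enough for a sketch; a production check would more likely walk the TypeScript AST or lean on dependency-cruiser rules so that comments and string literals are not misread as imports.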
The CI pipeline definition lives in scripts/ci.ts; both bun run ci and the GitHub job consume it, so local and remote runs stay in sync.
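The pipeline shape described for bun run ci (a sequential phase followed by a parallel phase) can be sketched as below. This is an assumption about structure only, not the contents of scripts/ci.ts: `Task` and `runPipeline` are hypothetical names, and real tasks would spawn the corresponding bun run commands.

```typescript
// Sketch of a two-phase pipeline: sequential steps first, then parallel checks.
// Not the project's scripts/ci.ts; names and behavior are illustrative.

interface Task {
  name: string;
  run: () => Promise<void>;
}

// Resolves to the names of parallel tasks that failed (empty when all passed).
async function runPipeline(sequential: Task[], parallel: Task[]): Promise<string[]> {
  // Phase 1: run in order and stop at the first failure (the rejection propagates),
  // since steps like format and lint rewrite files that later checks read.
  for (const task of sequential) {
    await task.run();
  }

  // Phase 2: start every check at once and collect all failures,
  // rather than aborting on the first one.
  const outcomes = await Promise.all(
    parallel.map((task) =>
      task.run().then(
        () => null,
        () => task.name,
      ),
    ),
  );
  return outcomes.filter((name): name is string => name !== null);
}
```

Collecting every parallel failure means a single run reports all broken checks at once instead of one per attempt.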
The public REST API is hosted at api.qafiyah.com and ships with interactive documentation at api.qafiyah.com/v1/docs, generated from oRPC contracts via @orpc/openapi and rendered with Scalar. The root URL redirects to the docs.
The API is free, requires no authentication, and is provided on a best-effort basis with no SLA.
- Fair use. Per-IP throttling is enforced at the server level. For bulk access, prefer the PostgreSQL dumps or the HuggingFace dataset over paginating the API.
- Caching. Responses are cacheable; cache them client-side when possible to reduce load.
- Stability. v1 endpoints are stable. Breaking changes ship behind a new major version.
- Attribution. Not required, but appreciated, see Citation if you publish work that relies on the corpus.
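For the client-side caching recommended above, a small in-memory cache with a time-to-live is often enough. The sketch below is one possible approach, not an official client: `createCachedLoader`, the TTL value, and the loader signature are assumptions, and the clock is injectable so the behavior can be tested.

```typescript
// Minimal TTL cache around any async loader (for example, a fetch wrapper).
// Hypothetical helper; choose a TTL that suits how often the data changes.

function createCachedLoader<T>(
  load: (key: string) => Promise<T>,
  ttlMs: number,
  now: () => number = Date.now,
): (key: string) => Promise<T> {
  const entries = new Map<string, { value: Promise<T>; expiresAt: number }>();

  return (key: string): Promise<T> => {
    const hit = entries.get(key);
    if (hit && hit.expiresAt > now()) return hit.value;

    // Store the promise itself so concurrent callers share one request.
    const value = load(key);
    entries.set(key, { value, expiresAt: now() + ttlMs });

    // Do not keep failed requests around; the next call should retry.
    value.catch(() => {
      if (entries.get(key)?.value === value) entries.delete(key);
    });
    return value;
  };
}
```

This suits resources that rarely change between dumps. A random-poem request should bypass the cache, since a cached response would no longer be random.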
- Semantic Search
- Elasticsearch
- Mobile app (React Native / Expo)
- Internal dashboard for content management
- Dark mode
Contributions are welcome. Before opening a pull request, please read:
Listed chronologically by date of contribution:
- Khalid Alraddady, AI Engineer at HRSD. Development of the semantic search feature currently under active development.
- Khalid Almulaify, PhD in Morphology and Syntax at IMSIU. Ongoing financial sponsorship ($100/month) and sustained usage of the public API through a Telegram bot.
- Malath Alsaif, Software Engineer at Ejari. UI improvements and implementation of the local database development workflow.
- Fahad Alghamdi, Software Engineer at Thmanyah. Diagnosis of a redundant per-request SELECT 1 health check in the DB middleware.
Projects and tools that use the Qafiyah corpus or API:
- QafiyahVerseBot, Telegram bot serving classical Arabic verses on demand, by Khalid Almulaify.
Built something with Qafiyah? Open a PR to add it here.
If you use Qafiyah in academic work, please cite it as:
@misc{qafiyah2026,
title = {Qafiyah: An Open Corpus of Classical Arabic Poetry},
author = {Alqahtani, Alwaleed},
year = {2026},
url = {https://qafiyah.com},
note = {Dataset: https://huggingface.co/datasets/qafiyah/classical-arabic-poetry}
}

If Qafiyah is useful to you or your work, you can support the project on GitHub Sponsors. Sponsorship funds dataset upkeep, API hosting, and ongoing maintenance.
Released under the MIT License.