alwalxed/qafiyah

Qafiyah

An open-source repository of Arabic poetry, with database dumps, REST API, and web interface.

Turborepo Docker Astro Bun TypeScript Hono Drizzle OpenAPI oRPC Valibot Scalar

Website · API · X Bot · HuggingFace Dataset · Database Dumps

About

Qafiyah is an open-source corpus of classical Arabic poetry: 944,844 verses from 932 poets spanning 10 historical eras. It offers full-text search with Arabic diacritics normalization; faceted browsing by era, meter (44), rhyme pattern (47), and theme (27); a public REST API (Bun + Hono, Docker) with auto-generated OpenAPI docs; downloadable PostgreSQL dumps; and a Hugging Face dataset for ML/NLP research. An X/Twitter bot posts a random poem four times daily. The project is built for readers, researchers, and developers working with classical Arabic literature.
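The diacritics normalization mentioned above can be pictured with a small sketch. This is an illustration only, not code from the Qafiyah repository: the function name stripArabicDiacritics and the exact set of code points removed are assumptions, and the project's real normalization may cover more cases (alef variants, for instance).

```typescript
// Hypothetical helper, not taken from the Qafiyah codebase.
// Removes the harakat U+064B..U+0652 (tanween, short vowels, shadda, sukun),
// the superscript alef U+0670, and the tatweel (kashida) U+0640.
const ARABIC_MARKS = /[\u064B-\u0652\u0670\u0640]/g;

function stripArabicDiacritics(text: string): string {
  return text.replace(ARABIC_MARKS, "");
}

// The word "qafiyah" with vowel marks, written with escapes so the
// combining characters are visible in source form.
const vocalized = "\u0642\u064E\u0627\u0641\u0650\u064A\u064E\u0629";
const bare = stripArabicDiacritics(vocalized);

console.log(bare === "\u0642\u0627\u0641\u064A\u0629"); // true
```

Applying the same function to both the indexed text and the user's query lets a search for an unvocalized word match a fully vocalized verse.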

Try it

One request, no auth, returns a random classical Arabic poem as plain text:

curl https://api.qafiyah.com/poems/random

Full schema and interactive playground: api.qafiyah.com/v1/docs.
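The same request can be made from TypeScript. The sketch below assumes the endpoint shown in the curl example and the plain-text response described above; fetchRandomPoemText and TextFetcher are hypothetical names, and the fetcher is injected so the function can be exercised without network access.

```typescript
// Minimal shape of the response this sketch relies on.
// The global fetch Response satisfies it.
type TextFetcher = (
  url: string,
) => Promise<{ ok: boolean; status: number; text(): Promise<string> }>;

// URL taken from the curl example above.
const RANDOM_POEM_URL = "https://api.qafiyah.com/poems/random";

async function fetchRandomPoemText(fetcher: TextFetcher): Promise<string> {
  const response = await fetcher(RANDOM_POEM_URL);
  if (!response.ok) {
    // Surface HTTP errors (throttling included) instead of returning a body.
    throw new Error(`Request failed with status ${response.status}`);
  }
  return response.text();
}

// Usage with the runtime's global fetch (Bun, Node 18+, browsers):
//   const poem = await fetchRandomPoemText((url) => fetch(url));
//   console.log(poem);
```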

Tech Stack

Core

| Tool | Purpose |
| --- | --- |
| Bun | Package manager and JavaScript runtime |
| Turborepo | Monorepo task orchestration and build caching |
| TypeScript | Language across all packages |
| envin | Type-safe environment variable loading and parsing |
| ts-pattern | Exhaustive, type-safe pattern matching used across all apps |
| neverthrow | Typed Result for fallible logic at module boundaries |
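
The last two rows describe a pattern: fallible functions return a typed Result rather than throwing, and callers handle every case exhaustively. The sketch below illustrates that idea in plain TypeScript only. It does not use the neverthrow or ts-pattern APIs, and parsePoemId, ParseError, and describeParseError are hypothetical names invented for the example.

```typescript
// A hand-rolled Result type; neverthrow provides a richer equivalent.
type Result<T, E> = { ok: true; value: T } | { ok: false; error: E };

type ParseError =
  | { kind: "empty" }
  | { kind: "invalid"; input: string };

// Hypothetical boundary function: turns raw input into a typed id or a typed error.
function parsePoemId(raw: string): Result<number, ParseError> {
  const trimmed = raw.trim();
  if (trimmed === "") {
    return { ok: false, error: { kind: "empty" } };
  }
  const id = Number(trimmed);
  if (!Number.isInteger(id) || id <= 0) {
    return { ok: false, error: { kind: "invalid", input: raw } };
  }
  return { ok: true, value: id };
}

// Exhaustive handling: adding a new ParseError kind makes this fail to compile
// until the new case is covered.
function describeParseError(error: ParseError): string {
  switch (error.kind) {
    case "empty":
      return "id is required";
    case "invalid":
      return `"${error.input}" is not a positive integer`;
    default: {
      const unreachable: never = error;
      return unreachable;
    }
  }
}

const parsed = parsePoemId("42");
console.log(parsed.ok ? parsed.value : describeParseError(parsed.error)); // 42
```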

Web (apps/web)

| Tool | Purpose |
| --- | --- |
| Astro | SSR framework (output: 'server'); pages rendered per request |
| @astrojs/node | Standalone Node adapter; runs the SSR server under Bun |
| React | Interactive islands (search, nav, random poem) |
| TailwindCSS | Utility-first CSS |
| Radix Slot | Polymorphic-render primitive for component composition |
| TanStack Query | Server-state and data-fetching in React islands |
| nuqs | Type-safe URL search-param state for React islands |
| lucide-react | Icon set used throughout the UI |
| clsx + tailwind-merge | Conditional class composition with Tailwind conflict resolution |
| class-variance-authority | Typed variant API for component styling |

API (apps/api)

| Tool | Purpose |
| --- | --- |
| Hono | Lightweight HTTP framework running on a Bun server (Docker) |
| oRPC | Type-safe RPC with shared contracts |
| Valibot | Schema validation for all oRPC contract inputs and outputs |
| OpenAPI | API spec auto-generated from oRPC contracts via @orpc/openapi |
| Scalar | Interactive API documentation served at /v1/docs |
| Docker | Container runtime; API and web served via Docker Compose |

Bot (apps/bot)

| Tool | Purpose |
| --- | --- |
| GitHub Actions | Cron scheduler and runtime for the bot |
| twitter-api-v2 | X/Twitter API client |

Data Layer

| Tool | Purpose |
| --- | --- |
| Drizzle ORM | SQL query builder and schema definitions in packages/db |
| postgres.js | Underlying Postgres client that Drizzle wraps |
| PostgreSQL | Primary database; full-text search via tsvector/GIN indexes |
| Docker | Local Postgres containers for development and testing |

Tooling

| Tool | Purpose |
| --- | --- |
| Biome | Linting and formatting for all JS/TS files |
| Prettier | Formatting for non-JS assets |
| Vitest | Unit and integration tests across all workspaces |
| Husky | Git hooks |
| commitlint | Conventional commit enforcement |
| Knip | Detection of unused files, dependencies, and exports |
| Madge | Circular dependency detection |
| dependency-cruiser | Architectural import rules across apps/ and packages/ |
| Syncpack | Cross-workspace dependency version consistency |

Architecture

Qafiyah is a Bun + Turborepo monorepo with three apps and four shared packages.

qafiyah/
├── apps/
│   ├── web/          Astro 6 on-demand SSR (Bun + @astrojs/node); renders each route per request by fetching the API via oRPC, behind nginx proxy_cache
│   ├── api/          Hono REST API — Bun server (Docker container)
│   └── bot/          X/Twitter bot; posts 4× daily via GitHub Actions cron
└── packages/
    ├── db/           Drizzle ORM schema, queries, Arabic-text utilities, and Postgres client factory
    ├── contracts/    Shared oRPC contract definitions
    ├── constants/    Shared brand, URLs, and dev-port constants
    └── typescript/   Shared TypeScript configs (base, astro, bun)

Package dependencies (who imports whom at compile time):

graph TD
  subgraph APPS
    WEB["apps/web\nAstro · React islands"]
    API["apps/api\nHono · Bun server (Docker)"]
    BOT["apps/bot\nGitHub Actions cron"]
  end
  subgraph PACKAGES
    DB["packages/db\nDrizzle ORM · queries"]
    CONTRACTS["packages/contracts\noRPC · Valibot schemas"]
    CONSTANTS["packages/constants"]
    TS["packages/typescript"]
  end
  WEB --> CONTRACTS & CONSTANTS & TS
  API --> DB & CONTRACTS & CONSTANTS & TS
  BOT --> CONSTANTS
  DB --> TS
  CONTRACTS --> TS

Two architectural constraints are worth noting. First, packages/db is consumed exclusively by apps/api; there are no Drizzle or Postgres imports anywhere under apps/web or apps/bot. Second, apps/web has no DB access of its own: it renders each route on demand (SSR), querying the API per request through server-only oRPC accessors in src/lib/server/ (pointed at INTERNAL_API_URL), while browser islands fetch the public API via src/lib/api/ (rpc.ts, client.ts, orpc.ts) for interactive features.
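
The split between server-side and browser-side API access can be sketched as a single decision about which base URL to use. This is an assumption-laden illustration, not the contents of src/lib/server/ or src/lib/api/: resolveApiBaseUrl and ApiEnv are hypothetical names, and the example internal address is made up.

```typescript
// Public API origin, as documented in this README.
const PUBLIC_API_URL = "https://api.qafiyah.com";

// Only the variable named in the text above; its value is deployment-specific.
interface ApiEnv {
  INTERNAL_API_URL?: string;
}

// Hypothetical helper: SSR code talks to the internal address,
// browser islands talk to the public one.
function resolveApiBaseUrl(isServer: boolean, env: ApiEnv): string {
  if (!isServer) {
    return PUBLIC_API_URL;
  }
  if (!env.INTERNAL_API_URL) {
    throw new Error("INTERNAL_API_URL must be set for server-side rendering");
  }
  return env.INTERNAL_API_URL;
}

// "http://api:3000" is a made-up value for illustration.
console.log(resolveApiBaseUrl(true, { INTERNAL_API_URL: "http://api:3000" }));
console.log(resolveApiBaseUrl(false, {}));
```

Keeping the internal URL out of anything shipped to the browser is the point of the server-only accessors: the client bundle only ever needs the public origin.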

Runtime data flow (how requests move once deployed):

graph LR
  BROWSER["Browser"] -->|"search / random"| WEB["apps/web"]
  BROWSER -->|"search / random"| API["apps/api"]
  WEB -->|"per-request SSR oRPC"| API
  API --> DB["packages/db"]
  DB -->|"SQL + FTS"| PG[("PostgreSQL")]
  API -->|"hosted in"| CF(["Docker container"])
  GHA(["GitHub Actions"]) -->|"cron"| BOT["apps/bot"]
  BOT -->|"GET /poems/random"| API
  BOT -->|"post tweet"| TW(["X / Twitter"])
  API -.->|"dataset export"| HF(["Hugging Face"])

The dashed arrow (-.->) marks an out-of-band, non-request relationship: the periodic Hugging Face dataset export.

Database

Current statistics:

| Entity | Count |
| --- | --- |
| Verses | 944,844 |
| Poems | 85,342 |
| Poets | 932 |
| Eras | 10 |
| Meters | 44 |
| Rhyme patterns | 47 |
| Themes | 27 |

Counts above reflect the latest dump (0003_29_01_2026, January 2026). Stats refresh with each new dump in dumps/.

PostgreSQL custom-format dumps are published in dumps/ and refreshed periodically. They are provided for research and integration as an alternative to scraping the API. See the restore instructions; restoring requires PostgreSQL ≥ 17 and pg_restore.

Getting Started

Prerequisites

  • Bun ≥ 1.3.14 (pinned via the root packageManager field; engines requires bun >= 1)
  • Docker (runs Postgres, and the API + web for the full stack)
  • PostgreSQL ≥ 17 with pg_restore (only needed if you restore the bundled dumps directly; the Dockerized workflow handles this for you)

Installation

git clone https://github.com/alwalxed/qafiyah.git
cd qafiyah
bun install

Development

bun run dev        # seeded Postgres in Docker + hot-reloading web & API (Turbo)

bun run dev brings up the database container (auto-seeded from the latest dump on first boot), writes the local .env files, then starts the dev servers. For a full containerized run, use bun run up.

Scripts

Dev and build

| Script | Description |
| --- | --- |
| bun run dev | Seeded Postgres (Docker) + web & API with hot reload (Turbo) |
| bun run up | Build + run the full stack in Docker (DB self-seeds on first boot) |
| bun run down | Stop the Docker stack |
| bun run build | Build all workspaces |
| bun run db:up | Start just the Postgres container (auto-seeds on a fresh volume) |
| bun run db:reset | Wipe the DB volume and re-seed from the latest dump |
| bun run clean | Kill orphan Astro and API server processes from prior dev runs |

Quality

| Script | Description |
| --- | --- |
| bun run test | Run Vitest across all workspaces |
| bun run types | Type-check all workspaces with tsc --noEmit |
| bun run lint | Lint and auto-fix with Biome |
| bun run format | Format JS/TS with Biome and Markdown/MDX with Prettier |
| bun run knip | Detect unused files, dependencies, and exports |
| bun run madge | Detect circular imports across apps/ and packages/ |
| bun run depcruise | Run dependency-cruiser against the architectural rules in .dependency-cruiser.cjs |

Boundary checks

| Script | Description |
| --- | --- |
| bun run check:boundaries | Forbid cross-app imports (apps may not import from each other) |
| bun run check:naming | Enforce project-wide naming conventions on files and identifiers |
| bun run check:no-parent-imports | Forbid ../ imports anywhere; siblings or @/ aliases only |
| bun run check:api-db-isolation | Forbid Drizzle or postgres imports outside packages/db |
| bun run check:constants | Ensure brand strings, URLs, and ports live in packages/constants, not in app code |
| bun run check:syncpack | Verify dependency versions are consistent across all workspaces |

Aggregate and utilities

| Script | Description |
| --- | --- |
| bun run ci | Full pipeline: format and lint sequentially, then run types, test, knip, madge, all six boundary checks, depcruise, bun audit, and the API smoke test in parallel |
| bun run smoke | Spin up the API locally and hit each public endpoint to catch breakage in the request path |
| bun run deps:doctor | Diagnose and update workspace dependencies |
| bun run optimize:images | Convert raster images in the repo to sibling .webp files using Bun.Image |
| bun --filter @qafiyah/web run verify:seo | Crawl the running web server and assert SEO parity (canonical URLs, JSON-LD, metadata) across rendered pages |

Continuous Integration

Three GitHub Actions workflows live in .github/workflows/:

  • ci.yml: runs on every push and pull request to main. Executes the same checks as bun run ci, plus a final gate that fails the build if any file changed during the run (catches uncommitted formatting fixes).
  • post-poem.yml: cron-triggered; posts a random poem to X four times a day.
  • gitleaks.yml: secret scanning on push and pull request.

The boundary checks enforce the architectural rules the project relies on: no cross-app imports, no ../ parent imports, Drizzle and the Postgres client confined to packages/db, brand strings and ports centralized in packages/constants, naming conventions across the tree, and consistent dependency versions across workspaces.
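
To make one of these rules concrete, here is a rough sketch of what a no-parent-imports check could look like. It is not the project's actual script: findParentImports is a hypothetical name, and a plain regex scan like this one will also flag matching text inside comments or strings, which a real check would likely avoid by parsing the file.

```typescript
// Hypothetical check: collect import specifiers that climb to a parent directory.
function findParentImports(source: string): string[] {
  // Matches `from "../x"`, `import "../x"`, `import("../x")`, and `require("../x")`,
  // as well as a bare ".." specifier.
  const pattern =
    /(?:\bfrom\s*|\bimport\s*\(?\s*|\brequire\s*\(\s*)["'](\.\.(?:\/[^"']*)?)["']/g;
  const found: string[] = [];
  let match: RegExpExecArray | null;
  while ((match = pattern.exec(source)) !== null) {
    found.push(match[1]);
  }
  return found;
}

const sample = [
  'import { a } from "../shared/a";', // violation
  'import "./local";',                // sibling import, allowed
  "const b = require('../b');",       // violation
  'import x from "@/lib/x";',         // alias import, allowed
].join("\n");

console.log(findParentImports(sample)); // [ "../shared/a", "../b" ]
```

A check script would run this over every source file and exit non-zero when the returned list is non-empty.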

The CI pipeline definition lives in scripts/ci.ts. Both bun run ci and the GitHub job consume it, so local and remote runs stay in sync.
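
The pipeline shape described for bun run ci (a sequential phase followed by a parallel phase) can be sketched as follows. This is not the contents of scripts/ci.ts: Step and runPipeline are hypothetical names, and stopping at the first sequential failure is an assumption rather than documented behavior.

```typescript
type Step = { name: string; run: () => Promise<void> };

// Runs `sequential` steps in order, then all `parallel` steps concurrently.
// Returns the names of the steps that failed.
async function runPipeline(sequential: Step[], parallel: Step[]): Promise<string[]> {
  for (const step of sequential) {
    try {
      await step.run();
    } catch {
      // Assumption: a failed sequential step aborts the rest of the pipeline.
      return [step.name];
    }
  }
  const outcomes = await Promise.all(
    parallel.map(async (step) => {
      try {
        await step.run();
        return null;
      } catch {
        return step.name;
      }
    }),
  );
  return outcomes.filter((name): name is string => name !== null);
}

// Usage sketch: real steps would spawn processes such as `bun run lint`.
// const failures = await runPipeline(
//   [formatStep, lintStep],
//   [typesStep, testStep, knipStep],
// );
// if (failures.length > 0) throw new Error(`Failed: ${failures.join(", ")}`);
```

Catching errors per step in the parallel phase means every check still runs to completion and all failures are reported together, rather than the first rejection hiding the rest.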

API Documentation

The public REST API is hosted at api.qafiyah.com and ships with interactive documentation at api.qafiyah.com/v1/docs, generated from oRPC contracts via @orpc/openapi and rendered with Scalar. The root URL redirects to the docs.

Rate Limits and Terms of Use

The API is free, requires no authentication, and is provided on a best-effort basis with no SLA.

  • Fair use. Per-IP throttling is enforced at the server level. For bulk access, prefer the PostgreSQL dumps or the Hugging Face dataset over paginating the API.
  • Caching. Responses are cacheable; cache them client-side when possible to reduce load.
  • Stability. v1 endpoints are stable. Breaking changes ship behind a new major version.
  • Attribution. Not required, but appreciated; see Citation if you publish work that relies on the corpus.
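
As one way to follow the caching advice above, a client can wrap its loader in a small time-based cache. The sketch below is an assumption, not part of the project: createCachedGetter is a hypothetical name, the TTL is arbitrary, and concurrent requests for the same URL are not de-duplicated. It is not meant for the random-poem endpoint, which is expected to return a different poem on each call.

```typescript
type CacheEntry = { value: string; expiresAt: number };

// Wraps `load` so repeated requests for the same URL within `ttlMs`
// are served from memory instead of hitting the API again.
function createCachedGetter(
  load: (url: string) => Promise<string>,
  ttlMs: number,
  now: () => number = Date.now,
): (url: string) => Promise<string> {
  const cache = new Map<string, CacheEntry>();
  return async (url) => {
    const hit = cache.get(url);
    if (hit !== undefined && hit.expiresAt > now()) {
      return hit.value;
    }
    const value = await load(url);
    cache.set(url, { value, expiresAt: now() + ttlMs });
    return value;
  };
}

// Usage sketch with the global fetch and a 10-minute TTL:
//   const getText = createCachedGetter(
//     async (url) => (await fetch(url)).text(),
//     10 * 60 * 1000,
//   );
```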

Documentation

Roadmap

  • Semantic Search
  • Elasticsearch
  • Mobile app (React Native / Expo)
  • Internal dashboard for content management
  • Dark mode

Contributing

Contributions are welcome. Before opening a pull request, please read:

Acknowledgments

Listed chronologically by date of contribution:

  • Khalid Alraddady, AI Engineer at HRSD. Development of the semantic search feature, which is still in progress.
  • Khalid Almulaify, PhD in Morphology and Syntax at IMSIU. Ongoing financial sponsorship ($100/month) and sustained usage of the public API through a Telegram bot.
  • Malath Alsaif, Software Engineer at Ejari. UI improvements and implementation of the local database development workflow.
  • Fahad Alghamdi, Software Engineer at Thmanyah. Diagnosis of a redundant per-request SELECT 1 health check in the DB middleware.

Built with Qafiyah

Projects and tools that use the Qafiyah corpus or API:

Built something with Qafiyah? Open a PR to add it here.

Citation

If you use Qafiyah in academic work, please cite it as:

@misc{qafiyah2026,
  title  = {Qafiyah: An Open Corpus of Classical Arabic Poetry},
  author = {Alqahtani, Alwaleed},
  year   = {2026},
  url    = {https://qafiyah.com},
  note   = {Dataset: https://huggingface.co/datasets/qafiyah/classical-arabic-poetry}
}

Sponsor

If Qafiyah is useful to you or your work, you can support the project on GitHub Sponsors. Sponsorship funds dataset upkeep, API hosting, and ongoing maintenance.

License

Released under the MIT License.