A production-oriented newspaper retrieval and PDF assembly platform focused on Indian newspaper editions from TradingRef live data.
The application allows a user to:
- Select a date.
- Select language, newspaper, and edition.
- Generate a consolidated PDF from mixed source content (image pages, direct PDFs, and locked PDFs).
- Download the generated result in-browser.
Contents:
- System Purpose
- High-Level Architecture
- End-to-End User Flow
- PDF Generation Mechanism
- Data Model and Decryption
- API Surface
- Reliability and Safety Layers
- Frontend Interaction Model
- Language Coverage
- Project Structure
- Relocated Assets and Their Role
- Local Development
- Operational Notes
This project is designed to turn a difficult source format into a reliable download experience:
- TradingRef edition payloads are obfuscated and heterogeneous.
- Source content may be image sequences, normal PDFs, or locked PDFs.
- The app normalizes this variability into one predictable output: a downloadable PDF.
Core goals:
- Fast selection UX.
- High success rate across multiple content modes.
- Clear progress telemetry during long-running generation.
- Defensive request handling and rate limits for server stability.
Browser UI (Next.js Client Components)
|
| HTTP calls
v
Next.js Route Handlers (Node runtime)
- /api/data/[date]
- /api/newspapers
- /api/editions
- /api/pdf (POST + progress GET)
|
| fetch + decode + merge
v
TradingRef Live API (https://data.tradingref.com/YYYYMMDD.json)
|
| mixed assets (images/PDFs/locked PDFs)
v
In-memory PDF assembly + optional Python unlock helper
|
v
Data URL PDF returned to browser for direct download
UI calls GET /api/data/[date].
Server responsibilities:
- Validate date format (YYYYMMDD).
- Apply rate limiting.
- Fetch live TradingRef JSON for date.
- Extract normalized language list.
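The date check in this step can be sketched as follows. This is a minimal illustration, not the route's actual code: the helper names isValidDateKey and buildDataUrl are hypothetical, and only the YYYYMMDD format and the https://data.tradingref.com/YYYYMMDD.json URL pattern come from this README.

```typescript
// Hypothetical helpers: the real route's validation code is not shown in this README.

// Returns true only for eight-digit strings that name a real calendar date.
function isValidDateKey(value: string): boolean {
  const match = /^(\d{4})(\d{2})(\d{2})$/.exec(value);
  if (match === null) return false;
  const year = Number(match[1]);
  const month = Number(match[2]);
  const day = Number(match[3]);
  // Round-trip through a UTC Date so impossible dates (e.g. 20260231) are rejected.
  const probe = new Date(Date.UTC(year, month - 1, day));
  return (
    probe.getUTCFullYear() === year &&
    probe.getUTCMonth() === month - 1 &&
    probe.getUTCDate() === day
  );
}

// Builds the live data URL for a validated date key; throws on bad input.
function buildDataUrl(dateKey: string): string {
  if (!isValidDateKey(dateKey)) {
    throw new Error("Invalid date, expected YYYYMMDD");
  }
  return "https://data.tradingref.com/" + dateKey + ".json";
}
```

Rejecting malformed dates before any upstream fetch keeps invalid requests from consuming rate-limit budget or reaching TradingRef.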
UI calls GET /api/newspapers?date=...&language=....
Server responsibilities:
- Validate query values.
- Resolve normalized key back to source key (findMatchingKey).
- Return newspapers for chosen language.
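One way the key resolution might work is sketched below. The real findMatchingKey signature and normalization rule are not documented here, so both normalizeKey and findMatchingKeySketch are stand-ins, and the rule used (trim, lowercase, collapse whitespace, underscores, and hyphens into a single hyphen) is an assumption.

```typescript
// Assumed normalization rule; the project's actual rule may differ.
function normalizeKey(key: string): string {
  return key.trim().toLowerCase().replace(/[\s_-]+/g, "-");
}

// Stand-in for findMatchingKey: returns the original source key whose
// normalized form equals the normalized request value, or undefined.
function findMatchingKeySketch(
  source: Record<string, unknown>,
  requested: string
): string | undefined {
  const wanted = normalizeKey(requested);
  return Object.keys(source).find((key) => normalizeKey(key) === wanted);
}
```

Looking up the original key, rather than indexing with the normalized value, lets the UI send clean values while the server still reads the source JSON with its exact keys.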
UI calls GET /api/editions?date=...&language=...&newspaper=....
Server responsibilities:
- Validate query values.
- Resolve source keys.
- Return edition list with page counts.
UI calls POST /api/pdf with { date, language, newspaper, edition, requestId } and starts fast polling (GET /api/pdf?jobId=...).
Server responsibilities:
- Validate payload and request size.
- Create progress job in memory.
- Decrypt selected edition metadata.
- Build source URLs.
- Route to the right generation path by type.
- Return PDF as data:application/pdf;base64,... when complete.
The pipeline supports multiple source formats with fallback behavior.
- image: each page is downloaded and embedded as a PDF page.
- pdf: source PDFs are appended page-by-page to merged output.
- pdfl and dfl: password-aware merge path via Python helper.
- Unknown or unsupported response: attempt image proxy conversion path.
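The routing described above could be expressed as a small dispatch function. This is a sketch only: the GenerationPath labels and selectGenerationPath are hypothetical names, and treating every unrecognized type as a fallback case is an assumption about how "unknown or unsupported" is detected.

```typescript
// Hypothetical labels for the four generation paths described above.
type GenerationPath =
  | "image-embed"
  | "pdf-append"
  | "locked-pdf-merge"
  | "image-proxy-fallback";

// Maps a decoded edition type to the path that should handle it.
function selectGenerationPath(type: string): GenerationPath {
  switch (type.trim().toLowerCase()) {
    case "image":
      return "image-embed";
    case "pdf":
      return "pdf-append";
    case "pdfl":
    case "dfl":
      return "locked-pdf-merge";
    default:
      // Anything unrecognized goes to the image proxy conversion path.
      return "image-proxy-fallback";
  }
}
```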
For locked PDFs, the Node route calls a local Python helper:
- Script: scripts/merge_locked_pdfs.py
- Input: URL list + password map (filename first 10 chars heuristic)
- Output: merged PDF base64 payload + failure list
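A possible reading of the password heuristic is sketched below, assuming the candidate password for each locked PDF is simply the first 10 characters of its filename (extension handling is not specified in this README). The names fileNameFromUrl and buildPasswordMap are hypothetical and do not describe the helper's actual input contract.

```typescript
// Extracts the last path segment of a URL, ignoring any query string or fragment.
function fileNameFromUrl(url: string): string {
  const cut = url.search(/[?#]/);
  const withoutQuery = cut === -1 ? url : url.slice(0, cut);
  return withoutQuery.substring(withoutQuery.lastIndexOf("/") + 1);
}

// Assumption: password candidate = first 10 characters of the filename.
function buildPasswordMap(urls: string[]): Record<string, string> {
  const passwords: Record<string, string> = {};
  for (const url of urls) {
    const name = fileNameFromUrl(url);
    passwords[name] = name.slice(0, 10);
  }
  return passwords;
}
```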
If the decryption path fails, the system attempts a last-resort conversion via the image proxy flow.
A request-level job stores these stages:
- validating
- fetching
- downloading
- decrypting
- merging
- complete or error
The client polls this status every 300ms to render real-time logs and progress bars.
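The stage list and polling loop might look roughly like this on the client. The ProgressSnapshot shape and the pollUntilDone helper are assumptions for illustration; only the stage names and the 300 ms interval come from this README. The status fetcher is injected so the loop does not depend on a specific endpoint shape.

```typescript
// Stage names as listed above.
const STAGES = [
  "validating",
  "fetching",
  "downloading",
  "decrypting",
  "merging",
  "complete",
  "error",
] as const;
type Stage = (typeof STAGES)[number];

// Assumed snapshot shape; the real progress payload may carry more fields.
interface ProgressSnapshot {
  stage: Stage;
  message?: string;
}

// Polling stops once the job reaches either terminal stage.
function isTerminalStage(stage: Stage): boolean {
  return stage === "complete" || stage === "error";
}

// Calls fetchStatus repeatedly, reporting each snapshot, until a terminal stage.
async function pollUntilDone(
  fetchStatus: () => Promise<ProgressSnapshot>,
  onUpdate: (snapshot: ProgressSnapshot) => void,
  intervalMs: number = 300
): Promise<ProgressSnapshot> {
  for (;;) {
    const snapshot = await fetchStatus();
    onUpdate(snapshot);
    if (isTerminalStage(snapshot.stage)) return snapshot;
    await new Promise<void>((resolve) => setTimeout(resolve, intervalMs));
  }
}
```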
TradingRef responses use a 4-level hierarchy:
{
"language": {
"newspaper": {
"edition": "obfuscated_payload"
}
}
}
Each edition payload is decoded by character translation against a reversed alphabet.
Decoded structure is split by delimiters:
- q! separates type, prefix, and pagesBlob.
- m% separates page/file entries.
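The decode-and-split steps can be sketched as below. The exact alphabet TradingRef uses is not stated in this README, so ALPHABET (lowercase a to z) is a placeholder assumption, and translateReversed and parseEditionPayload are hypothetical names. Only the q! and m% delimiters are taken from the description above.

```typescript
// Placeholder alphabet: the real translation table is an assumption here.
const ALPHABET = "abcdefghijklmnopqrstuvwxyz";

// Replaces each alphabet character with its mirror in the reversed alphabet;
// characters outside the alphabet pass through unchanged.
function translateReversed(text: string, alphabet: string = ALPHABET): string {
  const reversed = alphabet.split("").reverse().join("");
  return text
    .split("")
    .map((ch) => {
      const index = alphabet.indexOf(ch);
      return index === -1 ? ch : reversed.charAt(index);
    })
    .join("");
}

interface DecodedEdition {
  type: string;
  prefix: string;
  pages: string[];
  pages_count: number;
  raw_decoded: string;
}

// Decodes a payload, then splits it on q! (fields) and m% (page entries).
function parseEditionPayload(payload: string): DecodedEdition {
  const raw = translateReversed(payload);
  const [type = "", prefix = "", pagesBlob = ""] = raw.split("q!");
  const pages = pagesBlob.split("m%").filter((page) => page.length > 0);
  return { type, prefix, pages, pages_count: pages.length, raw_decoded: raw };
}
```

With a mirrored alphabet the translation is its own inverse, so the same function can encode test fixtures and decode them again.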
Resulting normalized object:
{
"type": "image | pdf | pdfl | dfl | ...",
"prefix": "https://...",
"pages": ["file1", "file2"],
"pages_count": 2,
"raw_decoded": "..."
}
GET /api/data/[date]
Returns available languages for a date.
GET /api/newspapers
Query: date, language.
Returns newspapers under a language.
GET /api/editions
Query: date, language, newspaper.
Returns edition list and page hints.
POST /api/pdf
Payload: date, language, newspaper, edition, optional requestId.
Returns final PDF data URL when successful.
GET /api/pdf?jobId=...
Returns in-flight progress snapshot.
- Date format checks.
- Language/newspaper/edition sanitization.
- Request body size constraints.
In-memory rate limiter presets:
- strict for expensive routes.
- standard for common lookups.
- relaxed for high-frequency progress polling.
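An in-memory limiter with these three presets could be sketched as a fixed-window counter. The numeric limits below are illustrative placeholders rather than the project's configured values, and InMemoryRateLimiter is a hypothetical class; the actual limiter may use a different algorithm.

```typescript
interface RateLimitPreset {
  limit: number;    // max requests per window
  windowMs: number; // window length in milliseconds
}

// Illustrative values only; the real presets are not listed in this README.
const PRESETS: Record<"strict" | "standard" | "relaxed", RateLimitPreset> = {
  strict: { limit: 5, windowMs: 60000 },
  standard: { limit: 60, windowMs: 60000 },
  relaxed: { limit: 600, windowMs: 60000 },
};

// Fixed-window counter keyed by client identifier (for example, an IP address).
class InMemoryRateLimiter {
  private readonly preset: RateLimitPreset;
  private readonly windows = new Map<string, { count: number; resetAt: number }>();

  constructor(preset: RateLimitPreset) {
    this.preset = preset;
  }

  // Returns true if the request is allowed; `now` is injectable for testing.
  allow(key: string, now: number = Date.now()): boolean {
    const entry = this.windows.get(key);
    if (entry === undefined || now >= entry.resetAt) {
      this.windows.set(key, { count: 1, resetAt: now + this.preset.windowMs });
      return true;
    }
    if (entry.count >= this.preset.limit) return false;
    entry.count += 1;
    return true;
  }
}
```

Because the counters live in a per-process Map, each server instance enforces its own limits independently.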
- Timeout-bound fetch calls.
- Retry with exponential backoff.
- Circuit breaker to prevent cascading failures.
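The backoff and circuit-breaker ideas above can be illustrated with two small pieces. The base delay, cap, failure threshold, and cooldown values are assumptions chosen for the example, and the names backoffDelayMs and CircuitBreaker are hypothetical; the project's fetch utilities may be structured differently.

```typescript
// Delay before retry number `attempt` (0-based): base * 2^attempt, capped.
function backoffDelayMs(
  attempt: number,
  baseMs: number = 250,
  maxMs: number = 4000
): number {
  return Math.min(maxMs, baseMs * Math.pow(2, attempt));
}

// Opens after `threshold` consecutive failures and blocks requests until
// `cooldownMs` has elapsed, after which a trial request is allowed.
class CircuitBreaker {
  private failures = 0;
  private openedAt: number | null = null;
  private readonly threshold: number;
  private readonly cooldownMs: number;

  constructor(threshold: number = 3, cooldownMs: number = 10000) {
    this.threshold = threshold;
    this.cooldownMs = cooldownMs;
  }

  canRequest(now: number = Date.now()): boolean {
    if (this.openedAt === null) return true;
    return now - this.openedAt >= this.cooldownMs;
  }

  recordSuccess(): void {
    this.failures = 0;
    this.openedAt = null;
  }

  recordFailure(now: number = Date.now()): void {
    this.failures += 1;
    if (this.failures >= this.threshold) this.openedAt = now;
  }
}
```

Backoff spaces out retries of a single request, while the breaker stops new requests entirely once the upstream looks unhealthy, which is what keeps one failing dependency from tying up every route.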
If primary data assembly fails, the route returns sanitized errors and preserves server-side diagnostic logs.
Primary logic is orchestrated in the use-newspaper hook:
- Cascading state reset on every upstream selection change.
- Request cancellation with AbortController to prevent stale updates.
- Progress polling lifecycle with auto-stop on completion/error.
- Download trigger using generated data URL.
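The cancellation pattern in the list above can be sketched as a "latest request wins" guard. LatestRequestGuard is a hypothetical name and does not reflect how use-newspaper is actually written; it only shows how AbortController can invalidate an earlier in-flight request when the user changes a selection.

```typescript
// Keeps at most one live request: starting a new one aborts the previous one.
class LatestRequestGuard {
  private current: AbortController | null = null;

  // Aborts any in-flight request and returns a signal for the new one.
  begin(): AbortSignal {
    if (this.current !== null) this.current.abort();
    this.current = new AbortController();
    return this.current.signal;
  }

  // Aborts the in-flight request, if any (for example, on unmount).
  cancel(): void {
    if (this.current !== null) {
      this.current.abort();
      this.current = null;
    }
  }
}
```

The returned signal would typically be passed to fetch, and the response handler can check signal.aborted before updating state so that a slow response for an old selection never overwrites newer data.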
UI components provide:
- Neumorphic visual language.
- Animated state transitions.
- Stage-aware progress panel.
- Download-ready confirmation state.
The platform currently recognizes 14 language groups:
- Bengali
- Hindi
- English
- Gujarati
- Marathi
- Tamil
- Telugu
- Kannada
- Malayalam
- Punjabi
- Odia
- Urdu
- Assamese
- Konkani
vartta-kosha/
├─ src/
│ ├─ app/
│ │ ├─ api/
│ │ │ ├─ data/[date]/route.ts
│ │ │ ├─ newspapers/route.ts
│ │ │ ├─ editions/route.ts
│ │ │ └─ pdf/route.ts
│ │ └─ page.tsx
│ ├─ hooks/use-newspaper.ts
│ ├─ lib/
│ │ ├─ api/tradingref.ts
│ │ ├─ fetch-utils.ts
│ │ └─ rate-limit.ts
│ └─ components/
├─ scripts/
│ ├─ merge_locked_pdfs.py
│ ├─ colab/
│ └─ legacy-local/
└─ resources/tradingref-data/
├─ downloads/
└─ json-snapshots/
The following external project assets were analyzed and reorganized into this app repository.
New location: resources/tradingref-data/json-snapshots/
Files:
- 20260330.json
- 20260330.decrypted.json
- 20260330.resolved.json
Purpose:
- Raw source sample (.json).
- Decoded intermediate (.decrypted.json).
- Fully resolved URL-ready reference (.resolved.json).
These files document the full transformation chain from obfuscated payload to direct asset URLs.
New location: resources/tradingref-data/downloads/
Purpose:
- Local workspace for captured/downloaded artifacts.
- Useful for debugging payload quality and merge outcomes.
New location: scripts/colab/
Files:
- colab_date_discovery.py
- colab_mass_archiver_v4.py
Purpose:
- Discover earliest valid TradingRef date.
- Perform large-scale date archiving and index generation with batch uploads.
New location: scripts/legacy-local/tradingref_visual_server.py
Purpose:
- Standalone local workflow tester for TradingRef payload inspection.
- Interactive selection UI for data traversal and direct/proxy URL checks.
- Optional local asset fetch and locked-PDF hinting.
- Node.js 20+ recommended.
- pnpm.
pnpm install
pnpm dev
Open http://localhost:3000.
pnpm build
pnpm start
Locked PDFs are decrypted through an external Cloud Run service.
Set the following server-side environment variables:
- LOCKED_PDF_DECRYPT_URL
- LOCKED_PDF_DECRYPT_TOKEN (optional but recommended)
- Cloud Run decrypt service is reachable from the Next.js runtime.
- External endpoints are reachable:
  - https://data.tradingref.com
  - https://images.weserv.nl
Current rate limiting is in-memory. For multi-instance deployments, migrate to a centralized store-backed limiter.
This README is intentionally implementation-centric, so maintainers can reason about behavior, failure modes, and extension points without needing to reverse engineer route logic first.