Vartta Kosha

A production-oriented newspaper retrieval and PDF assembly platform for Indian newspaper editions, sourced from TradingRef live data.

The application allows a user to:

Select a date.
Select language, newspaper, and edition.
Generate a consolidated PDF from mixed source content (image pages, direct PDFs, and locked PDFs).
Download the generated result in-browser.

System Purpose

This project turns a difficult source format into a reliable download experience:

TradingRef edition payloads are obfuscated and heterogeneous.
Source content may be image sequences, normal PDFs, or locked PDFs.
The app normalizes this variability into one predictable output: a downloadable PDF.

Core goals:

Fast selection UX.
High success rate across multiple content modes.
Clear progress telemetry during long-running generation.
Defensive request handling and rate limits for server stability.

High-Level Architecture

Browser UI (Next.js Client Components)
				|
				| HTTP calls
				v
Next.js Route Handlers (Node runtime)
	- /api/data/[date]
	- /api/newspapers
	- /api/editions
	- /api/pdf (POST + progress GET)
				|
				| fetch + decode + merge
				v
TradingRef Live API (https://data.tradingref.com/YYYYMMDD.json)
				|
				| mixed assets (images/PDFs/locked PDFs)
				v
In-memory PDF assembly + optional Python unlock helper
				|
				v
Data URL PDF returned to browser for direct download

End-to-End User Flow

1) Date selection

UI calls GET /api/data/[date].

Server responsibilities:

Validate date format (YYYYMMDD).
Apply rate limiting.
Fetch the live TradingRef JSON for that date.
Extract normalized language list.
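
The date check above can be sketched as follows. The helper names `isValidDateKey` and `buildSourceUrl` are hypothetical and do not come from the codebase; only the YYYYMMDD format and the URL pattern are taken from this README. The sketch also rejects dates that do not exist on the calendar, which is an assumption about how strict the real validator is.

```typescript
// Hypothetical helper: accepts only 8 digits that form a real calendar date.
function isValidDateKey(value: string): boolean {
  const match = /^(\d{4})(\d{2})(\d{2})$/.exec(value);
  if (!match) return false;
  const year = Number(match[1]);
  const month = Number(match[2]);
  const day = Number(match[3]);
  // Round-trip through Date so that values like 20260230 are rejected.
  const date = new Date(Date.UTC(year, month - 1, day));
  return (
    date.getUTCFullYear() === year &&
    date.getUTCMonth() === month - 1 &&
    date.getUTCDate() === day
  );
}

// Hypothetical helper: builds the live data URL shown in the architecture diagram.
function buildSourceUrl(dateKey: string): string {
  if (!isValidDateKey(dateKey)) {
    throw new Error("Invalid date, expected YYYYMMDD");
  }
  return "https://data.tradingref.com/" + dateKey + ".json";
}
```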

2) Language selection

UI calls GET /api/newspapers?date=...&language=....

Server responsibilities:

Validate query values.
Resolve normalized key back to source key (findMatchingKey).
Return newspapers for chosen language.
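
A minimal sketch of the key resolution step, for illustration only. The README names `findMatchingKey` but does not describe its rules, so the normalization used here (trim, lowercase, collapse whitespace) and the name `findMatchingKeySketch` are assumptions.

```typescript
// Assumed normalization: trim, lowercase, collapse internal whitespace.
function normalizeKey(raw: string): string {
  return raw.trim().toLowerCase().replace(/\s+/g, " ");
}

// Returns the original source key whose normalized form matches, or undefined.
function findMatchingKeySketch(
  source: Record<string, unknown>,
  normalized: string
): string | undefined {
  const wanted = normalizeKey(normalized);
  const keys = Object.keys(source);
  for (let i = 0; i < keys.length; i++) {
    if (normalizeKey(keys[i]) === wanted) return keys[i];
  }
  return undefined;
}
```

Resolving back to the source key matters because the upstream JSON is indexed by the original spelling, not by the normalized value the UI sends.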

3) Newspaper selection

UI calls GET /api/editions?date=...&language=...&newspaper=....

Server responsibilities:

Validate query values.
Resolve source keys.
Return edition list with page counts.

4) Download trigger

UI calls POST /api/pdf with { date, language, newspaper, edition, requestId } and starts fast polling (GET /api/pdf?jobId=...).

Server responsibilities:

Validate payload and request size.
Create progress job in memory.
Decrypt selected edition metadata.
Build source URLs.
Route to the right generation path by type.
Return PDF as data:application/pdf;base64,... when complete.

PDF Generation Mechanism

The pipeline supports multiple source formats with fallback behavior.

Type handling

image: each page is downloaded and embedded as a PDF page.
pdf: source PDFs are appended page-by-page to merged output.
pdfl and dfl: password-aware merge path via Python helper.
Unknown or unsupported response: attempt image proxy conversion path.
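
The routing above amounts to a small dispatch on the decoded type. In this sketch the path labels and the function name `selectGenerationPath` are hypothetical, and treating the type as case-insensitive is an assumption.

```typescript
type GenerationPath =
  | "image-embed"
  | "pdf-append"
  | "locked-merge"
  | "proxy-image-fallback";

// Maps a decoded edition type to a generation path (labels are illustrative).
function selectGenerationPath(type: string): GenerationPath {
  switch (type.trim().toLowerCase()) {
    case "image":
      return "image-embed";
    case "pdf":
      return "pdf-append";
    case "pdfl":
    case "dfl":
      return "locked-merge";
    default:
      // Unknown or unsupported types fall through to the image proxy path.
      return "proxy-image-fallback";
  }
}
```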

Locked PDF path

For locked PDFs, the Node route calls a local Python helper:

Script: scripts/merge_locked_pdfs.py
Input: URL list + password map (filename first 10 chars heuristic)
Output: merged PDF base64 payload + failure list

If the decryption path fails, the system falls back to a last-resort conversion through the proxy image flow.
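
One possible reading of the password heuristic is that each file's candidate password is the first 10 characters of its filename. The sketch below builds such a map on the Node side before handing it to the helper. The function names are hypothetical, and whether the real heuristic includes the extension or handles short names differently is not stated here.

```typescript
// Extracts the last path segment of a URL, ignoring query string and fragment.
function fileNameFromUrl(url: string): string {
  const withoutQuery = url.split(/[?#]/)[0];
  const segments = withoutQuery.split("/");
  return segments[segments.length - 1];
}

// Assumed heuristic: candidate password = first 10 characters of the filename.
function buildPasswordMap(urls: string[]): Record<string, string> {
  const passwords: Record<string, string> = {};
  for (let i = 0; i < urls.length; i++) {
    passwords[urls[i]] = fileNameFromUrl(urls[i]).slice(0, 10);
  }
  return passwords;
}
```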

Progress telemetry

A request-level job stores these stages:

validating
fetching
downloading
decrypting
merging
complete or error

The client polls this status every 300ms to render real-time logs and progress bars.
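
A minimal sketch of an in-memory progress store with the stages listed above. The snapshot fields (`message`, `updatedAt`) and the function names are assumptions, not the route's actual shape.

```typescript
type Stage =
  | "validating"
  | "fetching"
  | "downloading"
  | "decrypting"
  | "merging"
  | "complete"
  | "error";

interface ProgressSnapshot {
  jobId: string;
  stage: Stage;
  message: string;   // hypothetical field for the log line shown in the UI
  updatedAt: number; // epoch milliseconds
}

// Process-local store: entries are lost on restart and not shared across instances.
const jobs = new Map<string, ProgressSnapshot>();

function updateJob(jobId: string, stage: Stage, message: string, now: number = Date.now()): ProgressSnapshot {
  const snapshot: ProgressSnapshot = { jobId, stage, message, updatedAt: now };
  jobs.set(jobId, snapshot);
  return snapshot;
}

function getJob(jobId: string): ProgressSnapshot | undefined {
  return jobs.get(jobId);
}

// The client can stop polling once a terminal stage is reached.
function isTerminal(stage: Stage): boolean {
  return stage === "complete" || stage === "error";
}
```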

Data Model and Decryption

TradingRef responses use a nested hierarchy: language, then newspaper, then edition, with an obfuscated payload string as each edition's value:

{
	"language": {
		"newspaper": {
			"edition": "obfuscated_payload"
		}
	}
}

Each edition payload is decoded by character translation against a reversed alphabet.

Decoded structure is split by delimiters:

q! separates type, prefix, pagesBlob.
m% separates page/file entries.

Resulting normalized object:

{
	"type": "image | pdf | pdfl | dfl | ...",
	"prefix": "https://...",
	"pages": ["file1", "file2"],
	"pages_count": 2,
	"raw_decoded": "..."
}
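
The decode and split steps can be sketched as below. The actual alphabet is not given in this README, so `ALPHABET` is a placeholder assumption; only the reversed-alphabet translation idea, the `q!` and `m%` delimiters, and the output field names come from the description above.

```typescript
// Placeholder alphabet (assumption): the real character set may differ.
const ALPHABET = "abcdefghijklmnopqrstuvwxyzABCDEFGHIJKLMNOPQRSTUVWXYZ0123456789";
const REVERSED = ALPHABET.split("").reverse().join("");

// Replaces each alphabet character with the one at the mirrored position.
// Characters outside the alphabet pass through unchanged.
function translate(input: string): string {
  let out = "";
  for (let i = 0; i < input.length; i++) {
    const ch = input.charAt(i);
    const index = ALPHABET.indexOf(ch);
    out += index === -1 ? ch : REVERSED.charAt(index);
  }
  return out;
}

interface EditionInfo {
  type: string;
  prefix: string;
  pages: string[];
  pages_count: number;
  raw_decoded: string;
}

function decodeEdition(payload: string): EditionInfo {
  const raw = translate(payload);
  const parts = raw.split("q!");
  const type = parts[0];
  const prefix = parts.length > 1 ? parts[1] : "";
  // Anything after the second q! is treated as the pages blob.
  const pagesBlob = parts.slice(2).join("q!");
  const pages = pagesBlob.split("m%").filter((page) => page.length > 0);
  return { type, prefix, pages, pages_count: pages.length, raw_decoded: raw };
}
```

Because the mapping mirrors positions within a single alphabet, applying `translate` twice returns the original string, which makes it easy to build test fixtures.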

API Surface

`GET /api/data/[date]`

Returns available languages for a date.

`GET /api/newspapers`

Query: date, language.

Returns newspapers under a language.

`GET /api/editions`

Query: date, language, newspaper.

Returns edition list and page hints.

`POST /api/pdf`

Payload: date, language, newspaper, edition, optional requestId.

Returns final PDF data URL when successful.

`GET /api/pdf?jobId=...`

Returns in-flight progress snapshot.

Reliability and Safety Layers

Input validation

Date format checks.
Language/newspaper/edition sanitization.
Request body size constraints.
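
A sketch of how these checks might combine for the `POST /api/pdf` payload. The limits `MAX_BODY_CHARS` and `MAX_FIELD_CHARS`, the control-character rule, and the function names are assumptions chosen for illustration; the field names follow the payload described in this README.

```typescript
interface PdfRequest {
  date: string;
  language: string;
  newspaper: string;
  edition: string;
  requestId?: string;
}

const MAX_BODY_CHARS = 2048; // hypothetical limit
const MAX_FIELD_CHARS = 128; // hypothetical limit

// Returns a trimmed string, or null if the value is unusable.
function cleanField(value: unknown): string | null {
  if (typeof value !== "string") return null;
  const trimmed = value.trim();
  if (trimmed.length === 0 || trimmed.length > MAX_FIELD_CHARS) return null;
  // Reject control characters inside the value.
  if (/[\u0000-\u001f\u007f]/.test(trimmed)) return null;
  return trimmed;
}

function parsePdfRequest(rawBody: string): PdfRequest | null {
  // Character count is used as a rough proxy for body size.
  if (rawBody.length > MAX_BODY_CHARS) return null;
  let parsed: unknown;
  try {
    parsed = JSON.parse(rawBody);
  } catch {
    return null;
  }
  if (typeof parsed !== "object" || parsed === null) return null;
  const body = parsed as Record<string, unknown>;
  const date = cleanField(body.date);
  const language = cleanField(body.language);
  const newspaper = cleanField(body.newspaper);
  const edition = cleanField(body.edition);
  if (!date || !/^\d{8}$/.test(date) || !language || !newspaper || !edition) {
    return null;
  }
  const result: PdfRequest = { date, language, newspaper, edition };
  if (body.requestId !== undefined) {
    const requestId = cleanField(body.requestId);
    if (!requestId) return null;
    result.requestId = requestId;
  }
  return result;
}
```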

Traffic control

In-memory rate limiter presets:

strict for expensive routes.
standard for common lookups.
relaxed for high-frequency progress polling.
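
The presets could be backed by a simple fixed-window counter, sketched below. The numeric limits and window lengths are placeholders, not the project's real values, and the fixed-window strategy itself is an assumption.

```typescript
type PresetName = "strict" | "standard" | "relaxed";

interface Preset {
  limit: number;    // max requests per window
  windowMs: number; // window length in milliseconds
}

// Placeholder numbers (assumption): tune per route.
const PRESETS: Record<PresetName, Preset> = {
  strict: { limit: 5, windowMs: 60000 },
  standard: { limit: 30, windowMs: 60000 },
  relaxed: { limit: 300, windowMs: 60000 },
};

interface Bucket {
  count: number;
  resetAt: number;
}

interface RateLimitResult {
  allowed: boolean;
  remaining: number;
  retryAfterMs: number;
}

// Process-local state: each server instance counts independently.
const buckets = new Map<string, Bucket>();

function checkRateLimit(clientId: string, presetName: PresetName, now: number = Date.now()): RateLimitResult {
  const preset = PRESETS[presetName];
  const key = presetName + ":" + clientId;
  const bucket = buckets.get(key);
  if (!bucket || now >= bucket.resetAt) {
    // Start a fresh window for this client.
    buckets.set(key, { count: 1, resetAt: now + preset.windowMs });
    return { allowed: true, remaining: preset.limit - 1, retryAfterMs: 0 };
  }
  if (bucket.count < preset.limit) {
    bucket.count += 1;
    return { allowed: true, remaining: preset.limit - bucket.count, retryAfterMs: 0 };
  }
  return { allowed: false, remaining: 0, retryAfterMs: bucket.resetAt - now };
}
```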

Network resilience

Timeout-bound fetch calls.
Retry with exponential backoff.
Circuit breaker to prevent cascading failures.
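
Two of these pieces, sketched under assumptions: a capped exponential backoff schedule and a minimal circuit breaker. The thresholds, the cooldown behavior, and all names are illustrative; the utilities in fetch-utils.ts may work differently.

```typescript
// Delay before retry number `attempt` (0-based): base * 2^attempt, capped.
function backoffDelayMs(attempt: number, baseMs: number, capMs: number): number {
  return Math.min(capMs, baseMs * Math.pow(2, attempt));
}

// Opens after `threshold` consecutive failures and allows a trial
// request again once `cooldownMs` has elapsed.
class CircuitBreaker {
  private failures: number = 0;
  private openedAt: number | null = null;
  private readonly threshold: number;
  private readonly cooldownMs: number;

  constructor(threshold: number, cooldownMs: number) {
    this.threshold = threshold;
    this.cooldownMs = cooldownMs;
  }

  canRequest(now: number): boolean {
    if (this.openedAt === null) return true;
    return now - this.openedAt >= this.cooldownMs;
  }

  recordSuccess(): void {
    this.failures = 0;
    this.openedAt = null;
  }

  recordFailure(now: number): void {
    this.failures += 1;
    if (this.failures >= this.threshold) this.openedAt = now;
  }
}
```

A caller would check `canRequest` before each upstream fetch, wait `backoffDelayMs(attempt, ...)` between retries, and report the outcome through `recordSuccess` or `recordFailure`.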

Failure behavior

If primary data assembly fails, the route returns a sanitized error to the client and keeps the full diagnostic details in server-side logs.

Frontend Interaction Model

Primary logic is orchestrated in the use-newspaper hook:

Cascading state reset on every upstream selection change.
Request cancellation with AbortController to prevent stale updates.
Progress polling lifecycle with auto-stop on completion/error.
Download trigger using generated data URL.
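
The cascading reset can be pictured as a pure state transition, sketched below. The state shape and the name `applySelection` are hypothetical; the real hook may organize its state differently.

```typescript
interface SelectionState {
  date: string | null;
  language: string | null;
  newspaper: string | null;
  edition: string | null;
}

type SelectionField = "date" | "language" | "newspaper" | "edition";

// Setting a field clears every selection downstream of it.
function applySelection(
  state: SelectionState,
  field: SelectionField,
  value: string | null
): SelectionState {
  switch (field) {
    case "date":
      return { date: value, language: null, newspaper: null, edition: null };
    case "language":
      return { ...state, language: value, newspaper: null, edition: null };
    case "newspaper":
      return { ...state, newspaper: value, edition: null };
    case "edition":
      return { ...state, edition: value };
  }
}
```

Keeping the transition pure makes it straightforward to pair with request cancellation: whenever a field changes, in-flight requests for the cleared downstream fields can be aborted.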

UI components provide:

Neumorphic visual language.
Animated state transitions.
Stage-aware progress panel.
Download-ready confirmation state.

Language Coverage

The platform currently recognizes 14 language groups:

Bengali
Hindi
English
Gujarati
Marathi
Tamil
Telugu
Kannada
Malayalam
Punjabi
Odia
Urdu
Assamese
Konkani

Project Structure

vartta-kosha/
├─ src/
│  ├─ app/
│  │  ├─ api/
│  │  │  ├─ data/[date]/route.ts
│  │  │  ├─ newspapers/route.ts
│  │  │  ├─ editions/route.ts
│  │  │  └─ pdf/route.ts
│  │  └─ page.tsx
│  ├─ hooks/use-newspaper.ts
│  ├─ lib/
│  │  ├─ api/tradingref.ts
│  │  ├─ fetch-utils.ts
│  │  └─ rate-limit.ts
│  └─ components/
├─ scripts/
│  ├─ merge_locked_pdfs.py
│  ├─ colab/
│  └─ legacy-local/
└─ resources/tradingref-data/
   ├─ downloads/
   └─ json-snapshots/

Relocated Assets and Their Role

The following external project assets were analyzed and reorganized into this app repository.

1) JSON snapshots

New location: resources/tradingref-data/json-snapshots/

Files:

20260330.json
20260330.decrypted.json
20260330.resolved.json

Purpose:

Raw source sample (.json).
Decoded intermediate (.decrypted.json).
Fully resolved URL-ready reference (.resolved.json).

These files document the full transformation chain from obfuscated payload to direct asset URLs.

2) Download workspace

New location: resources/tradingref-data/downloads/

Purpose:

Local workspace for captured/downloaded artifacts.
Useful for debugging payload quality and merge outcomes.

3) Colab automation scripts

New location: scripts/colab/

Files:

colab_date_discovery.py
colab_mass_archiver_v4.py

Purpose:

Discover earliest valid TradingRef date.
Perform large-scale date archiving and index generation with batch uploads.

4) Local visual diagnostic server

New location: scripts/legacy-local/tradingref_visual_server.py

Purpose:

Standalone local workflow tester for TradingRef payload inspection.
Interactive selection UI for data traversal and direct/proxy URL checks.
Optional local asset fetch and locked-PDF hinting.

Local Development

Prerequisites

Node.js 20+ recommended.
pnpm.

Install and run

pnpm install
pnpm dev

Open http://localhost:3000.

Build and start

pnpm build
pnpm start

Operational Notes

Cloud Run dependency for locked PDF merge

Locked PDFs are decrypted through an external Cloud Run service.

Set the following server-side environment variables:

LOCKED_PDF_DECRYPT_URL
LOCKED_PDF_DECRYPT_TOKEN (optional but recommended)

Runtime assumptions

Cloud Run decrypt service is reachable from the Next.js runtime.
External endpoints are reachable:
1. https://data.tradingref.com
2. https://images.weserv.nl

Security note

Current rate limiting is in-memory. For multi-instance deployments, migrate to a centralized store-backed limiter.

This README is intentionally implementation-centric, so maintainers can reason about behavior, failure modes, and extension points without needing to reverse engineer route logic first.
