
pdf2latex with VLMs

This repository contains a complete MLOps pipeline for training and serving a PDF-to-LaTeX model on Google Cloud Platform (GCP) using Vertex AI.

🛠️ Local Development Setup

CUDA (Linux/Windows)

conda create -n pdf2latex python=3.11 -y
conda activate pdf2latex
pip install notebook tqdm wandb
pip install torch torchvision --index-url https://download.pytorch.org/whl/cu129
pip install transformers datasets accelerate peft flash-attn --no-build-isolation

MPS (Mac)

uv sync

Note: flash-attn is not available for MPS.
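
If you load the model yourself, you can pick the attention backend at load time so the same script runs on both setups. A minimal sketch, assuming the Qwen2-VL checkpoint used elsewhere in this README; the attn_implementation values are standard transformers options:

import torch
from transformers import Qwen2VLForConditionalGeneration

# flash_attention_2 needs CUDA + flash-attn; fall back to PyTorch SDPA on MPS/CPU.
attn_impl = "flash_attention_2" if torch.cuda.is_available() else "sdpa"

model = Qwen2VLForConditionalGeneration.from_pretrained(
    "Qwen/Qwen2-VL-2B-Instruct",
    torch_dtype="auto",
    attn_implementation=attn_impl,
)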

Serving

Start a vLLM server on a GPU node, then forward its port to your machine:

vllm serve Qwen/Qwen2-VL-2B-Instruct \
    --port 8000 \
    --gpu-memory-utilization 0.9

# Forward the remote server to localhost:8001 (adjust the node name to your allocation)
ssh -L 8001:gpunode24:8000 cs.edu
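
With the tunnel up, the server speaks the OpenAI-compatible chat API on localhost:8001. A quick smoke test (the openai client and the prompt wording are assumptions, not pinned by this repo):

import base64
from openai import OpenAI

# vLLM serves an OpenAI-compatible API; no real key is needed through a local tunnel.
client = OpenAI(base_url="http://localhost:8001/v1", api_key="EMPTY")

with open("test_image.png", "rb") as f:
    image_b64 = base64.b64encode(f.read()).decode("utf-8")

response = client.chat.completions.create(
    model="Qwen/Qwen2-VL-2B-Instruct",
    messages=[{
        "role": "user",
        "content": [
            {"type": "text", "text": "Convert this to LaTeX."},
            {"type": "image_url", "image_url": {"url": f"data:image/png;base64,{image_b64}"}},
        ],
    }],
    max_tokens=512,
)
print(response.choices[0].message.content)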

☁️ GCP MLOps Pipeline

Prerequisites

  1. GCP Project: A Google Cloud Project with billing enabled.
  2. Tools: Install Terraform, Google Cloud SDK, and uv.
  3. Authentication:
    gcloud auth login
    gcloud auth application-default login

1. Infrastructure Setup (Terraform)

Provision all necessary resources (GCS Bucket, Artifact Registry, APIs) automatically.

cd terraform
# Update terraform.tfvars with your project_id
terraform init
terraform apply

Note the bucket name that Terraform outputs; it is referenced as YOUR_BUCKET in the steps below.
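
You can re-print the outputs at any time (the exact output names depend on the definitions in terraform/):

terraform output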

2. Dataset & Model Staging

Generate the dataset and stage the model artifacts to GCS.

Generate Dataset:

uv run python pdf2latex/data_process.py
# Upload to GCS
gcloud storage cp datasets/latex80m_en_1m.parquet gs://YOUR_BUCKET/datasets/
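
A quick sanity check on the generated file before uploading; pandas here is an assumption, not a pinned dependency of this repo:

import pandas as pd

# Confirm the Parquet file is readable and inspect its columns.
df = pd.read_parquet("datasets/latex80m_en_1m.parquet")
print(len(df), "rows")
print(df.columns.tolist())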

Stage Model (Hugging Face -> GCS): Download the model and upload it to your bucket for controlled serving.

uv run python scripts/stage_model.py \
    --repo_id scottcfy/Qwen2-VL-2B-Instruct-pdf2latex \
    --gcs_uri gs://YOUR_BUCKET/models/pdf2latex-v1 \
    --project_id YOUR_PROJECT_ID
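
The script itself isn't reproduced here, but staging of this kind usually reduces to a Hugging Face snapshot download followed by a recursive copy to GCS. A hedged sketch of the idea (not the actual scripts/stage_model.py):

import subprocess
from huggingface_hub import snapshot_download

def stage_model(repo_id: str, gcs_uri: str) -> None:
    # Download the full snapshot (weights, tokenizer, config) to a local cache dir.
    local_dir = snapshot_download(repo_id=repo_id)
    # Mirror the snapshot into the bucket for controlled serving.
    subprocess.run(
        ["gcloud", "storage", "rsync", "--recursive", local_dir, gcs_uri],
        check=True,
    )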

3. Build & Push Docker Images

Build the training and serving containers and push them to Artifact Registry.

# Usage: ./scripts/gcp_build_and_push.sh <PROJECT_ID> <REGION> <REPO_NAME>
./scripts/gcp_build_and_push.sh YOUR_PROJECT_ID us-central1 pdf2latex-repo
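
If you prefer to build by hand, the script roughly corresponds to the following (the build context and Dockerfile paths are assumptions; check the script for the exact ones):

gcloud auth configure-docker us-central1-docker.pkg.dev
docker build -t us-central1-docker.pkg.dev/YOUR_PROJECT_ID/pdf2latex-repo/pdf2latex-train:latest .
docker push us-central1-docker.pkg.dev/YOUR_PROJECT_ID/pdf2latex-repo/pdf2latex-train:latest
# Repeat for the pdf2latex-serve image.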

4. Training (Optional)

Submit a custom training job to Vertex AI.

uv run python scripts/gcp_submit_train.py \
    --project_id YOUR_PROJECT_ID \
    --location us-central1 \
    --staging_bucket gs://YOUR_BUCKET \
    --display_name pdf2latex-train \
    --container_uri us-central1-docker.pkg.dev/YOUR_PROJECT_ID/pdf2latex-repo/pdf2latex-train:latest \
    --dataset_path gs://YOUR_BUCKET/datasets/latex80m_en_1m.parquet \
    --output_dir gs://YOUR_BUCKET/outputs/run1 \
    --use_spot  # Use Spot instances for cost savings
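
Under the hood a submission like this is a Vertex AI CustomJob. A minimal SDK sketch under that assumption; the machine/accelerator spec is illustrative, and the Spot scheduling implied by --use_spot is omitted:

from google.cloud import aiplatform

aiplatform.init(
    project="YOUR_PROJECT_ID",
    location="us-central1",
    staging_bucket="gs://YOUR_BUCKET",
)

# One GPU worker running the training container; size to your quota.
job = aiplatform.CustomJob(
    display_name="pdf2latex-train",
    worker_pool_specs=[{
        "machine_spec": {
            "machine_type": "g2-standard-8",   # illustrative choice
            "accelerator_type": "NVIDIA_L4",
            "accelerator_count": 1,
        },
        "replica_count": 1,
        "container_spec": {
            "image_uri": "us-central1-docker.pkg.dev/YOUR_PROJECT_ID/pdf2latex-repo/pdf2latex-train:latest",
            "args": [
                "--dataset_path", "gs://YOUR_BUCKET/datasets/latex80m_en_1m.parquet",
                "--output_dir", "gs://YOUR_BUCKET/outputs/run1",
            ],
        },
    }],
)
job.run()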

5. Serving / Deployment

Deploy the model to a Vertex AI Endpoint. The serving container supports loading from GCS or Hugging Face.

Deploy from GCS (Recommended):

uv run python scripts/gcp_deploy_serve.py \
    --project_id YOUR_PROJECT_ID \
    --location us-central1 \
    --display_name pdf2latex-serve \
    --serving_container_uri us-central1-docker.pkg.dev/YOUR_PROJECT_ID/pdf2latex-repo/pdf2latex-serve:latest \
    --model_artifact_uri gs://YOUR_BUCKET/models/pdf2latex-v1

Deploy from Hugging Face directly:

uv run python scripts/gcp_deploy_serve.py \
    ...
    --hf_model_id scottcfy/Qwen2-VL-2B-Instruct-pdf2latex
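
Either way, deployment presumably wraps the standard two-step SDK flow: register the model, then deploy it to an endpoint. A sketch under that assumption (machine and accelerator choices are illustrative):

from google.cloud import aiplatform

aiplatform.init(project="YOUR_PROJECT_ID", location="us-central1")

# Register the model; artifact_uri points at the staged weights in GCS.
model = aiplatform.Model.upload(
    display_name="pdf2latex-serve",
    serving_container_image_uri="us-central1-docker.pkg.dev/YOUR_PROJECT_ID/pdf2latex-repo/pdf2latex-serve:latest",
    artifact_uri="gs://YOUR_BUCKET/models/pdf2latex-v1",
)

# Deploy to a GPU-backed endpoint.
endpoint = model.deploy(
    machine_type="g2-standard-8",   # illustrative choice
    accelerator_type="NVIDIA_L4",
    accelerator_count=1,
)
print(endpoint.resource_name)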

6. Testing

Verify the deployed endpoint by sending a sample image.

uv run python scripts/test_endpoint.py \
    --endpoint_id YOUR_ENDPOINT_ID \
    --image_path test_image.png

🔌 Integration Guide

To call the deployed model from another service (e.g., a backend API or microservice), use the Google Cloud Vertex AI SDK or standard REST API.

Authentication

Ensure your service has a Service Account with the Vertex AI User role.

  • Local Dev: gcloud auth application-default login
  • Production: Attach the Service Account to your VM/Pod.

Python Example

import base64
import json
from google.cloud import aiplatform

def predict_latex(project_id, location, endpoint_id, image_path):
    # Initialize Vertex AI SDK
    aiplatform.init(project=project_id, location=location)
    endpoint = aiplatform.Endpoint(endpoint_id)

    # Encode Image
    with open(image_path, "rb") as f:
        encoded_image = base64.b64encode(f.read()).decode("utf-8")

    # Construct Payload (OpenAI Chat Format)
    payload = {
        "model": "/model-artifacts",
        "messages": [
            {
                "role": "user",
                "content": [
                    {"type": "text", "text": "Convert this to LaTeX."},
                    {"type": "image_url", "image_url": {"url": f"data:image/png;base64,{encoded_image}"}}
                ]
            }
        ],
        "max_tokens": 512,
        "temperature": 0.2
    }

    # Send Request
    response = endpoint.raw_predict(
        body=json.dumps(payload).encode("utf-8"),
        headers={"Content-Type": "application/json"}
    )
    
    return response.content.decode("utf-8")
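
Example call, with placeholder values:

latex = predict_latex(
    project_id="YOUR_PROJECT_ID",
    location="us-central1",
    endpoint_id="YOUR_ENDPOINT_ID",
    image_path="test_image.png",
)
print(latex)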
