AIVN AIO 2024 LLM Ops

A comprehensive MLOps solution for deploying, serving, and monitoring fine-tuned LLMs, built around multi-LoRA adapter serving with LLaMA-3.2-1B-Instruct.

Project Overview

This project demonstrates an end-to-end LLM deployment with a focus on:

  • Serving fine-tuned Llama 3.2 1B models with vLLM
  • Backend API with LangChain for structured LLM interactions
  • Frontend interface with Gradio
  • Comprehensive monitoring with Prometheus and Grafana
  • Log aggregation with Loki

Components

  • vLLM API: Serves the Llama 3.2 1B base model and custom LoRA adapters (queried directly in the example after this list)
  • Backend: FastAPI service that handles prompting, model selection, and response formatting
  • Frontend: Gradio web interface for easy interaction with the models
  • Monitoring: Prometheus, Grafana, and Loki for observability
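
Because vLLM exposes an OpenAI-compatible API, you can query the base model or any registered LoRA adapter directly. The adapter name below is an assumption; list the names actually being served first:

    # List the models (base model plus any LoRA adapters) vLLM is serving
    curl http://localhost:8000/v1/models

    # Chat completion against a hypothetical adapter named "sentiment-lora"
    curl http://localhost:8000/v1/chat/completions \
      -H "Content-Type: application/json" \
      -d '{
            "model": "sentiment-lora",
            "messages": [{"role": "user", "content": "I loved this movie!"}],
            "max_tokens": 64
          }'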

Use Cases

  1. Sentiment Analysis: Analyzes text sentiment using a fine-tuned model (a request sketch follows this list)
  2. Medical QA: Answers medical multiple-choice questions with domain-specific tuning
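
The backend wraps these use cases behind its FastAPI service. The endpoint path and payload below are assumptions for illustration only; check the interactive docs that FastAPI serves by default at http://localhost:8001/docs for the real schema:

    # Hypothetical sentiment-analysis request to the backend
    curl http://localhost:8001/sentiment \
      -H "Content-Type: application/json" \
      -d '{"text": "The battery life on this laptop is fantastic."}'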

Getting Started

Run each step from the repository root:

  1. Create the shared Docker network:

    docker network create aio-network

  2. Start the monitoring stack:

    cd monitor
    docker compose up -d

  3. Launch the vLLM API server:

    cd vllm_api
    docker compose up -d

  4. Start the backend API:

    cd backend
    docker compose up -d

  5. Launch the frontend application:

    cd frontend
    docker compose up -d
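
Once all five steps have run, a quick sanity check (a sketch: the endpoints below are vLLM and FastAPI defaults, and container names depend on the compose files):

    # Containers attached to the shared network
    docker network inspect aio-network --format '{{range .Containers}}{{.Name}} {{end}}'

    # vLLM liveness probe (built into the OpenAI-compatible server)
    curl -s http://localhost:8000/health

    # FastAPI serves interactive docs by default; expect HTTP 200
    curl -s -o /dev/null -w '%{http_code}\n' http://localhost:8001/docs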

Accessing Services

  • vLLM API: http://localhost:8000
  • Backend API: http://localhost:8001
  • Gradio UI: http://localhost:7861
  • Open WebUI: http://localhost:8080
  • Grafana: http://localhost:3000
  • Prometheus: http://localhost:9090 (see the metrics check below)
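
Prometheus scrapes vLLM's built-in metrics endpoint. To confirm metrics are flowing before opening Grafana (the vllm: metric prefix matches current vLLM releases, but may differ in yours):

    # vLLM publishes Prometheus metrics at /metrics
    curl -s http://localhost:8000/metrics | grep '^vllm:' | head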

Benchmark

Run the serving benchmark against the running vLLM server:

    export OPENAI_API_KEY=<your vllm api key>
    make bench_serving
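
If you prefer to bypass the Makefile, vLLM ships a standard serving benchmark script. The invocation below is a sketch: the flags come from recent vLLM releases, and the model and dataset choices are assumptions to adjust for your deployment.

    # Sketch: run vLLM's benchmark_serving.py (from the vLLM source tree)
    # against the local server; model/dataset values are assumptions.
    python benchmarks/benchmark_serving.py \
      --backend vllm \
      --base-url http://localhost:8000 \
      --model meta-llama/Llama-3.2-1B-Instruct \
      --dataset-name random \
      --num-prompts 100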
