Skip to content

RomiconEZ/KONSPECTO-LLM

Repository files navigation

🎓 KONSPECTO - LLM Agent for Note Management

👥 Authors

  • Neronov Roman
  • Fazlyev Albert

📋 Project Description

KONSPECTO is an intelligent agent based on a local LLM model, offering the following capabilities:

🔍 Search Through Notes

  • Semantic search across the notes database
  • Generation of structured responses based on the retrieved information
  • Ability to view original documents

🎥 Video Processing

  • Extraction of keyframes from YouTube videos
  • Creation of DOCX documents with images
  • Filtering of similar frames

🎤 Voice Input

  • Transcription of voice messages using Whisper
  • Support for the Russian language
  • Ability to combine voice and text input

📽️ Presentation

Presentation KONSPECTO

🛠 Tech Stack

Frontend

  • ⚛️ React + Vite
  • 🎨 TailwindCSS
  • 🔄 React Router
  • ✨ React Icons

Backend

  • 🚀 FastAPI
  • 🤖 LangChain
  • 🔍 LlamaIndex
  • 📝 Whisper
  • 🎥 OpenCV
  • 🗄️ Redis Stack

📦 Installation

Prerequisites

  • Docker and Docker Compose
  • Node.js 18+
  • Python 3.11+
  • Poetry
  • pre-commit
  • LM Studio - Download from https://lmstudio.ai

LM Studio Setup

  1. Download and install LM Studio from the official website
  2. In LM Studio:
    • Go to "Search" tab
    • Find and download IlyaGusev/saiga_nemo_12b_gguf/saiga_nemo_12b.Q8_0.gguf model
    • Go to "Local Server" tab
    • Select the downloaded model from the dropdown menu
    • Start the server (it will run on http://localhost:1234/v1)
    • Keep the server running while using KONSPECTO

⚠️ Note: Make sure the LM Studio server is running before starting the application, as KONSPECTO relies on it for text generation.

1️⃣ Clone the Repository

git clone https://github.com/RomiconEZ/KONSPECTO
cd KONSPECTO

2️⃣ Configure Settings

Create configuration files in the backend/app/config/ directory:

.env

FOLDER_ID=your_google_drive_folder_id
GOOGLE_SERVICE_ACCOUNT_KEY_PATH=config/service_account_key.json

TRANSCRIPTION_MODEL=whisper
WHISPER_MODEL_SIZE=large-v3

LLM_STUDIO_BASE_URL=http://localhost:1234/v1

EMBEDDING_MODEL_NAME="intfloat/multilingual-e5-large"
EMBEDDING_BATCH_SIZE=16
EMBEDDING_DIMENSION=1024

service_account_key.json

{
  // Your Google service account credentials
  // Obtain them from the Google Cloud Console
}

3️⃣ Install Dependencies

Frontend:

cd frontend
npm install

Backend:

cd backend
poetry install

4️⃣ Set Up pre-commit Hooks

pre-commit install --install-hooks
pre-commit run --all-files

5️⃣ Run Tests

Frontend tests:

cd frontend
npm run test

Backend tests:

cd backend
bash tests/run_tests.sh

6️⃣ Launch the Application

docker compose up --build

The application will be available at the following addresses:

🔄 Workflow

  1. Information Search

    • The user sends a request through the UI
    • The agent analyzes the request and determines the necessary tools
    • A search is performed across the knowledge base and a response is generated
  2. Video Processing

    • Uploading a YouTube video
    • Extracting frames every 5 seconds
    • Filtering similar images
    • Creating a DOCX document
  3. Voice Input

    • Recording audio via the browser
    • Transcription using Whisper
    • Adding the text to the current query

✅ Validation

It is not possible to produce a deterministic assessment of the agent’s performance because its effectiveness depends on the unique data serving as its knowledge base. In our case, this knowledge base consists of user-generated notes, which are different for every individual. Consequently, any quality measurement will vary significantly from one user’s environment to another.

In this project, we tested the agent on two specific documents: one explaining gradient descent and another explaining stochastic gradient descent. The system demonstrated consistent accuracy in retrieving relevant information from these documents during the queries shown in the demo video. However, because user notes can differ in style, depth, and content, the same agent might show varied results when applied to an entirely different set of documents.

This inherent reliance on specialized, user-specific data makes it impossible to generalize the agent’s quality or establish a uniform benchmark. The system’s performance is inseparable from the nuances of the data it is provided with, preventing any deterministic evaluation of its capabilities.

📜 License

Apache License

⭐️ Support the Project

If you like the project, give it a star on GitHub!

About

LLM agent that provides tools for convenient work with personal documents using voice or text.

Topics

Resources

License

Stars

Watchers

Forks