iVA is a real-time web application that processes a live camera feed to perform object detection, text extraction (OCR), and AI-driven contextual analysis. It features a dual-model system, allowing users to switch between general object detection and specific, text-based object searching.
- Dual Detection Modes:
- YOLOv8: For high-performance, general-purpose object detection (80 classes).
- Grounding DINO (Placeholder): For specific, "open-vocabulary" detection based on a user's text prompt (e.g., "a person wearing a hat").
- Real-time Bounding Boxes: Draws boxes and labels around detected objects directly on the video feed.
- Text Extraction (OCR): Uses Tesseract to read text visible in the video stream.
- AI Scene Description: Leverages the Google Gemini API to generate intelligent descriptions for important scenes.
- Optimized Asynchronous Logging: A background worker intelligently buffers analysis results to a temporary log file, then selects and enriches only the most "reliable" log from each time window to save to a SQL Server database, minimizing API costs and database load.
- Interactive UI: A clean interface with controls to pause/play the video feed and switch between detection modes.
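The log-selection step of the background worker can be sketched as follows. This is an illustrative sketch only: the `AnalysisLog` record and `LogSelector` class are hypothetical names, not the project's actual types, and "reliability" is assumed here to mean highest detection confidence within the buffered window.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

// Hypothetical shape of one buffered analysis result.
public record AnalysisLog(DateTime Timestamp, string Label, double Confidence);

public static class LogSelector
{
    // From a window of buffered results, keep only the single most
    // "reliable" entry (here approximated as highest confidence);
    // returns null when the window is empty.
    public static AnalysisLog? SelectMostReliable(IEnumerable<AnalysisLog> window)
        => window.OrderByDescending(l => l.Confidence).FirstOrDefault();
}

public class Program
{
    public static void Main()
    {
        var window = new List<AnalysisLog>
        {
            new(DateTime.UtcNow, "person", 0.62),
            new(DateTime.UtcNow, "person", 0.91),
            new(DateTime.UtcNow, "cup",    0.40),
        };

        var best = LogSelector.SelectMostReliable(window);
        Console.WriteLine(best!.Label); // the 0.91-confidence "person" entry wins
    }
}
```

Only the selected entry would then be enriched (e.g., with a Gemini description) and written to SQL Server, which is what keeps API and database costs proportional to the number of time windows rather than the number of frames.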
- Backend:
- ASP.NET Core (.NET 9)
- C#
- Entity Framework Core
- SQL Server
- Microsoft.ML.OnnxRuntime (for YOLOv8 inference)
- Google.Ai.Generativelanguage (official Gemini SDK)
- Tesseract.NET (for OCR)
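For reference, these dependencies would appear in the project file roughly as below. The `Version` values are placeholders, not pinned by this README, and the exact NuGet package IDs (e.g., the Tesseract wrapper) should be confirmed against the repository's .csproj.

```xml
<!-- Illustrative package references; versions are placeholders. -->
<ItemGroup>
  <PackageReference Include="Microsoft.ML.OnnxRuntime" Version="x.y.z" />
  <PackageReference Include="Google.Ai.Generativelanguage" Version="x.y.z" />
  <PackageReference Include="Tesseract" Version="x.y.z" />
</ItemGroup>
```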
- Frontend:
- HTML5
- Tailwind CSS
- Vanilla JavaScript
- .NET 9 SDK
- SQL Server (e.g., Express or Developer edition)
You must place the following files and folders in the root directory of the C# project:
- `yolov8n.onnx`: The YOLOv8 model file.
- `tessdata` folder: This folder must contain the Tesseract language data.
  - Create a folder named `tessdata`.
  - Download `eng.traineddata` from here and place it inside the `tessdata` folder.
(The .csproj file is already configured to copy these files to the output directory when you build the project.)
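The copy-to-output behavior mentioned above typically looks like the fragment below. This is a sketch of the usual MSBuild pattern, not a copy of the project's actual .csproj, which already contains its own equivalent.

```xml
<!-- Typical copy-to-output configuration (illustrative). -->
<ItemGroup>
  <None Include="yolov8n.onnx" CopyToOutputDirectory="PreserveNewest" />
  <None Include="tessdata\**\*" CopyToOutputDirectory="PreserveNewest" />
</ItemGroup>
```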
- Open the `appsettings.json` file.
- Update the `ConnectionStrings.DefaultConnection` value to point to your SQL Server instance.
- Update the `Gemini.ApiKey` value with your Google Gemini API key.
- Open a terminal in the project root and run the database migration to create the necessary tables:
dotnet ef database update
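The relevant shape of `appsettings.json` is sketched below. The server name, database name, and connection options are placeholders for a local SQL Server Express setup; substitute your own instance details and key.

```json
{
  "ConnectionStrings": {
    "DefaultConnection": "Server=localhost\\SQLEXPRESS;Database=iVA;Trusted_Connection=True;TrustServerCertificate=True;"
  },
  "Gemini": {
    "ApiKey": "YOUR_GEMINI_API_KEY"
  }
}
```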
- Open a terminal in the project's root directory.
- Run the application using the .NET CLI:
dotnet run
- The terminal will display the URL the application is running on (e.g., `https://localhost:7123`). Open this URL in your web browser.
- When the page loads, your browser will ask for permission to use your camera. Click Allow.
- The application will start in the default "General Detection (YOLO)" mode, automatically identifying and boxing common objects.
- To search for a specific object, select the "Specific Search (DINO)" radio button and type a description (e.g., "a blue cup") into the text box.
- Use the Pause and Play buttons to control the video feed and the analysis process.
- The analytics panel on the right will update with the results from the live analysis. The background worker will save the most relevant logs to the database every 10 seconds.