Welcome to my portfolio of Artificial Intelligence tools and prototypes. This repository contains a collection of applications developed during my studies and as personal projects driven by my interest in generative AI and software development.
The projects range from command-line productivity utilities to full-stack web applications, demonstrating practical implementations of Large Language Models (LLMs), Computer Vision, and Audio Processing.
Tech Stack: Python, Streamlit, LangChain, OpenAI, FAISS
A web application that addresses information overload. Users build a temporary knowledge base from multiple PDF documents and websites, then query it through a chat interface.
- Key Feature: Retrieval-Augmented Generation (RAG) to ground AI answers in factual data.
- Use Case: Rapidly analyzing manuals, contracts, or long reports without reading every page.
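To make the RAG idea above concrete, here is a minimal, dependency-free sketch of the retrieve-then-prompt step. It is not the application's actual code: word-count vectors and cosine similarity stand in for the OpenAI embeddings and FAISS index, and the names `retrieve`, `build_prompt`, and the sample chunks are hypothetical.

```python
import math
import re
from collections import Counter


def tokenize(text):
    """Lowercase the text and split it into alphanumeric tokens."""
    return re.findall(r"[a-z0-9]+", text.lower())


def cosine(a, b):
    """Cosine similarity between two word-count vectors (Counters)."""
    dot = sum(a[t] * b[t] for t in a if t in b)
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


def retrieve(query, chunks, k=2):
    """Return the k chunks most similar to the query.

    Assumption: word counts stand in for real vector embeddings.
    """
    q = Counter(tokenize(query))
    ranked = sorted(
        chunks,
        key=lambda c: cosine(q, Counter(tokenize(c))),
        reverse=True,
    )
    return ranked[:k]


def build_prompt(query, chunks, k=2):
    """Ground the question in retrieved context before it goes to an LLM."""
    context = "\n".join(f"- {c}" for c in retrieve(query, chunks, k))
    return (
        "Answer using only the context below.\n\n"
        f"Context:\n{context}\n\nQuestion: {query}"
    )


# Hypothetical document chunks, e.g. extracted from a manual and a contract.
chunks = [
    "The warranty covers parts and labor for 24 months.",
    "To reset the device, press and hold power for ten seconds.",
    "Invoices are payable within 30 days of receipt.",
]
prompt = build_prompt("How long is the warranty?", chunks, k=1)
print(prompt)
```

In a real setup the prompt would be sent to a chat model, and the similarity search would run over an embedding index rather than raw word counts.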
Here is an overview of the tools included in this repository:
| Project | Type | Description | Key Tech |
|---|---|---|---|
| Scientific Article Generator | CLI Tool | Automates the creation of structured academic papers (Abstract to References) on any given topic. Outputs both Markdown and PDF. | Google Gemini, xhtml2pdf |
| Multi-Modal Product Analyzer | CLI Tool | Takes product images as input and generates professional marketing descriptions and slogans. | Google Gemini Vision, Argparse |
| Real-Time Voice Interpreter | Script | A "Universal Translator" that listens to speech, translates it, and speaks it back in the target language. | Whisper, GPT-4, OpenAI TTS |
| Voice-to-Image Artist | Script | Converts spoken descriptions directly into digital art by chaining speech recognition with image generation. | Whisper, DALL-E 3 |
| LLM CLI Utility | CLI Tool | A "Swiss Army Knife" for developers to process various file formats (PDF, CSV, HTML) and URLs via command line using AI. | MarkItDown, OpenAI |
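Several of the tools above chain model calls. The Real-Time Voice Interpreter, for example, goes from speech to text, to translated text, to speech. A rough sketch of that chaining pattern follows. The stage functions are stubs, not the Whisper, GPT-4, or TTS calls the script uses, and every name here is hypothetical.

```python
def make_interpreter(transcribe, translate, speak):
    """Compose three stages into one speech-to-speech function.

    Each stage is passed in as a callable, so real API clients can be
    swapped for stubs during testing.
    """
    def interpret(audio, target_language):
        text = transcribe(audio)                       # speech -> text
        translated = translate(text, target_language)  # text -> translated text
        return speak(translated)                       # translated text -> speech
    return interpret


# Stub stages (assumptions for illustration only).
def fake_transcribe(audio):
    return audio.decode("utf-8")


def fake_translate(text, target_language):
    return f"[{target_language}] {text}"


def fake_speak(text):
    return text.encode("utf-8")


interpret = make_interpreter(fake_transcribe, fake_translate, fake_speak)
result = interpret(b"hello", "de")
print(result)
```

Keeping the stages as plain callables means the same wiring could serve the Voice-to-Image Artist by replacing the last two stages with an image-generation call.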
This portfolio demonstrates proficiency in the following areas:
- Generative AI: OpenAI API (GPT-4, DALL-E, Whisper), Google Gemini API.
- Frameworks: LangChain (Chains, Agents, Vector Stores), Streamlit.
- Data Engineering: RAG Architectures, Vector Embeddings (FAISS), Web Scraping.
- Python Development: API integration, CLI tool creation, Audio processing (PyAudio), PDF manipulation.
Most tools in this repository require API keys (OpenAI or Google Cloud).
- Clone the repository.
- Install dependencies: each project has specific requirements. Generally, you can install the core libraries with `pip install -r requirements.txt`.
- Environment setup: ensure your API keys are set as environment variables:
  - OPENAI_API_KEY
  - GOOGLE_API_KEY (for Gemini projects)
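Before running a tool, it can help to confirm the keys are actually visible to Python. The check below is a small sketch and is not part of the repository. The helper name `missing_keys` is hypothetical, and which keys a given project needs depends on that project.

```python
import os

# Variable names taken from the setup notes above.
REQUIRED_KEYS = ("OPENAI_API_KEY", "GOOGLE_API_KEY")


def missing_keys(required=REQUIRED_KEYS, env=None):
    """Return the names of required variables that are unset or empty."""
    env = os.environ if env is None else env
    return [name for name in required if not env.get(name)]


if __name__ == "__main__":
    missing = missing_keys()
    if missing:
        print("Missing environment variables: " + ", ".join(missing))
    else:
        print("All API keys are set.")
```

For a project that only uses OpenAI, calling `missing_keys(required=("OPENAI_API_KEY",))` limits the check to that one variable.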