InsightExtractor

Research-Powered Prompt Engineering Assistant

InsightExtractor extracts insights from research papers on prompt engineering, LLMs, and AI, and uses them to generate optimized prompts. It processes academic PDFs, pulls out key methodologies and findings, and draws on this knowledge to craft research-backed prompts for specific use cases.

🌟 Features

Research Paper Processing: Extract text from PDF research papers and split into optimal chunks
AI-Powered Insight Extraction: Analyze research papers to extract key concepts, methodologies, and findings
Knowledge Base Creation: Build a searchable vector database of research insights
Optimized Prompt Generation: Generate tailored prompts based on research findings for any specific goal
Extensible Architecture: Easily add new papers to expand the knowledge base over time

📋 Requirements

Python 3.8+
Google Gemini API key (or OpenAI API key with minor modifications)
PDF research papers (not included in the repository)

🚀 Installation

# Clone the repository
git clone https://github.com/W3STY11/InsightExtractor.git
cd InsightExtractor

# Create a virtual environment
python -m venv venv

# Activate the virtual environment
# For Windows:
venv\Scripts\activate
# For macOS/Linux:
source venv/bin/activate

# Install dependencies
pip install -r requirements.txt

# Create necessary directories
mkdir -p data/papers knowledge_db

🔑 API Key Setup

Before running the system, provide your Google Gemini API key in one of two ways:

# Option 1: Create a .api_key file
echo "YOUR_GEMINI_API_KEY" > .api_key

Or set it as an environment variable:

# For Windows (PowerShell)
$env:GOOGLE_API_KEY = "YOUR_GEMINI_API_KEY"

# For macOS/Linux
export GOOGLE_API_KEY="YOUR_GEMINI_API_KEY"
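
For illustration, here is a minimal sketch of how an application could resolve the key from the two sources described above. The function name `load_api_key` and the lookup order (environment variable first, then the `.api_key` file) are assumptions made for this example, not necessarily what `src/main.py` does. It also assumes the key file is plain UTF-8 text.

```python
import os
from pathlib import Path
from typing import Optional


def load_api_key(key_file: str = ".api_key",
                 env_var: str = "GOOGLE_API_KEY") -> Optional[str]:
    """Return the API key from the environment, else from the key file, else None.

    Hypothetical helper: the precedence used here is an assumption.
    """
    # 1. Environment variable, ignoring empty or whitespace-only values.
    key = os.environ.get(env_var, "").strip()
    if key:
        return key

    # 2. Key file; strip the trailing newline that `echo` writes.
    path = Path(key_file)
    if path.is_file():
        key = path.read_text(encoding="utf-8").strip()
        return key or None

    return None


if __name__ == "__main__":
    found = load_api_key()
    print("API key found" if found else "No API key configured")
```

Returning `None` instead of raising lets the caller decide whether to prompt the user or exit with a setup hint.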

📖 Usage

Starting the Application

python -m src.main

Processing Research Papers

1. Select option 1 from the main menu
2. Enter the full path to a research paper PDF
3. Wait for the system to process the paper and extract insights
4. The paper will be added to your knowledge base

Generating Optimized Prompts

1. Select option 2 from the main menu
2. Enter your prompt goal (e.g., "Generate a creative story")
3. Provide context about your use case (e.g., "For middle school students")
4. The system will generate a research-backed prompt optimized for your goal
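
To make the last step more concrete, the sketch below shows one way a goal, a context, and a list of retrieved insights could be combined into a single instruction for the LLM. The function `build_generation_request`, its wording, and the `max_insights` cap are hypothetical. The actual template in `prompt_generator.py` may look quite different.

```python
from typing import List


def build_generation_request(goal: str, context: str, insights: List[str],
                             max_insights: int = 5) -> str:
    """Combine goal, context, and retrieved insights into one LLM instruction.

    Hypothetical helper for illustration only.
    """
    # Keep only the first few insights so the request stays short.
    selected = insights[:max_insights]

    lines = [
        "You are a prompt engineering assistant.",
        f"Goal: {goal}",
        f"Context: {context}",
        "Relevant research insights:",
    ]
    if selected:
        lines.extend(f"{i}. {text}" for i, text in enumerate(selected, start=1))
    else:
        lines.append("(none found in the knowledge base)")
    lines.append(
        "Write an optimized prompt for the goal above, "
        "applying the insights where they are relevant."
    )
    return "\n".join(lines)


if __name__ == "__main__":
    print(build_generation_request(
        "Generate a creative story",
        "For middle school students",
        ["Placeholder insight retrieved from the knowledge base"],
    ))
```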

📁 Repository Structure

InsightExtractor/
├── data/
│   └── papers/             # Directory for storing research papers
├── knowledge_db/           # Vector database storage for extracted insights
├── src/
│   ├── __init__.py
│   ├── document_processor.py  # PDF loading and chunking
│   ├── knowledge_extractor.py # Research insight extraction
│   ├── prompt_generator.py    # Optimized prompt generation
│   └── main.py                # Main application
├── scripts/
│   └── api_key_test.py        # Script to test API key
├── requirements.txt           # Project dependencies
├── .gitignore                 # Git ignore file
└── README.md                  # This README file

🧠 How It Works

InsightExtractor follows a four-stage process:

1. Document Processing: PDFs are converted to text and split into manageable chunks with appropriate overlap to maintain context
2. Insight Extraction: Each chunk is analyzed using the Gemini API to extract key concepts, methodologies, findings, and applications from the research papers
3. Knowledge Base Creation: Extracted insights are stored in a vector database (Chroma) using embeddings that capture the semantic meaning of the content
4. Prompt Generation: When given a prompt goal and context, the system searches the knowledge base for relevant research insights and uses them to generate an optimized, research-backed prompt
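
The chunking idea in the first stage can be sketched as follows. This is a simple character-based splitter written for illustration. The function name `split_with_overlap` and the default sizes are assumptions, and the splitter used in `document_processor.py` may work differently (for example, by splitting on sentence or paragraph boundaries).

```python
from typing import List


def split_with_overlap(text: str, chunk_size: int = 1000,
                       overlap: int = 200) -> List[str]:
    """Split text into fixed-size chunks that share `overlap` characters.

    Illustrative sketch; sizes are measured in characters, not tokens.
    """
    if chunk_size <= 0:
        raise ValueError("chunk_size must be positive")
    if not 0 <= overlap < chunk_size:
        raise ValueError("overlap must satisfy 0 <= overlap < chunk_size")

    step = chunk_size - overlap  # how far each new chunk starts after the last
    chunks: List[str] = []
    for start in range(0, len(text), step):
        chunks.append(text[start:start + chunk_size])
        # Stop once a chunk reaches the end, to avoid a redundant tail chunk.
        if start + chunk_size >= len(text):
            break
    return chunks


if __name__ == "__main__":
    for chunk in split_with_overlap("abcdefghij", chunk_size=4, overlap=1):
        print(chunk)
```

Because each chunk repeats the last `overlap` characters of the previous one, a sentence that straddles a boundary still appears whole in at least one chunk when the overlap is large enough.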

📊 Performance

The system has been tested with a variety of research papers and produces research-informed prompts.
Processing time, chunk count, and storage size scale roughly linearly with document length (approximate figures):

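| Document Size | Processing Time | Chunks | Storage Size |
|---------------|-----------------|--------|--------------|
| 10 pages      | ~5-10 minutes   | ~30    | ~5 MB        |
| 25 pages      | ~15-25 minutes  | ~75    | ~12 MB       |
| 50+ pages     | ~30-60 minutes  | 150+   | ~25+ MB      |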
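
As a rough planning aid, the figures above work out to about 3 chunks per page, 10 to 24 seconds per chunk, and 0.5 MB of storage per page. The sketch below extrapolates from those ratios. The constants are derived only from the table, the linear scaling is an assumption, and the `estimate` function is hypothetical. Real numbers will depend on API latency and paper layout.

```python
from typing import Dict

# Ratios derived from the table above (assumed to scale linearly).
CHUNKS_PER_PAGE = 3            # 30 chunks / 10 pages, 75 / 25, 150 / 50
SECONDS_PER_CHUNK_MIN = 10     # 5 minutes for 30 chunks
SECONDS_PER_CHUNK_MAX = 24     # 60 minutes for 150 chunks
MB_PER_PAGE = 0.5              # about 5 MB for 10 pages


def estimate(pages: int) -> Dict[str, float]:
    """Return rough chunk, time, and storage estimates for a paper."""
    if pages < 0:
        raise ValueError("pages must be non-negative")
    chunks = pages * CHUNKS_PER_PAGE
    return {
        "chunks": chunks,
        "minutes_min": chunks * SECONDS_PER_CHUNK_MIN / 60,
        "minutes_max": chunks * SECONDS_PER_CHUNK_MAX / 60,
        "storage_mb": pages * MB_PER_PAGE,
    }


if __name__ == "__main__":
    print(estimate(25))
```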

🤝 Contributing

Contributions are welcome! Please feel free to submit a Pull Request.
