
ishaheen10/psxgpt


psxGPT

A ChatGPT-like application for querying Pakistan Stock Exchange (PSX) financial data. It can be repurposed for other financial data, such as querying a data room during acquisition due diligence.

What psxGPT Does

psxGPT allows financial analysts to query and analyze financial data using plain English. For example:

"Get me Deposits per branch for HBL, UBL and MEBL in 2024"

psxGPT will find the necessary information and compile a report using ONLY data found in financial reports.

Use Cases:

  • Financial statement analysis and comparison
  • Due diligence data room queries
  • Regulatory filing research
  • Querying Economic Survey of Pakistan or SBP Annual Report

Quick Start with GitHub Codespaces (Recommended)

Get psxGPT running in under 5 minutes using GitHub Codespaces:

Step 1: Launch GitHub Codespaces

  1. Click the green "Code" button on the psxGPT GitHub repository
  2. Select "Codespaces" tab and click "Create codespace on master"
  3. Wait for the environment to load (this may take a few minutes)

Step 2: Download and Upload Data

  1. Download the pre-built search index: gemini_index_metadata.zip
  2. Unzip the file on your computer - you should see a folder called gemini_index_metadata
  3. Upload to Codespaces: Drag and drop the entire gemini_index_metadata folder into your Codespaces file explorer (same level as the Python files)

Step 3: Configure Environment

  1. Rename the environment file:
    mv .env.example .env
  2. Get your Gemini API key:
    • Go to Google AI Studio
    • Sign in with your Google account and create a new API key
  3. Edit the .env file:
    • Open .env in the Codespaces editor
    • Add your Gemini API key: GEMINI_API_KEY=your_api_key_here
    • Delete the entire DATABASE_URL line (it is not needed for the quick start, but without it chat history is not saved)
    • Save the file

Step 4: Install Dependencies

pip install uv
uv sync

Step 5: Activate Virtual Environment

source .venv/bin/activate

Step 6: Start the MCP Server

python Step7MCPServerPsxGPT.py

Important: Once the server starts, a "Ports" tab appears at the bottom of Codespaces. Click it and change the server port's visibility from "Private" to "Public"; the connection will not work otherwise.

Step 7: Start the Client (New Terminal)

  1. Open a new terminal: Click the + button in the terminal area
  2. Activate virtual environment again:
    source .venv/bin/activate
  3. Start the client:
    chainlit run Step8MCPClientGemini.py
  4. Access the application: Click on the port 8000 URL that appears in the Ports tab
  5. Login: Use analyst@psx.com / analyst123
  6. Connect to MCP Server: Follow the instructions in the interface to connect to your running MCP server

🎉 You're ready! Try asking: "Get me Deposits per branch for HBL, UBL and MEBL in 2024"


Full Setup Guide (Local Installation)

For users who prefer to install psxGPT on their local machine, follow the detailed instructions below.

How It Works

psxGPT processes financial documents through an 8-step pipeline:

  1. Download PDFs (Step1DownloadPDFsSearch.py or Step1DownloadPDFsTickers.py)
  2. Convert to Markdown (Step2ConvertPDFtoMarkdown.py) - Uses LlamaParse, or Tool1Mistral_OCR.py for scanned documents
  3. Create Chunks (Step3ChunkMarkdown.py) - Splits into searchable segments
  4. Extract Metadata (Step4MetaDataTags.py) - Identifies companies, dates, report types
  5. Combine Data (Step5CombineMetaData.py) - Consolidates all metadata
  6. Build Search Index (Step6CreateEmbeddings.py) - Creates vector embeddings for AI search
  7. Start Backend Server (Step7MCPServerPsxGPT.py) - Launches the data query server
  8. Launch Web Interface (Step8MCPClientPsxGPT.py for the Anthropic API, or Step8MCPClientGemini.py for the Gemini API free tier) - Starts the user-friendly chat interface

Quality Assurance: Use Tool2ValidateProcessing.py to verify data quality after processing.
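To give a feel for what the search index built in Step 6 enables, here is a minimal sketch of embedding-based retrieval: each chunk and the query are represented as vectors, and the chunks whose vectors point in the most similar direction (highest cosine similarity) are returned first. This is not the code in Step6CreateEmbeddings.py or Step7MCPServerPsxGPT.py. The chunk texts and three-dimensional vectors are made up for illustration; real embeddings come from an embedding model and have hundreds of dimensions.

```python
import math


def cosine_similarity(a: list[float], b: list[float]) -> float:
    """Cosine of the angle between two vectors (1.0 means same direction)."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b)


def top_k(query_vec: list[float], index: list[tuple[str, list[float]]], k: int = 2):
    """Return the k (chunk_text, score) pairs most similar to the query vector."""
    scored = [(text, cosine_similarity(query_vec, vec)) for text, vec in index]
    scored.sort(key=lambda pair: pair[1], reverse=True)
    return scored[:k]


# Toy "index": chunk text paired with a made-up 3-d embedding
toy_index = [
    ("HBL 2024: total deposits and branch count", [0.9, 0.1, 0.0]),
    ("UBL 2024: deposits by segment", [0.8, 0.2, 0.1]),
    ("MEBL 2024: board of directors", [0.1, 0.9, 0.2]),
]

query = [1.0, 0.1, 0.0]  # stands in for an embedded question about deposits
for text, score in top_k(query, toy_index):
    print(f"{score:.3f}  {text}")
```

The two deposit-related chunks rank above the board-of-directors chunk, which is the behavior a question like "Get me Deposits per branch..." relies on.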

How to Get Started

Overview

This guide walks you through setting up psxGPT step-by-step. The process takes about 30-45 minutes and involves:

  1. Installing required software (Git, Python, PostgreSQL, code editor)
  2. Downloading the project from GitHub
  3. Getting API keys for AI services
  4. Configuring the environment and installing dependencies
  5. Setting up the database and starting the application

Prerequisites

Before starting, you'll need:

  • A computer with Windows, macOS, or Linux
  • Internet connection for downloads
  • API keys for AI services (some free, some require credit card)

Step 1: Install Code Editor (IDE)

Choose one of these beginner-friendly code editors:

  • Windsurf (Recommended - AI-powered coding assistant)
  • Cursor (AI-powered VS Code alternative)
  • VS Code (Popular free editor)

Download and install your chosen editor following the installer instructions.

Step 2: Install Git

For Windows Users:

  1. Check Your PC Type First:

    • Press Windows key + R to open the Run dialog
    • Type msinfo32 and press Enter
    • Look for "System Type" in the System Information window:
      • x64-based PC = Intel/AMD processor (most common)
      • ARM64-based PC = ARM processor (newer Surface devices, some laptops)
  2. Go to https://git-scm.com/download/win

  3. Download the correct installer for your system:

    • 64-bit Git for Windows Setup (for x64-based PC)
    • ARM64 Git for Windows Setup (for ARM64-based PC)
  4. Run the installer and follow the setup wizard

  5. Installation tip: The installer will ask many questions about editors, line endings, etc. For this project, these choices don't matter - simply keep clicking "Next" with the default options

  6. When installation is complete, you can access Git through "Git Bash" or Command Prompt

For Mac Users:

  1. Install Homebrew first (recommended package manager for Mac):

    • Open Terminal (press Cmd + Space, type "Terminal", and press Enter)
    • Go to brew.sh and copy the installation command from the homepage
    • Paste the command into Terminal and press Enter
    • Important notes:
      • First-time Homebrew installation takes time (downloads ~800MB)
      • When prompted for your password, you won't see the characters as you type, but they are being entered - this is normal for security
      • Wait for the installation to complete
  2. Install Git using Homebrew:

    brew install git
  3. Verify installation:

    git --version

Note: Installer packages are also available from git-scm.com, but they tend to lag behind the latest release. Homebrew ensures you get the latest stable version and makes future updates easier.

Step 3: Install Python 3.11.9 (64-bit)

IMPORTANT: You must install Python 3.11.9 specifically, and it must be the 64-bit version for maximum stability.

For Windows Users:

  1. Go to https://www.python.org/downloads/release/python-3119/
  2. Scroll down to "Files" section
  3. Download "Windows installer (64-bit)" - make sure it says 64-bit
  4. Run the installer
  5. CRITICAL: Check the box "Add Python to PATH" during installation - this is essential for the project to work
  6. Click "Install Now"

For Mac Users:

  1. Go to https://www.python.org/downloads/release/python-3119/
  2. Scroll down to "Files" section
  3. Download "macOS 64-bit universal2 installer"
  4. Run the installer and follow the prompts
  5. Note: Python is automatically added to PATH on macOS
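To confirm which interpreter you actually installed, you can save a short check like the one below to a file and run it with python. It is a sketch, not part of the repository; the "3.11.9, 64-bit" requirement comes from this guide.

```python
import struct
import sys


def check_python(version: tuple[int, int, int], pointer_bits: int) -> list[str]:
    """Return a list of problems; an empty list means the interpreter matches."""
    problems = []
    if version != (3, 11, 9):
        problems.append(f"Python {'.'.join(map(str, version))} found, 3.11.9 recommended")
    if pointer_bits != 64:
        problems.append(f"{pointer_bits}-bit interpreter found, 64-bit required")
    return problems


if __name__ == "__main__":
    # struct.calcsize("P") is the size of a C pointer in bytes: 8 on a 64-bit build
    issues = check_python(tuple(sys.version_info[:3]), struct.calcsize("P") * 8)
    print("\n".join(issues) if issues else "Python 3.11.9 (64-bit) detected")
```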

Step 4: Install PostgreSQL 14.18 Database

For Windows Users:

  1. Visit PostgreSQL 14.18 Downloads
  2. Download PostgreSQL 14.18 for Windows x86-64
  3. Run the installer and follow the setup wizard
  4. Component Selection: To minimize installation size, keep only the first two components checked:
    • PostgreSQL Server (required)
    • Command Line Tools (required)
    • pgAdmin 4 (uncheck - not needed, saves ~200MB)
    • Stack Builder (uncheck - not needed)
  5. Critical: During installation, you'll set a password for the default "postgres" user. Write this password down - you'll need it for your .env file
  6. Username Note: The default PostgreSQL username is "postgres" (not your Windows username)

For Mac Users:

  1. Visit PostgreSQL 14.18 Downloads
  2. Download PostgreSQL 14.18 for macOS
  3. Run the installer and follow the setup wizard
  4. Component Selection: To minimize installation size, keep only the first two components checked:
    • PostgreSQL Server (required)
    • Command Line Tools (required)
    • pgAdmin 4 (uncheck - not needed, saves ~200MB)
    • Stack Builder (uncheck - not needed)
  5. Critical: During installation, you'll set a password for the default "postgres" user. Write this password down - you'll need it for your .env file
  6. Username Note: The default PostgreSQL username is "postgres" (not your Mac username)

Alternative Installation (Advanced Users):

  • Homebrew (Mac): brew install postgresql@14
  • Package Manager (Linux): Check your distribution's package manager for PostgreSQL 14.x

Step 5: Download and Setup psxGPT

  1. Open your code editor
  2. Open the integrated terminal:
    • Press Ctrl + ` (backtick) to open the terminal inside your code editor
  3. Download the project:

Recommended: Fast Clone (Downloads Latest Code Only)

git clone --depth 1 https://github.com/ishaheen10/psxgpt.git
cd psxgpt

Alternative: Full History Clone (Larger Download)

git clone https://github.com/ishaheen10/psxgpt.git
cd psxgpt

💡 Note: The repository has been optimized for size. The shallow clone (--depth 1) downloads only the latest code (~19MB) instead of full history, making setup much faster for most users.

Step 6: Get Your API Keys

You'll need to sign up for these services and get API keys:

Required (Free - No Credit Card Needed):

  1. Google Gemini API

    • Click "Get API key in Google AI Studio"
    • Sign in with your Google account
    • Create a new API key
    • Copy the key
  2. LlamaParse API

    • Sign up for LlamaCloud
    • Go to API Keys section
    • Create a new API key
    • Copy the key (free tier includes 3,000 pages)

Optional (Credit Card Required):

  1. Anthropic API

    • Sign up for an Anthropic account
    • Go to your dashboard
    • Create an API key
    • Copy the key
  2. Mistral API (for scanned document OCR)

    • Sign up for Mistral AI
    • Go to API Keys
    • Create a new key
    • Copy the key

Save these keys safely - you'll paste them into your .env file in the next step.

Step 7: Configure Environment and Install Dependencies

  1. Create your environment file:

    • Simply rename .env.example to .env
    • Open the .env file in your code editor
    • Paste in your API keys from Step 6
    • Update the DATABASE_URL with your PostgreSQL password
  2. Install dependencies:

    pip install uv
    uv sync
  3. Activate the virtual environment:

    For Windows:

    .venv\Scripts\activate

    If you get an execution policy error on Windows, run this first:

    Set-ExecutionPolicy -ExecutionPolicy RemoteSigned -Scope CurrentUser

    Then try the activate command again.

    For Mac/Linux:

    source .venv/bin/activate
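Once the .env file is filled in, a small script can catch keys that are missing or still set to a placeholder. This is a minimal sketch using only the standard library, not part of psxGPT. Only GEMINI_API_KEY and DATABASE_URL are named in this guide; add the names your .env.example uses for the LlamaParse, Anthropic, and Mistral keys. The "your_" placeholder test is a heuristic based on the example values in this guide. For the Quick Start setup (no DATABASE_URL), pass required=["GEMINI_API_KEY"].

```python
from pathlib import Path

# Keys named in this guide; extend with the other key names from .env.example
REQUIRED_KEYS = ["GEMINI_API_KEY", "DATABASE_URL"]


def parse_env(text: str) -> dict[str, str]:
    """Parse simple KEY=VALUE lines, skipping blank lines and # comments."""
    values = {}
    for raw in text.splitlines():
        line = raw.strip()
        if not line or line.startswith("#") or "=" not in line:
            continue
        key, _, value = line.partition("=")
        # Drop optional surrounding quotes, e.g. KEY="value"
        values[key.strip()] = value.strip().strip('"').strip("'")
    return values


def missing_keys(values: dict[str, str], required: list[str] = REQUIRED_KEYS) -> list[str]:
    """Keys that are absent, empty, or still contain a 'your_...' placeholder."""
    return [k for k in required if not values.get(k) or "your_" in values[k]]


if __name__ == "__main__":
    env = parse_env(Path(".env").read_text(encoding="utf-8"))
    print(missing_keys(env) or "All required keys are set")
```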

Version Compatibility Note

The specified versions (Python 3.11.9, PostgreSQL 14.18, Chainlit 2.5.5) are recommended for maximum stability and have been thoroughly tested with this project. While newer versions may work, these specific versions ensure the most reliable experience and avoid potential compatibility issues.

Step 8: Setup Database

First, verify PostgreSQL is accessible:

  1. Test if PostgreSQL commands work:

    createdb --version

    If you get "command not found" (Windows users):

    • PostgreSQL wasn't added to your system PATH during installation
    • You need to add it manually (see instructions below)

    Note: Database commands use -U postgres to specify the PostgreSQL user (not your computer username)

  2. Add PostgreSQL to PATH on Windows:

    • Press Windows key + R, type sysdm.cpl, press Enter
    • In the System Properties window, click the "Advanced" tab
    • Click the "Environment Variables..." button at the bottom
    • In the Environment Variables window, look at the top section labeled "User variables for [your username]"
    • Find "Path" in the list and click "Edit..." (if Path doesn't exist, click "New..." instead)
    • Click "New" and add: C:\Program Files\PostgreSQL\14\bin
    • Click "OK" to close the Edit window
    • Click "OK" to close Environment Variables window
    • Click "OK" to close System Properties window
    • Restart your code editor completely and open a new terminal
    • Test again with createdb --version

    Why User variables? We use User variables (not System variables) because:

    • No administrator privileges required
    • Only affects your user account (safer on shared computers)
    • Easier to troubleshoot if something goes wrong

    Note: If PostgreSQL was installed in a different location, the path might be:

    • C:\Program Files (x86)\PostgreSQL\14\bin (32-bit installation)
    • Or check your installation directory and add the \bin folder to PATH

Now create the database:

  1. Create the database:

    createdb -U postgres analyst_psx

    You'll be prompted for the postgres password you set during PostgreSQL installation.

  2. Setup the database schema:

    psql -U postgres -d analyst_psx -f chainlit_schema_psx.sql

    You'll be prompted for the postgres password again.

  3. Verify your configuration:

    • Double-check that your DATABASE_URL in the .env file matches your PostgreSQL setup
    • Format: postgresql://postgres:your_password@localhost:5432/analyst_psx
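To double-check DATABASE_URL against the format above, the standard library's urllib.parse can split it into its parts. This sketch is not part of psxGPT. It also shows that a password containing characters such as @ or / must be percent-encoded, otherwise the URL is split in the wrong place.

```python
from urllib.parse import quote, unquote, urlsplit


def describe_database_url(url: str) -> dict:
    """Split a postgresql:// URL into the parts worth checking against your setup."""
    parts = urlsplit(url)
    if parts.scheme not in ("postgresql", "postgres"):
        raise ValueError(f"unexpected scheme: {parts.scheme!r}")
    return {
        "user": unquote(parts.username or ""),
        "host": parts.hostname,
        "port": parts.port or 5432,  # 5432 is PostgreSQL's default port
        "database": parts.path.lstrip("/"),
        "has_password": bool(parts.password),  # avoid printing the password itself
    }


if __name__ == "__main__":
    print(describe_database_url("postgresql://postgres:your_password@localhost:5432/analyst_psx"))

    # Percent-encode a password with special characters before putting it in the URL
    password = "p@ss/word"  # example value only
    print(f"postgresql://postgres:{quote(password, safe='')}@localhost:5432/analyst_psx")
```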

Step 9: Start the Application

Option A: Use Pre-Built Data (Fastest)

  1. Download pre-built search index from Google Drive: Download gemini_index_metadata.zip

    Alternatively, if downloads from Google Drive are slow, download a smaller index covering 3 tickers: Download gemini_index_metadata_small.zip

  2. Unzip and place in project directory:

    • Unzip the file - you should see a folder called gemini_index_metadata
    • Copy this entire folder to your psxgpt project directory (same level as the Python files)
  3. Start the MCP server:

    python Step7MCPServerPsxGPT.py
    • Keep this terminal window open - the server needs to stay running
  4. Start the client (in a new terminal):

    • Open a second terminal in your code editor (Ctrl + Shift + ` to open new terminal)
    • Activate your virtual environment again:
      • Windows: .venv\Scripts\activate (if execution policy error, run Set-ExecutionPolicy -ExecutionPolicy RemoteSigned -Scope CurrentUser first)
      • Mac/Linux: source .venv/bin/activate
    • Run the client:
    chainlit run Step8MCPClientPsxGPT.py

    Alternative (for Gemini API free tier):

    chainlit run Step8MCPClientGemini.py
  5. Access and configure the application:

    • Open your browser to http://localhost:8000
    • Login with: analyst@psx.com / analyst123
    • Connect to MCP Server: After login, you'll see instructions in the client interface to connect to the MCP server - follow those prompts to complete the setup
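If http://localhost:8000 does not load, it helps to know whether anything is listening on that port at all. The sketch below (not part of psxGPT) uses the standard socket module to poll a port until it accepts connections. The same function can check the MCP server once you know which port it uses; that port is not stated in this guide.

```python
import socket
import time


def is_port_open(host: str, port: int, timeout: float = 1.0) -> bool:
    """True if something accepts TCP connections on host:port."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        return False


def wait_for_port(host: str, port: int, attempts: int = 30, delay: float = 1.0) -> bool:
    """Poll until the port accepts connections or the attempts run out."""
    for _ in range(attempts):
        if is_port_open(host, port):
            return True
        time.sleep(delay)
    return False


if __name__ == "__main__":
    # 8000 is the client port given in this guide
    print("Client is up" if wait_for_port("localhost", 8000) else "Nothing listening on port 8000")
```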

Option B: Process Your Own Documents

If you want to analyze your own financial documents:

  1. Install Playwright's browser binaries (used for downloading PDFs from websites):

    playwright install
  2. Run the processing pipeline:

    # Option 1: Download from the PSX website
    python Step1DownloadPDFsSearch.py
    
    # Option 2: Skip Step 1 and place your own PDFs in the psx_bank_reports/ folder
    
    # For either option, continue with:
    python Step2ConvertPDFtoMarkdown.py
    python Step3ChunkMarkdown.py
    python Step4MetaDataTags.py
    python Step5CombineMetaData.py
    python Tool2ValidateProcessing.py   # Verify data quality
    python Step6CreateEmbeddings.py
    
    # Start server (keep this terminal open)
    python Step7MCPServerPsxGPT.py

    Then in a new terminal, start the client:

    # Activate virtual environment first
    source .venv/bin/activate   # Mac/Linux
    .venv\Scripts\activate      # Windows (if execution policy error, run Set-ExecutionPolicy first)
    
    # Run client
    chainlit run Step8MCPClientPsxGPT.py

    Alternative client (for Gemini API free tier):

    chainlit run Step8MCPClientGemini.py

Note: The Step1 scripts are designed for the PSX website. For other data sources, modify the download scripts or use a browser automation tool such as browser-use.
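If you re-run the processing steps often, a small driver script can run them in order and stop at the first failure. This is a hypothetical helper, not part of the repository: the script order is copied from the Option B commands, and the --own-pdfs flag is invented for this sketch (it skips Step 1, as in Option 2). It assumes each script exits with a non-zero status on failure; if Tool2ValidateProcessing.py only prints warnings, read its output before continuing.

```python
import subprocess
import sys

# Script order copied from the Option B commands; swap in
# Step1DownloadPDFsTickers.py if you download by ticker instead.
PIPELINE = [
    "Step1DownloadPDFsSearch.py",
    "Step2ConvertPDFtoMarkdown.py",
    "Step3ChunkMarkdown.py",
    "Step4MetaDataTags.py",
    "Step5CombineMetaData.py",
    "Tool2ValidateProcessing.py",
    "Step6CreateEmbeddings.py",
]


def run_pipeline(scripts: list[str] = PIPELINE, skip_download: bool = False,
                 dry_run: bool = False) -> list[list[str]]:
    """Run each script in order with the current interpreter; return the commands."""
    commands = []
    for script in scripts:
        if skip_download and script.startswith("Step1"):
            continue  # Option 2: your own PDFs are already in psx_bank_reports/
        # sys.executable is the active (virtual environment) Python
        cmd = [sys.executable, script]
        commands.append(cmd)
        if not dry_run:
            # check=True raises CalledProcessError if the script exits non-zero
            subprocess.run(cmd, check=True)
    return commands


if __name__ == "__main__":
    run_pipeline(skip_download="--own-pdfs" in sys.argv)
```

Starting the server (Step7MCPServerPsxGPT.py) is deliberately left out, since it keeps running in its own terminal.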

Customization for Other Financial Data

To adapt psxGPT for different financial datasets:

  1. Replace Data Source: Modify Step1DownloadPDFsSearch.py to point to your data source
  2. Update File Paths: Change directory paths in scripts to match your folder structure
  3. Adjust Metadata Extraction: Modify Step4MetaDataTags.py for your document types
  4. Configure OCR: For scanned documents, ensure Mistral OCR is configured in Step2ConvertPDFtoMarkdown.py
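This guide does not describe how Step4MetaDataTags.py identifies companies and dates, so the sketch below is not its method. It is a simple rule-based alternative you could adapt when your documents follow predictable patterns: tag each chunk with the known tickers and four-digit years it mentions. The ticker list and sample text are illustrative assumptions.

```python
import re

# Hypothetical input: the tickers you care about
KNOWN_TICKERS = ["HBL", "UBL", "MEBL"]


def tag_chunk(text: str, tickers: list[str] = KNOWN_TICKERS) -> dict[str, list[str]]:
    """Rule-based tags: which known tickers and which years (2000-2099) appear."""
    # \b word boundaries stop "UBL" from matching inside a longer word
    found_tickers = [t for t in tickers if re.search(rf"\b{re.escape(t)}\b", text)]
    years = sorted(set(re.findall(r"\b20\d{2}\b", text)))
    return {"tickers": found_tickers, "years": years}


print(tag_chunk("HBL and MEBL deposits grew in 2024 versus 2023."))
# {'tickers': ['HBL', 'MEBL'], 'years': ['2023', '2024']}
```

Rules like these are cheap and predictable, but they miss company names written out in full; an LLM-based tagger handles that better at higher cost.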

Troubleshooting

Common Issues:

  • Scanned PDFs: Use the Mistral OCR option in Step 2 for better text extraction
  • Large Files: The LlamaParse free tier has a 3,000-page limit
  • Database Connection: Verify PostgreSQL is running and credentials in .env are correct
  • API Limits: Check API key quotas if processing fails

Performance:

  • Processing time depends on document count and size
  • Vector embedding creation (Step 6) is the most time-intensive step
  • Consider processing documents in batches for large datasets
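For the batching suggestion above, a small helper can split a list of files into fixed-size groups. This is a generic sketch, not code from the repository. itertools.batched only exists from Python 3.12, while this guide targets 3.11.9, so the helper is built on itertools.islice. The file names are hypothetical, and how each batch is processed is up to you.

```python
from collections.abc import Iterable, Iterator
from itertools import islice
from typing import TypeVar

T = TypeVar("T")


def batched(items: Iterable[T], size: int) -> Iterator[list[T]]:
    """Yield lists of up to `size` items (itertools.batched needs Python 3.12+)."""
    if size < 1:
        raise ValueError("size must be at least 1")
    iterator = iter(items)
    # islice takes the next `size` items; an empty list means we are done
    while batch := list(islice(iterator, size)):
        yield batch


if __name__ == "__main__":
    pdfs = [f"report_{i}.pdf" for i in range(7)]  # hypothetical file names
    for batch in batched(pdfs, 3):
        print(batch)  # process this batch, then move on to the next
```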

Authentication

Default Login Credentials:

  • Username: analyst@psx.com
  • Password: analyst123

Note: These are demo credentials configured in your .env file. For production use, implement proper authentication.

📄 License

MIT License

Copyright (c) 2024 psxGPT Contributors

Permission is hereby granted, free of charge, to any person obtaining a copy of this software and associated documentation files (the "Software"), to deal in the Software without restriction, including without limitation the rights to use, copy, modify, merge, publish, distribute, sublicense, and/or sell copies of the Software, and to permit persons to whom the Software is furnished to do so, subject to the following conditions:

The above copyright notice and this permission notice shall be included in all copies or substantial portions of the Software.

THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND, EXPRESS OR IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF MERCHANTABILITY, FITNESS FOR A PARTICULAR PURPOSE AND NONINFRINGEMENT. IN NO EVENT SHALL THE AUTHORS OR COPYRIGHT HOLDERS BE LIABLE FOR ANY CLAIM, DAMAGES OR OTHER LIABILITY, WHETHER IN AN ACTION OF CONTRACT, TORT OR OTHERWISE, ARISING FROM, OUT OF OR IN CONNECTION WITH THE SOFTWARE OR THE USE OR OTHER DEALINGS IN THE SOFTWARE.

About

ChatGPT-like interface for Pakistan Stock Exchange financial filings.
