A ChatGPT-like application for Pakistan Stock Exchange (PSX) financial data, which can be repurposed for any financial data including querying a data room during acquisition due diligence.
psxGPT allows financial analysts to query and analyze financial data using plain English. For example:
"Get me Deposits per branch for HBL, UBL and MEBL in 2024"
psxGPT will find the necessary information and compile a report using ONLY data found in financial reports.
Use Cases:
- Financial statement analysis and comparison
- Due diligence data room queries
- Regulatory filing research
- Querying Economic Survey of Pakistan or SBP Annual Report
Get psxGPT running in under 5 minutes using GitHub Codespaces:
- Click the green "Code" button on the psxGPT GitHub repository
- Select "Codespaces" tab and click "Create codespace on master"
- Wait for the environment to load (this may take a few minutes)
- Download the pre-built search index: gemini_index_metadata.zip
- Unzip the file on your computer - you should see a folder called gemini_index_metadata
- Upload to Codespaces: Drag and drop the entire gemini_index_metadata folder into your Codespaces file explorer (same level as the Python files)
- Rename the environment file:
mv .env.example .env
- Get your Gemini API key:
  - Go to Google AI Studio
  - Sign in with your Google account and create a new API key
- Edit the .env file:
  - Open .env in the Codespaces editor
  - Add your Gemini API key:
GEMINI_API_KEY=your_api_key_here
  - Delete the entire DATABASE_URL line (not needed for the quick start, though chat history will not be saved without it)
  - Save the file
- Install dependencies and start the MCP server:
pip install uv
uv sync
source .venv/bin/activate
python Step7MCPServerPsxGPT.py
Important: Once the server starts, you'll see a "Ports" tab at the bottom of Codespaces. Click on it and change the server port's visibility from "Private" to "Public" - this is essential for the connection to work.
- Open a new terminal: Click the + button in the terminal area
- Activate virtual environment again:
source .venv/bin/activate
- Start the client:
chainlit run Step8MCPClientGemini.py
- Access the application: Click on the port 8000 URL that appears in the Ports tab
- Login: Use analyst@psx.com / analyst123
- Connect to MCP Server: Follow the instructions in the interface to connect to your running MCP server
🎉 You're ready! Try asking: "Get me Deposits per branch for HBL, UBL and MEBL in 2024"
If you prefer to install psxGPT on your local machine, follow the detailed instructions below.
psxGPT processes financial documents through an 8-step pipeline:
- Download PDFs (Step1DownloadPDFsSearch.py or Step1DownloadPDFsTickers.py)
- Convert to Markdown (Step2ConvertPDFtoMarkdown.py) - Uses LlamaParse, or Tool1Mistral_OCR.py for scanned documents
- Create Chunks (Step3ChunkMarkdown.py) - Splits into searchable segments
- Extract Metadata (Step4MetaDataTags.py) - Identifies companies, dates, report types
- Combine Data (Step5CombineMetaData.py) - Consolidates all metadata
- Build Search Index (Step6CreateEmbeddings.py) - Creates vector embeddings for AI search
- Start Backend Server (Step7MCPServerPsxGPT.py) - Launches the data query server
- Launch Web Interface (Step8MCPClientPsxGPT.py for use with the Anthropic API, or Step8MCPClientGemini.py for use with the Gemini API free tier) - Starts the user-friendly chat interface
Quality Assurance: Use Tool2ValidateProcessing.py to verify data quality after processing.
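The processing part of the pipeline (everything before the server and client) can be chained from a single script. The sketch below is illustrative only: the script names come from the list above, while `build_commands`, `run_pipeline` and the `start_at` parameter are hypothetical helpers, and placing Tool2ValidateProcessing.py before the embedding step follows the command order used later in this guide.

```python
import subprocess
import sys

# Script names are taken from the pipeline list; the helper names are hypothetical.
PIPELINE_SCRIPTS = [
    "Step1DownloadPDFsSearch.py",
    "Step2ConvertPDFtoMarkdown.py",
    "Step3ChunkMarkdown.py",
    "Step4MetaDataTags.py",
    "Step5CombineMetaData.py",
    "Tool2ValidateProcessing.py",
    "Step6CreateEmbeddings.py",
]


def build_commands(python_exe=sys.executable, start_at=0):
    """Return one command per script, skipping the first `start_at` scripts."""
    return [[python_exe, script] for script in PIPELINE_SCRIPTS[start_at:]]


def run_pipeline(commands, runner=subprocess.run):
    """Run commands in order, stop at the first failure, return how many succeeded."""
    completed = 0
    for command in commands:
        result = runner(command)
        if result.returncode != 0:
            print(f"Stopped: {' '.join(command)} exited with {result.returncode}")
            break
        completed += 1
    return completed
```

Calling `run_pipeline(build_commands())` from the project directory would run every step in order; `build_commands(start_at=1)` skips the download step when the PDFs are already in place. Stopping at the first non-zero exit code avoids building embeddings from incomplete data.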
This guide walks you through setting up psxGPT step-by-step. The process takes about 30-45 minutes and involves:
- Installing required software (Git, Python, PostgreSQL, code editor)
- Downloading the project from GitHub
- Getting API keys for AI services
- Configuring the environment and installing dependencies
- Setting up the database and starting the application
Before starting, you'll need:
- A computer with Windows, macOS, or Linux
- Internet connection for downloads
- API keys for AI services (some free, some require credit card)
Choose one of these beginner-friendly code editors:
- Windsurf (Recommended - AI-powered coding assistant)
- Cursor (AI-powered VS Code alternative)
- VS Code (Popular free editor)
Download and install your chosen editor following the installer instructions.
For Windows Users:
- Check Your PC Type First:
  - Press Windows key + R to open the Run dialog
  - Type msinfo32 and press Enter
  - Look for "System Type" in the System Information window:
    - x64-based PC = Intel/AMD processor (most common)
    - ARM64-based PC = ARM processor (newer Surface devices, some laptops)
- Download the correct installer for your system:
  - 64-bit Git for Windows Setup (for x64-based PC)
  - ARM64 Git for Windows Setup (for ARM64-based PC)
- Run the installer and follow the setup wizard
- Installation tip: The installer will ask many questions about editors, line endings, etc. For this project, these choices don't matter - simply keep clicking "Next" with the default options
- When installation is complete, you can access Git through "Git Bash" or Command Prompt
For Mac Users:
- Install Homebrew first (recommended package manager for Mac):
  - Open Terminal (press Cmd + Space, type "Terminal", and press Enter)
  - Go to brew.sh and copy the installation command from the homepage
  - Paste the command into Terminal and press Enter
  - Important notes:
    - First-time Homebrew installation takes time (downloads ~800MB)
    - When prompted for your password, you won't see the characters as you type, but they are being entered - this is normal for security
    - Wait for the installation to complete
- Install Git using Homebrew:
brew install git
- Verify installation:
git --version
Note: While installer packages are available from git-scm.com, they tend to be outdated. Homebrew gives you the latest stable version and makes future updates easier.
IMPORTANT: You must install Python 3.11.9 specifically, and it must be the 64-bit version for maximum stability.
For Windows Users:
- Go to https://www.python.org/downloads/release/python-3119/
- Scroll down to "Files" section
- Download "Windows installer (64-bit)" - make sure it says 64-bit
- Run the installer
- CRITICAL: Check the box "Add Python to PATH" during installation - this is essential for the project to work
- Click "Install Now"
For Mac Users:
- Go to https://www.python.org/downloads/release/python-3119/
- Scroll down to "Files" section
- Download "macOS 64-bit universal2 installer"
- Run the installer and follow the prompts
- Note: Python is automatically added to PATH on macOS
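After installing, it can help to confirm that the interpreter on your PATH is the one this guide expects. The following is a minimal sketch using only the standard library; `interpreter_problems` is a hypothetical helper name, and it checks only the major.minor version (3.11) and the 64-bit requirement rather than the exact 3.11.9 patch release.

```python
import struct
import sys

REQUIRED_VERSION = (3, 11)  # the guide pins 3.11.9; only major.minor is checked here


def interpreter_problems(version_info=sys.version_info,
                         pointer_bits=struct.calcsize("P") * 8):
    """Return a list of human-readable problems; an empty list means all good."""
    problems = []
    if tuple(version_info[:2]) != REQUIRED_VERSION:
        problems.append(
            f"expected Python 3.11.x, found {version_info[0]}.{version_info[1]}"
        )
    # The size of a C pointer tells us whether this is a 32-bit or 64-bit build.
    if pointer_bits != 64:
        problems.append(f"expected a 64-bit build, found {pointer_bits}-bit")
    return problems
```

Running `print(interpreter_problems() or "Interpreter looks fine")` in a terminal reports any mismatch before you install dependencies.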
For Windows Users:
- Visit PostgreSQL 14.18 Downloads
- Download PostgreSQL 14.18 for Windows x86-64
- Run the installer and follow the setup wizard
- Component Selection: To minimize installation size, only keep these components checked:
- ✅ PostgreSQL Server (required)
- ✅ Command Line Tools (required)
- ❌ pgAdmin 4 (uncheck - not needed, saves ~200MB)
- ❌ Stack Builder (uncheck - not needed)
- Critical: During installation, you'll set a password for the default "postgres" user. Write this password down - you'll need it for your .env file
- Username Note: The default PostgreSQL username is "postgres" (not your Windows username)
For Mac Users:
- Visit PostgreSQL 14.18 Downloads
- Download PostgreSQL 14.18 for macOS
- Run the installer and follow the setup wizard
- Component Selection: To minimize installation size, only keep these components checked:
- ✅ PostgreSQL Server (required)
- ✅ Command Line Tools (required)
- ❌ pgAdmin 4 (uncheck - not needed, saves ~200MB)
- ❌ Stack Builder (uncheck - not needed)
- Critical: During installation, you'll set a password for the default "postgres" user. Write this password down - you'll need it for your .env file
- Username Note: The default PostgreSQL username is "postgres" (not your Mac username)
Alternative Installation (Advanced Users):
- Homebrew (Mac):
brew install postgresql@14
- Package Manager (Linux): Check your distribution's package manager for PostgreSQL 14.x
- Open your code editor
- Open the integrated terminal:
  - Press Ctrl + ` (backtick) to open the terminal inside your code editor
- Download the project:
Recommended: Fast Clone (Downloads Latest Code Only)
git clone --depth 1 https://github.com/ishaheen10/psxgpt.git
cd psxgpt
Alternative: Full History Clone (Larger Download)
git clone https://github.com/ishaheen10/psxgpt.git
cd psxgpt
💡 Note: The repository has been optimized for size. The shallow clone (--depth 1) downloads only the latest code (~19MB) instead of full history, making setup much faster for most users.
You'll need to sign up for these services and get API keys:
Required (Free - No Credit Card Needed):
- Gemini API
  - Click "Get API key in Google AI Studio"
  - Sign in with your Google account
  - Create a new API key
  - Copy the key
- LlamaParse API (for PDF to Markdown conversion)
  - Sign up for LlamaCloud
  - Go to API Keys section
  - Create a new API key
  - Copy the key (free tier includes 3,000 pages)
Optional (Credit Card Required):
- Anthropic API (for the Step8MCPClientPsxGPT.py client)
  - Sign up for an Anthropic account
  - Go to your dashboard
  - Create an API key
  - Copy the key
- Mistral API (for scanned document OCR)
  - Sign up for Mistral AI
  - Go to API Keys
  - Create a new key
  - Copy the key
Save these keys safely - you'll paste them into your .env file in the next step.
- Create your environment file:
  - Rename .env.example to .env
  - Open the .env file in your code editor
  - Paste in your API keys from the previous step
  - Update the DATABASE_URL with your PostgreSQL password
- Install dependencies:
pip install uv
uv sync
- Enter virtual environment:
For Windows:
.venv\Scripts\activate
If you get an execution policy error on Windows, run this first:
Set-ExecutionPolicy -ExecutionPolicy RemoteSigned -Scope CurrentUser
Then try the activate command again.
For Mac/Linux:
source .venv/bin/activate
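A quick way to catch a missing or placeholder value in the .env file configured above is a small check script. This is a sketch under stated assumptions: it uses a deliberately simple KEY=value parser rather than a full dotenv implementation, it only looks for GEMINI_API_KEY and DATABASE_URL (the two variable names this guide mentions), and the helper names are hypothetical.

```python
from pathlib import Path

# Only the variable names mentioned in this guide are checked; the names of
# other keys depend on .env.example and are not assumed here.
EXPECTED_KEYS = ("GEMINI_API_KEY", "DATABASE_URL")
PLACEHOLDER = "your_api_key_here"


def parse_env(text):
    """Parse simple KEY=value lines, ignoring blank lines and # comments."""
    values = {}
    for line in text.splitlines():
        line = line.strip()
        if not line or line.startswith("#") or "=" not in line:
            continue
        key, _, value = line.partition("=")
        values[key.strip()] = value.strip().strip('"').strip("'")
    return values


def missing_keys(values, expected=EXPECTED_KEYS):
    """Return expected keys that are absent, empty, or still a placeholder."""
    return [key for key in expected if values.get(key, "") in ("", PLACEHOLDER)]


def check_env_file(path=".env"):
    """Read the file at `path` and report which expected keys still need a value."""
    return missing_keys(parse_env(Path(path).read_text(encoding="utf-8")))
```

Running `print(check_env_file())` from the project directory prints an empty list when both values are set. If you deliberately removed DATABASE_URL (as in the Codespaces quick start), expect it to be reported.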
The specified versions (Python 3.11.9, PostgreSQL 14.18, Chainlit 2.5.5) are recommended for maximum stability and have been thoroughly tested with this project. While newer versions may work, these specific versions ensure the most reliable experience and avoid potential compatibility issues.
First, verify PostgreSQL is accessible:
- Test if PostgreSQL commands work:
createdb --version
If you get "command not found" (Windows users):
  - PostgreSQL wasn't added to your system PATH during installation
  - You need to add it manually (see instructions below)
Note: Database commands use -U postgres to specify the PostgreSQL user (not your computer username)
- Add PostgreSQL to PATH on Windows:
  - Press Windows key + R, type sysdm.cpl, press Enter
  - In the System Properties window, click the "Advanced" tab
  - Click the "Environment Variables..." button at the bottom
  - In the Environment Variables window, look at the top section labeled "User variables for [your username]"
  - Find "Path" in the list and click "Edit..." (if Path doesn't exist, click "New..." instead)
  - Click "New" and add: C:\Program Files\PostgreSQL\14\bin
  - Click "OK" to close the Edit window
  - Click "OK" to close Environment Variables window
  - Click "OK" to close System Properties window
  - Restart your code editor completely and open a new terminal
  - Test again with createdb --version
Why User variables? We use User variables (not System variables) because:
  - No administrator privileges required
  - Only affects your user account (safer on shared computers)
  - Easier to troubleshoot if something goes wrong
Note: If PostgreSQL was installed in a different location, the path might be:
  - C:\Program Files (x86)\PostgreSQL\14\bin (32-bit installation)
  - Or check your installation directory and add the \bin folder to PATH
Now create the database:
- Create the database:
createdb -U postgres analyst_psx
You'll be prompted for the postgres password you set during PostgreSQL installation.
- Set up the database schema:
psql -U postgres -d analyst_psx -f chainlit_schema_psx.sql
You'll be prompted for the postgres password again.
- Verify your configuration:
  - Double-check that your DATABASE_URL in the .env file matches your PostgreSQL setup
  - Format: postgresql://postgres:your_password@localhost:5432/analyst_psx
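If your postgres password contains characters such as @, / or :, pasting it directly into the URL can make the URL ambiguous, because most PostgreSQL drivers parse it using URI rules. The sketch below shows one way to build and inspect the URL with the standard library. The helper names are hypothetical, and whether percent-encoding is needed in your setup depends on the database driver in use.

```python
from urllib.parse import quote, unquote, urlsplit


def build_database_url(password, user="postgres", host="localhost",
                       port=5432, database="analyst_psx"):
    """Assemble the URL, percent-encoding user and password so @ or / are safe."""
    return (
        f"postgresql://{quote(user, safe='')}:{quote(password, safe='')}"
        f"@{host}:{port}/{database}"
    )


def describe_database_url(url):
    """Split a DATABASE_URL into its parts without printing the password itself."""
    parts = urlsplit(url)
    return {
        "scheme": parts.scheme,
        "user": unquote(parts.username or ""),
        "password_set": bool(parts.password),
        "host": parts.hostname,
        "port": parts.port,
        "database": parts.path.lstrip("/"),
    }
```

For example, `build_database_url("p@ss")` yields `postgresql://postgres:p%40ss@localhost:5432/analyst_psx`, and `describe_database_url(...)` on the value from your .env file lets you confirm the host, port and database name at a glance.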
Option A: Use Pre-Built Data (Fastest)
- Download pre-built search index from Google Drive: Download gemini_index_metadata.zip
Alternatively, if the Google Drive download is slow, download a smaller index with 3 tickers: Download gemini_index_metadata_small.zip
- Unzip and place in project directory:
  - Unzip the file - you should see a folder called gemini_index_metadata
  - Copy this entire folder to your psxgpt project directory (same level as Python files)
- Start the MCP server:
python Step7MCPServerPsxGPT.py
  - Keep this terminal window open - the server needs to stay running
- Start the client (in a new terminal):
  - Open a second terminal in your code editor (Ctrl + Shift + ` to open a new terminal)
  - Activate your virtual environment again:
    - Windows: .venv\Scripts\activate (if you get an execution policy error, run Set-ExecutionPolicy -ExecutionPolicy RemoteSigned -Scope CurrentUser first)
    - Mac/Linux: source .venv/bin/activate
  - Run the client:
chainlit run Step8MCPClientPsxGPT.py
Alternative (for Gemini API free tier):
chainlit run Step8MCPClientGemini.py
- Access and configure the application:
  - Open your browser to http://localhost:8000
  - Login with: analyst@psx.com / analyst123
  - Connect to MCP Server: After login, you'll see instructions in the client interface to connect to the MCP server - follow those prompts to complete the setup
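If the server cannot find the index, the usual suspect is the folder placement described in the unzip step. A small pre-flight check like the one below can confirm the layout before starting the server. It is only a sketch: `missing_items` is a hypothetical helper, and it checks nothing about the contents of the index beyond the folder being non-empty.

```python
from pathlib import Path

# Names taken from this guide: the index folder must sit next to the Python files.
REQUIRED_ITEMS = ("gemini_index_metadata", "Step7MCPServerPsxGPT.py")


def missing_items(project_dir=".", required=REQUIRED_ITEMS):
    """Return the required files or folders that are absent from project_dir."""
    root = Path(project_dir)
    missing = [name for name in required if not (root / name).exists()]
    # An index folder that exists but is empty may mean the unzip step was
    # interrupted or the files ended up in a nested folder.
    index_dir = root / "gemini_index_metadata"
    if index_dir.is_dir() and not any(index_dir.iterdir()):
        missing.append("gemini_index_metadata (folder is empty)")
    return missing
```

Running `print(missing_items())` from the project directory should print an empty list when the index folder and server script are where this guide expects them.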
Option B: Process Your Own Documents
If you want to analyze your own financial documents:
- Install Playwright (for downloading PDFs from websites):
playwright install
- Run the processing pipeline:
# Option 1: Download from PSX website
python Step1DownloadPDFsSearch.py
# Option 2: Place your own PDFs in psx_bank_reports/ folder, then run:
python Step2ConvertPDFtoMarkdown.py
python Step3ChunkMarkdown.py
python Step4MetaDataTags.py
python Step5CombineMetaData.py
python Tool2ValidateProcessing.py  # Verify data quality
python Step6CreateEmbeddings.py
# Start server (keep this terminal open)
python Step7MCPServerPsxGPT.py
Then in a new terminal, start the client:
# Activate virtual environment first
source .venv/bin/activate  # Mac/Linux
.venv\Scripts\activate  # Windows (if execution policy error, run Set-ExecutionPolicy first)
# Run client
chainlit run Step8MCPClientPsxGPT.py
Alternative client (for Gemini API free tier):
chainlit run Step8MCPClientGemini.py
Note: Step1 scripts are designed for PSX website. For other data sources, modify the download scripts or use a browser automation tool like browser-use.
To adapt psxGPT for different financial datasets:
- Replace Data Source: Modify Step1DownloadPDFsSearch.py to point to your data source
- Update File Paths: Change directory paths in scripts to match your folder structure
- Adjust Metadata Extraction: Modify Step4MetaDataTags.py for your document types
- Configure OCR: For scanned documents, ensure Mistral OCR is configured in Step2ConvertPDFtoMarkdown.py
Common Issues:
- Scanned PDFs: Use the Mistral OCR option in Step 2 for better text extraction
- Large Files: The LlamaParse free tier has a 3,000-page limit
- Database Connection: Verify PostgreSQL is running and credentials in .env are correct
- API Limits: Check API key quotas if processing fails
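For the database connection issue, a first diagnostic is to see whether anything is listening on the PostgreSQL port at all. The sketch below uses a plain TCP connection attempt; `port_is_open` is a hypothetical helper, and the defaults (localhost, port 5432) are taken from the DATABASE_URL format in this guide. A successful connection only shows that a service is listening, not that your credentials are correct.

```python
import socket


def port_is_open(host="localhost", port=5432, timeout=2.0):
    """Return True if a TCP connection to host:port succeeds within the timeout."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        # Covers connection refused, timeouts, and name resolution failures.
        return False
```

`port_is_open()` returning False suggests PostgreSQL is not running or is listening on a different port; the same helper with `port=8000` can be used to check whether the Chainlit client is up.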
Performance:
- Processing time depends on document count and size
- Vector embedding creation (Step 6) is the most time-intensive step
- Consider processing documents in batches for large datasets
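One simple way to process documents in batches is to split the list of PDFs into fixed-size groups and handle one group at a time. The helper below is a generic sketch, not part of psxGPT: `batched` is written by hand because `itertools.batched` is only available from Python 3.12, while this guide targets 3.11. How each batch is then fed to the pipeline scripts depends on how those scripts select their input, which is not assumed here.

```python
def batched(items, size):
    """Yield consecutive lists of at most `size` items from any iterable."""
    if size < 1:
        raise ValueError("size must be at least 1")
    batch = []
    for item in items:
        batch.append(item)
        if len(batch) == size:
            yield batch
            batch = []
    # Emit the final, possibly shorter, batch.
    if batch:
        yield batch
```

For example, `batched(sorted(Path("psx_bank_reports").glob("*.pdf")), 20)` (with `from pathlib import Path`) would yield groups of up to 20 PDFs from the input folder used in this guide.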
Default Login Credentials:
- Username: analyst@psx.com
- Password: analyst123
Note: These are demo credentials configured in your .env file. For production use, implement proper authentication.
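As one building block for proper authentication, passwords should be stored as salted hashes rather than in plain text. The sketch below uses PBKDF2 from the Python standard library. It is a generic illustration, not a description of how psxGPT or Chainlit validates logins: the function names and the iteration count are assumptions, and wiring this into the application is left to your own authentication setup.

```python
import hashlib
import hmac
import os

ITERATIONS = 200_000  # illustrative work factor; tune for your hardware


def hash_password(password, salt=None):
    """Return (salt, digest) for a password using PBKDF2-HMAC-SHA256."""
    if salt is None:
        salt = os.urandom(16)  # fresh random salt for each stored password
    digest = hashlib.pbkdf2_hmac("sha256", password.encode("utf-8"), salt, ITERATIONS)
    return salt, digest


def verify_password(password, salt, expected_digest):
    """Recompute the digest and compare it in constant time."""
    _, digest = hash_password(password, salt)
    return hmac.compare_digest(digest, expected_digest)
```

Store the salt and digest together for each user; at login, `verify_password` returns True only when the submitted password reproduces the stored digest.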
MIT License
Copyright (c) 2024 psxGPT Contributors
Permission is hereby granted, free of charge, to any person obtaining a copy of this software and associated documentation files (the "Software"), to deal in the Software without restriction, including without limitation the rights to use, copy, modify, merge, publish, distribute, sublicense, and/or sell copies of the Software, and to permit persons to whom the Software is furnished to do so, subject to the following conditions:
The above copyright notice and this permission notice shall be included in all copies or substantial portions of the Software.
THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND, EXPRESS OR IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF MERCHANTABILITY, FITNESS FOR A PARTICULAR PURPOSE AND NONINFRINGEMENT. IN NO EVENT SHALL THE AUTHORS OR COPYRIGHT HOLDERS BE LIABLE FOR ANY CLAIM, DAMAGES OR OTHER LIABILITY, WHETHER IN AN ACTION OF CONTRACT, TORT OR OTHERWISE, ARISING FROM, OUT OF OR IN CONNECTION WITH THE SOFTWARE OR THE USE OR OTHER DEALINGS IN THE SOFTWARE.