
gqgs/llm100kbench


LLM Investment Benchmark

A tool for benchmarking and tracking Large Language Model (LLM) investment decisions.

Overview

This project provides a framework to create, manage, and track investment portfolios generated by LLMs. It allows you to:

  • Create new portfolios
  • List current holdings and recent context
  • Update portfolios based on model decisions

The model executions and their current context can be seen here.

Automated Weekly Runs

The active model roster is configured in models.json. The weekly GitHub Actions workflow runs the benchmark every Monday, writes model orders under orders/<model>/<date>.json, stores the market snapshot in prices/<date>.csv, updates llm100kbench.db, and regenerates this README's current portfolio section.

The project is intentionally limited to free API tiers. Models that require paid metered API access or subscriptions are archived instead of being run automatically.
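One way to apply the free-tier-only rule when reading the roster is sketched below. The real schema of models.json is not documented here, so the `name`, `free_tier`, and `archived` fields and the `activeModels` helper are hypothetical.

```go
package main

import (
	"encoding/json"
	"fmt"
)

// modelEntry is a hypothetical shape for one entry in models.json;
// the real file may use different field names.
type modelEntry struct {
	Name     string `json:"name"`
	FreeTier bool   `json:"free_tier"`
	Archived bool   `json:"archived"`
}

// activeModels returns the models that should run automatically:
// those reachable on a free API tier and not archived.
func activeModels(raw []byte) ([]string, error) {
	var entries []modelEntry
	if err := json.Unmarshal(raw, &entries); err != nil {
		return nil, err
	}
	var names []string
	for _, e := range entries {
		if e.FreeTier && !e.Archived {
			names = append(names, e.Name)
		}
	}
	return names, nil
}

func main() {
	raw := []byte(`[
		{"name": "deepseek", "free_tier": true, "archived": false},
		{"name": "perplexity", "free_tier": false, "archived": true}
	]`)
	names, err := activeModels(raw)
	if err != nil {
		panic(err)
	}
	fmt.Println(names) // [deepseek]
}
```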

Each weekly run also writes a concise decision log under logs/<model>/<date>.md, including the model used, validation status, per-trade rationale, context, and validation notes when a model response is rejected.

Why?

Optimizing the portfolio is the primary objective defined for the LLMs. Doing so requires evaluating risk-reward trade-offs, forming sound assumptions about future market conditions, and drawing on tools as well as an understanding of human psychology and financial market dynamics.

This benchmark may therefore be a reasonable proxy for how well LLMs coordinate these efforts.

Notes

  • chatgpt, deepseek, and grok are kept as continuing benchmark identities. Their exact backend model IDs are recorded in each new order's metadata.
  • perplexity is archived for future runs because its API is paid and the free chat UI is not suitable for unattended automation.
  • Claude and other paid-only APIs are not included while the project keeps the free-tier-only restriction.
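Because chatgpt, deepseek, and grok keep stable benchmark names while their backends change, each order has to carry both identifiers. The sketch below shows one possible order layout. Only the idea that the metadata records the backend model ID comes from this README. The `orderFile` type, its JSON field names, and the `parseOrder` helper are hypothetical.

```go
package main

import (
	"encoding/json"
	"fmt"
)

// orderFile is a hypothetical layout for orders/<model>/<date>.json.
type orderFile struct {
	Metadata struct {
		Benchmark      string `json:"benchmark"`        // stable identity, e.g. "chatgpt"
		BackendModelID string `json:"backend_model_id"` // exact provider model ID used for this run
	} `json:"metadata"`
	Orders []struct {
		Action   string `json:"action"`
		Ticker   string `json:"ticker"`
		Quantity int    `json:"quantity"`
	} `json:"orders"`
}

// parseOrder decodes one order file.
func parseOrder(raw []byte) (orderFile, error) {
	var o orderFile
	err := json.Unmarshal(raw, &o)
	return o, err
}

func main() {
	// Illustrative payload; the backend ID is a placeholder.
	raw := []byte(`{
		"metadata": {"benchmark": "chatgpt", "backend_model_id": "example-backend-id"},
		"orders": [{"action": "buy", "ticker": "AAPL", "quantity": 1}]
	}`)
	o, err := parseOrder(raw)
	if err != nil {
		panic(err)
	}
	fmt.Println(o.Metadata.Benchmark, o.Metadata.BackendModelID, len(o.Orders))
}
```

Keeping the benchmark identity separate from the backend ID lets a model's history stay continuous while each order still records exactly which model produced it.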

Project Structure

  • cmd: Contains the main command implementations
    • create: Initialize new portfolios
    • list: Display current holdings and context
    • update: Process investment orders and update holdings
    • stocks: Fetch most recent stock prices
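A minimal dispatcher for these four subcommands might look like the sketch below. The binary name `llm100kbench` in the usage string is an assumption, and the handlers only print what each command is described as doing. The real cmd package may be structured differently, for example by using a CLI framework.

```go
package main

import (
	"errors"
	"fmt"
	"os"
)

// run dispatches a subcommand to a handler. The handler bodies are
// placeholders that describe the real commands.
func run(args []string) error {
	if len(args) < 1 {
		return errors.New("usage: llm100kbench <create|list|update|stocks>")
	}
	switch args[0] {
	case "create":
		fmt.Println("initialize new portfolios")
	case "list":
		fmt.Println("display current holdings and context")
	case "update":
		fmt.Println("process investment orders and update holdings")
	case "stocks":
		fmt.Println("fetch most recent stock prices")
	default:
		return fmt.Errorf("unknown command %q", args[0])
	}
	return nil
}

func main() {
	if err := run(os.Args[1:]); err != nil {
		fmt.Fprintln(os.Stderr, err)
		os.Exit(1)
	}
}
```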

Prompt

The most recent prompt, including its guidelines, can be seen here and here.

Current Portfolio (2026-06-15)

Portfolio Value by Model

pie showData
    "deepseek" : 214020
    "chatgpt" : 123732
    "mistral" : 100000
    "qwen" : 100000
    "gpt-oss" : 95800
    "gemini" : 81862
    "llama" : 47102
Model     Ticker  Value (USD)  Quantity
chatgpt   USD              69        69
chatgpt   AAPL         123663       418
deepseek  AMD            1655         3
deepseek  ASML         186835       100
deepseek  MSFT           2789         7
deepseek  SNPS          22742        50
gemini    AAPL          12425        42
gemini    AMD           10479        19
gemini    ASML           7473         4
gemini    CRDO           4993        20
gemini    GFS            5002        60
gemini    NVDA           6280        30
gemini    NXPI           9521        30
gemini    ON             5181        42
gemini    QCOM          10216        46
gemini    TSLA          10292        25
mistral   USD          100000    100000
llama     USD           47102     47102
gpt-oss   AAPL          95558       323
gpt-oss   CMCSA           242        10
qwen      USD          100000    100000
Model     Total (USD)  Change
deepseek       214020
chatgpt        123732
mistral        100000
qwen           100000
gpt-oss         95800
gemini          81862
llama           47102
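Each model's total is the sum of its position values in the holdings table. For example, chatgpt's 69 USD cash plus 123,663 USD of AAPL gives 123,732. The sketch below recomputes totals from holding rows and ranks them. The `holding` and `total` types and the `totalsByModel` helper are hypothetical, and the example uses only rows copied from the table.

```go
package main

import (
	"fmt"
	"sort"
)

// holding is one row of the holdings table.
type holding struct {
	Model  string
	Ticker string // "USD" marks uninvested cash
	Value  int    // position value in whole US dollars
}

// total is one row of the per-model totals table.
type total struct {
	Model string
	Value int
}

// totalsByModel sums position values per model, largest portfolio first.
func totalsByModel(hs []holding) []total {
	sums := map[string]int{}
	for _, h := range hs {
		sums[h.Model] += h.Value
	}
	out := make([]total, 0, len(sums))
	for m, v := range sums {
		out = append(out, total{m, v})
	}
	// Sort by value descending; break ties by name so map order never leaks through.
	sort.Slice(out, func(i, j int) bool {
		if out[i].Value != out[j].Value {
			return out[i].Value > out[j].Value
		}
		return out[i].Model < out[j].Model
	})
	return out
}

func main() {
	hs := []holding{
		{"chatgpt", "USD", 69},
		{"chatgpt", "AAPL", 123663},
		{"gpt-oss", "AAPL", 95558},
		{"gpt-oss", "CMCSA", 242},
		{"mistral", "USD", 100000},
		{"qwen", "USD", 100000},
	}
	for _, t := range totalsByModel(hs) {
		fmt.Printf("%-8s %d\n", t.Model, t.Value)
	}
}
```

mistral and qwen both hold 100,000 USD. Breaking ties by name reproduces the order they appear in within the totals table.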

About

LLM 100k portfolio management benchmark
