Misc ai projects: pinecone vectorized documents for LLM question context; random forest classification model training with openai embeddings;

sergiosolorzano/tooling-ai


Project Description:

Miscellaneous AI projects that prototype the use of AI and machine learning to improve predictive results.

You may find useful code or an approach here, but these are prototypes: the README may not explain everything well and the code likely needs refactoring. I will state whether each project works and provide a video of it running.

Git digest for performing LLM ingestion: ./gitingest/classification_digest.txt.txt and ./gitingest/vector_db_digest.txt

Projects included:

1. vectordb/gpt-embeddings: Vectorize documents in Pinecone vector db for LLM queries:

You can also see the app execution video here.

pinecone-embeddings-to-gpt_.mp4

Special thanks to @Roulin for the clear instructions in the blog; if that link fails, use this link here.
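As a rough illustration of the retrieval idea behind this project, the sketch below ranks document vectors by cosine similarity to a question vector and assembles the best passages into an LLM prompt. It is an in-memory stand-in written with NumPy, not the Pinecone client and not the code in this repo; the function names `top_k_by_cosine` and `build_prompt`, and the prompt wording, are hypothetical.

```python
import numpy as np


def top_k_by_cosine(query_vector, document_vectors, k=3):
    """Return (index, score) pairs for the k documents most similar to the query.

    Assumption: every vector is non-zero, so the norms below are never 0.
    A vector database such as Pinecone performs this kind of search at scale;
    this function only mimics the idea in memory.
    """
    docs = np.asarray(document_vectors, dtype=float)
    query = np.asarray(query_vector, dtype=float)
    # Cosine similarity = dot product divided by the product of the norms.
    scores = docs @ query / (np.linalg.norm(docs, axis=1) * np.linalg.norm(query))
    order = np.argsort(-scores)[:k]  # highest score first
    return [(int(i), float(scores[i])) for i in order]


def build_prompt(question, passages):
    """Join the retrieved passages into a context block for the LLM (hypothetical wording)."""
    context = "\n\n".join(passages)
    return (
        "Answer the question using only the context below.\n\n"
        f"Context:\n{context}\n\nQuestion: {question}"
    )


if __name__ == "__main__":
    # Toy 2-dimensional vectors; real embeddings have many more dimensions.
    passages = ["Passage A", "Passage B", "Passage C"]
    vectors = [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]]
    hits = top_k_by_cosine([1.0, 0.0], vectors, k=2)
    print(build_prompt("What does passage A say?", [passages[i] for i, _ in hits]))
```

In a real setup the vectors would come from an embedding model and the similarity search would be delegated to the vector database; only the prompt assembly would stay on the client side.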

2. classification/ada_and_randomforest: Mail Spam Classification using OpenAI embeddings and a Random Forest Classification model

  • Vectorize mail dataset with OpenAI's text-embedding-ada-002

  • Train a random forest classification model using these embedding vectors as features and the mail type (spam or ham) as labels

  • Test the model and report stats
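The steps above can be sketched with scikit-learn as follows. This is a minimal sketch under stated assumptions, not the contents of `classify_ada_rndforest.py`: the embeddings are random placeholders so the example runs without an OpenAI API key, the helper names are hypothetical, and the 80/20 split, `n_estimators=100`, and the choice of 1 for spam are illustrative.

```python
import numpy as np
from sklearn.ensemble import RandomForestClassifier
from sklearn.metrics import classification_report
from sklearn.model_selection import train_test_split

EMBEDDING_DIM = 1536  # vector size returned by text-embedding-ada-002


def make_placeholder_embeddings(n_mails, seed=0):
    """Hypothetical stand-in for real embeddings: random vectors with a class shift.

    In the real pipeline each row would be the embedding of one mail.
    Labels: 1 = spam, 0 = ham (an assumption made for this sketch).
    """
    rng = np.random.default_rng(seed)
    labels = rng.integers(0, 2, size=n_mails)
    # Shift the spam vectors slightly so the two classes are separable.
    features = rng.normal(size=(n_mails, EMBEDDING_DIM)) + 0.5 * labels[:, None]
    return features, labels


def train_and_report(features, labels, seed=0):
    """Train a random forest on the embedding vectors and report test statistics."""
    x_train, x_test, y_train, y_test = train_test_split(
        features, labels, test_size=0.2, random_state=seed, stratify=labels
    )
    model = RandomForestClassifier(n_estimators=100, random_state=seed)
    model.fit(x_train, y_train)
    predictions = model.predict(x_test)
    return model, y_test, classification_report(y_test, predictions)


if __name__ == "__main__":
    features, labels = make_placeholder_embeddings(n_mails=50)
    _, _, report = train_and_report(features, labels)
    print(report)
```

With 50 mails and a 20% test split, the report covers 10 test mails. The scores printed here come from placeholder data and say nothing about performance on real mail.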

    (oai310env) sergio@Home-Win11:~/my-repos/tooling-ai/classification/ada_and_randomforest$ ./classify_ada_rndforest.py

Start to train the model.
Time elapsed to train the model for 50 mails: 0 minutes, 0 seconds, 48 milliseconds

              precision    recall  f1-score   support

           0       0.75      1.00      0.86         3
           1       1.00      0.86      0.92         7

    accuracy                           0.90        10
   macro avg       0.88      0.93      0.89        10
weighted avg       0.93      0.90      0.90        10

Special thanks to Kaggle for the dataset and to the GeeksforGeeks community for the clear instructions.


If you find this helpful you can buy me a coffee :)
