Master's in Applied Data Science | San Jose State University
San Jose, California
I'm a passionate AI Engineer and Data Scientist currently working on real-world applications of Large Language Models (LLMs), Generative AI, and multi-agent systems. I specialize in designing cloud-based ML pipelines, building autonomous AI agents, and experimenting with Chain-of-Thought (CoT) and Chain-of-Draft (CoD) prompting techniques to improve reasoning in LLMs.
My work bridges the gap between cutting-edge research and scalable systems, whether it's building decision-making bots, fine-tuning transformer models, or enabling intelligent data pipelines on AWS and GCP.
- LLMs & Agentic AI: GPT-4, Claude, OpenAI API, LangChain, CrewAI, Retrieval-Augmented Generation (RAG)
- NLP & Transformers: Hugging Face Transformers, BERTScore, Generative AI, NLP, Explainability & Interpretability
- Core ML: Scikit-learn, TensorFlow, PyTorch, Predictive Modeling, Anomaly Detection, Reinforcement Learning, Time-Series Forecasting
- Data Science: Feature Engineering, A/B Testing, Model Evaluation, Statistical Analysis, KPI Reporting
- ETL & Pipelines: Apache Airflow, AWS Glue, PySpark
- Databases: MySQL, MongoDB, Google BigQuery, AWS RDS, Redshift
- Query Languages: SQL, NoSQL
- Cloud Platforms:
  - AWS: S3, Lambda, Glue, RDS, Redshift, Step Functions
  - GCP: Vertex AI Workflows, Cloud Functions
- MLOps & DevOps: Docker, Git, GitHub Actions, CI/CD workflows
- Languages: Python, SQL, Bash, PowerShell
- Frameworks: Flask, FastAPI
- Tableau, Power BI, Microsoft Excel, Google Sheets, Google Apps Script
- Gradio, Hugging Face Model Hub & Spaces (ZeroGPU), REST APIs
Description:
Integrated the Unitree Go2 robot with the OpenAI API during a summer research project. The system supports voice command processing and AI-driven task execution, extending the robot's capabilities with advanced AI models.
Technologies Used:
Python, OpenAI API, Robotics Integration
GitHub Repository:
Go2Bot-OpenAI-Integration
Description:
Developed a robust AWS-enabled data pipeline designed for real-time weather data analysis. The system automates data ingestion, processing, storage, and analysis, providing actionable insights from NOAA datasets.
Technologies Used:
AWS (S3, Lambda, Glue, EC2), Python, Apache Airflow, Pandas, Matplotlib
GitHub Repository:
AWS-Enabled-Data-Pipeline-for-Weather-Data-Analysis
Description:
Developed a machine learning model to detect paraphrased sentences, helping NLP applications judge text similarity more accurately.
Technologies Used:
Python, NLP, Scikit-Learn
GitHub Repository:
Paraphrase-Detection-with-Quora-Question-Pairs
- LinkedIn: https://www.linkedin.com/in/saumyavarshney/
- Email: saumya2603@gmail.com