BhagyeshVaze/README.md

Hi, I'm Bhagyesh Vaze 👋

Bioinformatics Analyst | Data Scientist

MS Bioinformatics Graduate (Data Science Concentration) from Northeastern University (GPA 3.9/4.0), with experience in clinical data pipelines, genomic analysis, and machine learning.


What I Work On

  • Clinical & Genomic Data Analysis — GWAS, case-control studies, variant calling, and epidemiological modeling
  • Machine Learning for Healthcare — Predictive models for clinical outcomes using ensemble methods (RF, SVM, XGBoost)
  • ETL & Data Pipelines — End-to-end pipelines using Python, R, BigQuery, and PySpark on large-scale EHR datasets
  • Biostatistics — Survival analysis, ANOVA, hypothesis testing, logistic regression, and study design
  • Data Visualization — Dashboards and reports using Tableau, Looker, Power BI, and ggplot2

Tech Stack


Languages & Libraries: Python, R, SQL, SAS, STATA, NumPy, Pandas, Scikit-learn, PyTorch, PySpark
Tools: BigQuery, Tableau, Looker, Power BI, REDCap, Docker, Hadoop, Git
Clinical & Research: CDISC, SDTM, ADaM, ICD-10, EHR, ETL, IRB, ICH-GCP


Let's Connect

LinkedIn Email


A Bit More

  • Pharmacy background → Bioinformatics grad — I understand the biology behind the data, not just the numbers
  • Enjoy working on problems where statistics and genomics collide — GWAS, survival analysis, clinical trial design
  • Google Advanced Data Analytics Professional Certificate (2025)

Popular repositories

  1. Chronic-kidney-disease-prediction-model Public

    Comparative analysis of machine learning models for CKD prediction using clinical biomarkers. Logistic Regression, Random Forest, SVM, and Ensemble methods achieve up to 94% accuracy on 200 patient…

    HTML

  2. Hospital_ICU_Analysis Public

    SQL and R-based analysis of ICU/SICU bed capacity across U.S. hospitals to identify optimal pilot sites for a nursing program.

    HTML

  3. EOPC-Genetic-Association-Analysis Public

    GWAS pipeline in R identifying germline variants on Chromosome 18 associated with Early-Onset Pancreatic Cancer using logistic regression.

  4. ScRNA Public

    Single-cell RNA-seq analysis of healthy human lung tissue using Seurat: QC, clustering, and cell type annotation of 18 lung cell populations.

  5. DNA-sequence-analyzer Public

    Python tools for parsing FASTA files, splitting protein secondary structure sequences, and computing nucleotide statistics including GC content.

    Python

  6. genebridge Public

    Python tools for querying, intersecting, and categorizing Chromosome 21 gene data across datasets.

    Python
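
For readers curious what the FASTA parsing and GC-content step mentioned under DNA-sequence-analyzer might look like, here is a minimal sketch. It is not taken from the repository: the function names `parse_fasta` and `gc_content` are hypothetical, and it assumes plain nucleotide FASTA input where GC content is computed over unambiguous bases (A, C, G, T) only.

```python
from collections import Counter
from typing import Iterable, Iterator, Tuple


def parse_fasta(lines: Iterable[str]) -> Iterator[Tuple[str, str]]:
    """Yield (header, sequence) pairs from FASTA-formatted lines.

    Hypothetical helper for illustration; not the repository's code.
    """
    header = None
    chunks = []
    for raw in lines:
        line = raw.strip()
        if not line:
            continue  # skip blank lines
        if line.startswith(">"):
            # A new record starts; emit the previous one if there was one.
            if header is not None:
                yield header, "".join(chunks)
            header = line[1:]
            chunks = []
        elif header is not None:
            # Sequence lines may be wrapped, so collect and join them.
            chunks.append(line.upper())
    if header is not None:
        yield header, "".join(chunks)


def gc_content(seq: str) -> float:
    """Return the fraction of G and C among unambiguous bases (A, C, G, T).

    Assumption: ambiguous symbols such as N are excluded from the denominator.
    """
    counts = Counter(seq.upper())
    total = sum(counts[base] for base in "ACGT")
    if total == 0:
        return 0.0
    return (counts["G"] + counts["C"]) / total


if __name__ == "__main__":
    demo = [">seq1 demo", "ATGC", "GGCC", ">seq2", "AAAA"]
    for name, seq in parse_fasta(demo):
        print(name, len(seq), gc_content(seq))
```

Excluding ambiguous bases from the denominator is one common convention; a tool that counts every symbol would report lower values for sequences with many N calls.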