stufield/README.md

Welcome To My Homepage!

Bridging data, science & strategy 🚀 Machine Learning 🛠️ Tool Development 📦 R Software 🧭 Leadership 🧬 Life Sciences Domain Expert ✨ Director of Data Science @ Cercle.ai


🔬 Domain expertise: Proteomics, biomarker discovery, diagnostics, life sciences, predictive modeling

📊 Technical tools: R, machine learning, statistics, Python, experimental design, reproducible research

💪 Strengths: Translating complexity, cross-functional collaboration, storytelling with data


"Making predictions is easy ... making accurate ones is much more difficult." ⎯ Me⎯


 

About Me

I love to solve problems.

Often the problem can be understanding a complex biological process, but it can also be as simple as fixing something that's broken (e.g. a door that jams, a bicycle, or even machine learning software). In particular, I like to apply my data science skills to better understand, or even solve, the problems we face.

Over the past 14+ years I have combined statistical knowledge with Open-Source Software tools to solve complex problems in the high-dimensional proteomics space of the Life Sciences. In doing so, I have created a comprehensive R-based machine learning analysis ecosystem that standardizes and enables biomarker discovery and predictive model development.

Sometimes the problem is inconsistency across teams or analysts ... thus I promote adherence to "tidy" data principles and am a strong proponent of reproducible research and the use of bioinformatics pipelines.

Other times the problem can be sharing results across the organization ... thus I develop Application Programming Interface (API) infrastructure that enables anyone to access model results with ease.

With my teaching background, I find it important to mentor junior team members while simultaneously leading more senior members. This collaborative spirit is essential to building an effective team that delivers to stakeholders, fosters a sense of accomplishment, and drives revenue generation.

I am always open to discussing possible roles 🔭 and whether my skill set can solve problems in your space!


Skills

| Machine Learning 🚀 | Statistics 📊 | Open-Source 💻 | Software Tools 🔧 |
|---|---|---|---|
| Random Forest | Logistic regression | R | Linux 🐧, macOS 🍎 |
| Naive Bayes | Linear regression | C++ | Git, GitHub :octocat: |
| Lasso/ridge regression | GLMMs | Python 🐍 | AWS |
| k-Nearest neighbour | Mixed-effects models | LaTeX | BASH, GNU |
| PCA | Survival analysis | CI/CD | BitBucket |
| Ensemble methods | Multivariate statistics | Docker 🐋 | Slack |
| Maximum Likelihood | ANOVA | Kubernetes | |

Application of Skill Set

  • Data Analysis: created a high-dimensional, high-throughput, multiplex proteomics machine learning analysis ecosystem that enabled (and standardized) biomarker discovery and model development across analysts.
  • Project Leadership: led a highly successful Open-Source Software (OSS) initiative enabling customers not only to understand highly complex analysis concepts in the proteomics space, but also to conduct those analyses themselves.
  • Analysis Reports: generated standardized analysis templates enabling reproducible research and results across the organization.
  • Leadership: successfully led a team of 3-5 direct reports through analyses, code review, self-enablement, and career development.
  • Written Accomplishment: proven ability to summarize complex analyses via a strong publication record.

Tech Notes & Vignettes 📚

  • False Discovery
  • Mixture Models
  • Logistic Regression
  • Naive Bayes
  • The Birthday Paradox
  • Mack-Wolfe Tests
  • Mixed Effects
  • Monty Hall Paradox
  • Decision Boundaries
  • Class Imbalance

Baseball

  • Pitch Classifier

Other Interests

  • 💬 Favorite food: 🐟 🌮
  • 📚 I am currently learning woodworking 🪵 ... I'm mostly good at making a lot of sawdust!
  • 💬 Ask me about: bikes and R ... I'll talk your 👂 off!
  • 🚴 I'm an avid cyclist: come say hi on

More Details

  • I maintain several R software libraries (📦) that implement statistical and machine learning techniques in biomarker discovery. Some of my popular published 📦 are:
  • These projects support analyses in the general Life Sciences (BioTech) space to generate proteomics-based insights in health areas such as:
    • cardiovascular disease
    • liver disease (NASH/NAFLD)
    • alcohol effects
    • biological aging
    • exercise status
    • metabolic disease
  • Favorite techniques:
    • random forest
    • logistic regression (ol' faithful)
    • naive Bayes
    • KKNN (nearest neighbor)
    • survival analyses
    • ensemble methods
  • I am a proponent of open-source software, conducting the majority of my research/analysis via Linux toolkits, R, and the RStudio/Posit IDE.
  • I promote adherence to so-called "tidy" data principles, a data science philosophy built on a shared underlying data structure, grammar, and format, which facilitates the generation of reproducible analyses.

Pinned

  1. gitr (Public)

    A lightweight, dependency-free API to access system-level git commands from within R.

    R 2

  2. helpr (Public)

    The helpr package contains numerous helpers, wrappers, and utilities used throughout my analysis suite. It intentionally favors base R over the higher-level *tidyverse* to minimize imports.

    R 2

  3. wranglr (Public)

    The wranglr package contains general functions necessary to manipulate and wrangle internal R representations of proteomic data into convenient forms for analysis.

    R

  4. featureselectr (Public)

    An object-oriented package containing functionality designed for feature selection, model building, and/or classifier development.

    R

  5. power (Public)

    Informal suite of functions to perform simple power calculations via p-value simulation.

    R

  6. stabilityselectr (Public)

    The stabilityselectr package performs stability selection with a variety of kernels provided by the 'glmnet' package, and provides simple tools for plotting and extracting selected features. There …

    R