
Sohail DataScientist

Sohail Syed is a Data Scientist with expertise in machine learning, statistical modeling, and data analysis, proficient in various tools and technologies including Python, R, and cloud services like AWS. He has experience in the entire data science project lifecycle, from data extraction to model implementation, and has worked on projects in healthcare and finance to drive data-driven decision-making. Sohail holds a Bachelor's degree in Information Technology and a Master's degree in Computer Science, with a strong background in predictive modeling and data visualization.


SOHAIL SYED

Data Scientist
Cell phone: +1 732-913-8802
Email: Sohailsyed.work@gmail.com

Data Science professional who interprets and extracts intelligence from data and solves complex business problems
using machine learning and statistical models. Proficient in furnishing the executive leadership team with insights,
analytics, reports and recommendations enabling effective strategic planning across all business units, distribution
channels and product lines.

Summary:
• Experienced in facilitating the entire life-cycle of a data science project: Data Extraction, Data Pre-Processing,
Feature Engineering, Dimensionality Reduction, Algorithm implementation, Back Testing and Validation.
• Expert at working with statistical tests: two-sample independent & paired t-tests, one-way & two-way ANOVA,
non-parametric tests (Chi-square, Mann-Whitney U, Wilcoxon rank and Kruskal-Wallis tests) and the Shapiro-Wilk
normality test.
• Proficient in Data transformations using log, square-root, reciprocal, differencing and Box-Cox
transformations, chosen according to the dataset.
• Adept at Analysis of Missing data by exploring correlations and similarities, introducing dummy variables for
missing value treatments and choosing from imputation methods such as MICE in R and IterativeImputer in
Python.
• Experienced in Machine Learning techniques, including regression and classification models such as Linear and
Polynomial Regression, Decision Trees, Logistic Regression, and Support Vector Machines.
• Experienced in Ensemble learning using Bagging, Boosting, Random Forests, AdaBoost, XGBoost; clustering
methods such as K-means, Agglomerative and Divisive, DBSCAN; Association Rule learning with Apriori.
• In-depth Knowledge of Dimensionality Reduction (PCA, LDA), Hyper-parameter tuning, Model Regularization
(Ridge, Lasso, Elastic Net) and Grid Search techniques to optimize model performance.
• Proficient at the Data Cleaning process of outlier detection and removal using Grubbs’ test for univariate analysis,
and leverage, Mahalanobis distance and Cook’s distance for multivariate analysis.
• Proficient in Data Visualization tools such as Tableau and PowerBI; Big Data tools such as Hadoop HDFS, Spark
and MapReduce; MySQL, Oracle SQL and Redshift SQL; and Microsoft Excel (VLOOKUP, Pivot tables).
• Excellent exposure to Data Visualization with PowerBI, Seaborn, Matplotlib and ggplot2.
• Experienced with Python and Deep Learning libraries such as NumPy, Pandas, SciPy, scikit-learn, statsmodels,
Matplotlib, Seaborn, Theano, TensorFlow, Keras and NLTK, and R packages from CRAN such as ggplot2, dplyr and esquisse.
Skillset:
Machine Learning and Deep Learning Skills
Classification, Regression, Supervised, Unsupervised, Naive Bayes, Linear/Logistic Regression, Regularization, k-NN,
Support Vector Machines (SVMs), Decision Trees, Ensemble Methods (Random Forest, Gradient Boosting Trees (GBM),
XGBoost), Bayesian Statistics, PCA, SVD, Clustering (k-means, GMM, Spectral, Hierarchical), Multilayer NNs, CNNs, RNNs,
RNN-LSTMs, Restricted Boltzmann Machines
CLOUD
AWS SageMaker, S3, Lambda, EC2, ECR, EBS, DynamoDB, RDS, Amazon Lex, Amazon Polly
DEVOPS
Docker, Kubernetes, Bitbucket, Flask, GitHub
TOOLS & TECHNOLOGIES
Python, R, MATLAB (scikit-learn, MLlib, Theano, Keras, TensorFlow, Spark, Hadoop, HDFS, MapReduce, Parallel
computing, Pandas, NumPy, Quandl, Quantopian)
PROFESSIONAL EXPERIENCE

Stripe, US January 2024 – Present


Role: Data Scientist
The main objective of this group is to eliminate pain points for end users throughout the enterprise. The role uses
AI and Machine Learning to support the Operational Data Science team’s objective of driving high application
availability through data-driven automation, including building Machine Learning solutions for anomaly detection
within end users’ internal applications.
• Conceptualized transaction-level statistical metrics as the key indicators of failure for testing various hypotheses.
• Build machine learning models through all phases of development, from design through training, evaluation,
validation, and implementation.
• Build models using customer transaction data to make more accurate, real-time, and fluid decisions.
• Formulate context-relevant questions and hypotheses to foster data-driven research and decision-making.
• Evaluated various tracking metrics in data projects and improved overall model accuracy from 69% to 84%.
• Ran ad hoc queries, analyses, and segmentation studies that combine multiple tools, data sources and data types to
extract insights from various A/B and multivariate tests.
Environment: Natural Language Processing, Word2vec, Bag-of-words, Gradient Boosting, Classification, A/B Testing

Devita HealthCare Sep 2022 – Dec 2023


Role: Data Science, Product & Analysis
• Predicted Patient Lifetime Value (PLTV) using historical healthcare data to support strategic decision-making in
patient care and resource allocation.
• Collected, cleaned, and visualized healthcare datasets using RStudio and Deep Feature Synthesis, uncovering
key statistical findings.
• Preprocessed unstructured healthcare data (e.g., EHRs, patient feedback) through tokenization, stemming and
lemmatization, and encoded variables using Bag of Words and TF-IDF techniques.
• Applied dimensionality reduction methods like PCA and LDA to analyze high-dimensional healthcare data and
derive actionable insights.
• Classified clinical notes and patient feedback into predefined categories using NLP techniques, improving
information retrieval and care coordination.
• Grouped medical services and products into clusters based on usage patterns and historical data using k-means
clustering for effective resource management.
• Automated patient cohort creation by analyzing treatment patterns and historical data, enabling personalized
care interventions.
• Trained a Gradient Boosted Decision Tree Classifier with XGBoost to identify promoters and detractors of
healthcare services.
• Optimized neural network performance for predictive analytics through regularization and hyperparameter
tuning.
• Conducted sentiment analysis of online patient discussions about healthcare services using Scrapy,
BeautifulSoup, and NLP libraries.
• Utilized tools such as Python (NLTK, spaCy, scikit-learn), R, and Tableau for healthcare data analysis and
visualization.

Tand Solutions, India Aug 2021 – Jul 2022
Role: Data Science/Engineering
Responsibilities included developing a classification model to segment customers and direct them to
subscription through App Behavioral Analysis.
• Used Python to develop different models & algorithms to predict the probability of a customer subscribing to
premium using different variables.
• Built a classification model to classify customers for promotional deals to increase likelihood of subscription
using Logistic Regression and Decision Tree Classifier.
• Developed and implemented predictive models like Decision Tree, Support Vector Machine and Logistic
Regression to predict the probability of enrollment.
• Picked the final model based on ROC curves and AUC, and fine-tuned the hyperparameters of the above models using
Grid Search to find the optimum model.
Environment: R, Tableau, Python (NLTK, spaCy, scikit-learn), SQL.

Elegent Machine, India Aug 2020 – May 2021


Role: Software Engineer (Intern)
Responsibilities included developing a regression model to predict Employee Income.
• Designed algorithms to identify and extract income information from demographic data with 34 variables and more
than 1 million observations. Delivered a model to the business team to help them estimate employee income.
• Performed Exploratory Data Analysis and Data Visualization using Python to identify related variables for initial
inspection, and performed univariate and bivariate analysis to understand individual and combined effects,
reducing the feature set for modelling from 34 variables to 13 variables.
• Processed, cleansed, and verified the integrity of data, imputing missing values based on domain knowledge and
regression, to improve the performance and accuracy of the model.
• Analyzed and processed complex data sets using advanced querying, visualization, and analytics tools.
• Identified, measured and recommended improvement strategies for KPIs across all business areas.
• Built a framework in Python with Machine Learning algorithms such as regression (linear, logistic, etc.), SVM,
Random Forest and Decision Trees to predict the income of a given person, used clustering and classification to
organize the data fed to the machine learning models, and obtained AUC up to 0.86.
• Created and presented models for potential holdings to fund managers. Achieved results 25% better than traditional
figures.
Environment: R, Tableau, Python (NLTK, spaCy, scikit-learn), SQL.

Education:
Bachelor’s Degree in Information Technology – Osmania University (2020)
Master’s Degree in Computer Science – Campbellsville University (2024)
