Sohail DataScientist

Sohail Syed is a Data Scientist with expertise in machine learning, statistical modeling, and data analysis, proficient in various tools and technologies including Python, R, and cloud services like AWS. He has experience in the entire data science project lifecycle, from data extraction to model implementation, and has worked on projects in healthcare and finance to drive data-driven decision-making. Sohail holds a Bachelor's degree in Information Technology and a Master's degree in Computer Science, with a strong background in predictive modeling and data visualization.

Uploaded by

david

SOHAIL SYED

Data Scientist
Cell phone: +1 732-913-8802
Email: Sohailsyed.work@gmail.com

Data Science professional who interprets and extracts intelligence from data and solves complex business problems
using machine learning and statistical models. Proficient in furnishing the executive leadership team with insights,
analytics, reports and recommendations that enable effective strategic planning across all business units, distribution
channels and product lines.

Summary:
• Experienced in facilitating the entire life-cycle of a data science project: Data Extraction, Data Pre-Processing,
Feature Engineering, Dimensionality Reduction, Algorithm implementation, Back Testing and Validation.
• Expert at working with statistical tests: Two-way independent & paired t-test, one-way & two-way ANOVA along
with non-parametric tests: Chi-square tests, Mann-Whitney U, Wilcoxon rank tests, Shapiro-Wilk & Kruskal-
Wallis test.
• Proficient in Data transformations using log, square-root, reciprocal, differencing and full Box-Cox
transformations, depending upon the dataset.
• Adept at Analysis of Missing data by exploring correlations and similarities, introducing dummy variables for
missing value treatments and choosing from imputation methods such as MICE in R and IterativeImputer in
Python.
• Experienced in Machine Learning techniques, including regression and classification models such as Linear and
Polynomial Regression, Decision Trees, Logistic Regression, and Support Vector Machines.
• Experienced in Ensemble learning using Bagging, Boosting, Random Forests, AdaBoost, XGBoost; clustering
methods such as K-means, Agglomerative and Divisive, DBSCAN; Association Rule learning with Apriori.
• In-depth Knowledge of Dimensionality Reduction (PCA, LDA), Hyper-parameter tuning, Model Regularization
(Ridge, Lasso, Elastic Net) and Grid Search techniques to optimize model performance.
• Proficient at the Data Cleaning process of outlier detection and removal using Grubbs' test for univariate analysis,
and Leverage test, Mahalanobis and Cook's distance for multivariate analysis.
• Proficient in Data Visualization tools such as Tableau and PowerBI, Big Data tools such as Hadoop HDFS, Spark
and MapReduce, MySQL, Oracle SQL and Redshift SQL and Microsoft Excel (VLOOKUP, Pivot tables).
• Excellent exposure to Data Visualization with PowerBI, Seaborn, Matplotlib and ggplot2.
• Experienced with Python and Deep Learning libraries such as NumPy, Pandas, SciPy, scikit-learn & statsmodels,
Matplotlib, Seaborn, Theano, TensorFlow, Keras, NLTK and R libraries ggplot2, dplyr, Esquisse, CRAN.
Skillset:
Machine Learning and Deep Learning Skills
Classification, Regression, Supervised, Unsupervised, Naive Bayes, Linear/ Logistic Regression, Regularization, k-NN,
Support Vector Machine (SVMs), Decision Trees, Ensemble Methods (Random Forest, Gradient Boosting Trees GBM,
XGBoost), Bayesian Statistics, PCA, SVD, Clustering (k-means, GMM, Spectral, Hierarchical), Multilayer NNs, CNNs, RNNs,
RNN-LSTMs, Restricted Boltzmann Machine
CLOUD
AWS SageMaker, S3, Lambda, EC2, ECR, EBS, DynamoDB, RDS, Amazon Lex, Amazon Polly
DEVOPS
Docker, Kubernetes, Bitbucket, Flask, GitHub
TOOLS & TECHNOLOGIES
Python, R, MATLAB (scikit-learn, MLlib, Theano, Keras, TensorFlow, Spark, Hadoop, HDFS, MapReduce, parallel
computing, Pandas, NumPy, Quandl, Quantopian)
PROFESSIONAL EXPERIENCE

Stripe, US January 2024 – Present

Role: Data Scientist
The main objective of this group is to eliminate pain points for end-users throughout the enterprise. This role requires
the use of AI and Machine Learning to drive the Operational Data Science team's objective of supporting the
organization's initiatives to drive high application availability through data-driven automation. It also involves building
out Machine Learning solutions for anomaly detection within end-users' internal applications.
• Framed transaction-level statistical metrics as the key indicators of failure in order to test various hypotheses.
• Build machine learning models through all phases of development, from design through training, evaluation,
validation, and implementation.
• Build models using customer transaction data to make more accurate, real-time, and fluid decisions.
• Formulate context-relevant questions and hypotheses to foster data-driven research and decision-making.
• Evaluated various tracking metrics in data projects and improved overall accuracy of models from 69% to 84%.
• Performed ad hoc queries, analyses, and segmentation studies that combine multiple tools, data sources and data
types to extract insights from various A/B and multivariate tests.
Environment: Natural Language Processing, Word2vec, Bag-of-words, Gradient Boosting, Classification, A/B Testing

Devita HealthCare Sep 2022 – Dec 2023

Role: Data Science, Product & Analysis
• Predicted Patient Lifetime Value (PLTV) using historical healthcare data to support strategic decision-making in
patient care and resource allocation.
• Collected, cleaned, and visualized healthcare datasets using RStudio and Deep Feature Synthesis, uncovering
key statistical findings.
• Preprocessed unstructured healthcare data (e.g., EHRs, patient feedback) through tokenization, stemming,
lemmatization, and encoding of variables using Bag of Words and TF-IDF techniques.
• Applied dimensionality reduction methods like PCA and LDA to analyze high-dimensional healthcare data and
derive actionable insights.
• Classified clinical notes and patient feedback into predefined categories using NLP techniques, improving
information retrieval and care coordination.
• Grouped medical services and products into clusters based on usage patterns and historical data using k-means
clustering for effective resource management.
• Automated patient cohort creation by analyzing treatment patterns and historical data, enabling personalized
care interventions.
• Trained a Gradient Boosted Decision Tree Classifier with XGBoost to identify promoters and detractors of
healthcare services.
• Optimized neural network performance for predictive analytics through regularization and hyperparameter
tuning.
• Conducted sentiment analysis of online patient discussions about healthcare services using Scrapy,
BeautifulSoup, and NLP libraries.
• Utilized tools such as Python (NLTK, spaCy, scikit-learn), R, and Tableau for healthcare data analysis and
visualization.

Tand Solutions, India Aug 2021 – Jul 2022

Role: Data Science/Engineering
Responsibilities included developing a classification model to segment customers and direct them toward
subscription through app behavioral analysis.
• Used Python to develop different models & algorithms to predict the probability of a customer subscribing to
premium using different variables.
• Built a classification model to classify customers for promotional deals to increase likelihood of subscription
using Logistic Regression and Decision Tree Classifier.
• Developed and implemented predictive models like Decision Tree, Support Vector Machine and Logistic
Regression to predict the probability of enrollment.
• Picked the final model based on ROC curves and AUC, and fine-tuned the hyperparameters of the above models
using Grid Search to find the optimum model.
Environment: R, Tableau, Python (NLTK, spaCy, scikit-learn), SQL.

Elegent Machine, India Aug 2020 – May 2021

Role: Software Engineer (Intern)
Responsibilities included developing a regression model to predict employee income.
• Designed algorithms to identify and extract income from demographic data with 34 variables and more
than 1 million observations. Developed a model for the business team to help them estimate the income of employees.
• Performed Exploratory Data Analysis and Data Visualization using Python to identify related variables for initial
inspection, and performed univariate and bivariate analysis to understand individual and combined effects,
selecting features for modelling and reducing them from 34 variables to 13 variables.
• Processed, cleansed, and verified the integrity of the data to improve the performance and accuracy of the
model, imputing missing values based on domain knowledge and regression.
• Analyzed and processed complex data sets using advanced querying, visualization, and analytics tools.
• Identified, measured and recommended improvement strategies for KPIs across all business areas.
• Built a framework in Python with Machine Learning algorithms such as regression (linear, logistic, etc.), SVM,
Random Forest and Decision Trees to predict the income of a given person, and used clustering and classification
to organize the data fed to the machine learning models, obtaining an AUC of up to 0.86.
• Created and presented models for potential holdings to fund managers. Achieved results 25% better than
traditional figures.
Environment: R, Tableau, Python (NLTK, spaCy, scikit-learn), SQL.

Education:
Bachelor's Degree in Information Technology – Osmania University (2020)
Master's Degree in Computer Science – Campbellsville University (2024)
