SUMIT KUMAR
DATA SCIENTIST
With 3 years of experience in the Data Science and Analytics domain, I have built a strong skill set encompassing SQL, Power BI, Python, Machine Learning and Statistics. I am a Microsoft Certified Power BI Data Analyst (PL-300) and have completed the Post Graduation Program in Data Science at Praxis Business School. I am confident that my technical proficiency, analytical skills, and enthusiasm for data science position me to excel in diverse projects and industries.
sumitdas1608744@gmail.com
7042203264
sumitkumar28292
sumit744/Data-science-projects

Employment
Capgemini India, Pune
Data Scientist, July 2021 to Current
Worked on a machine learning project where I was responsible for collecting, cleaning, and analyzing large data sets, using tools including Python and Excel.
Trained a pretrained model with custom, use-case-specific scripts to increase accuracy.
Worked on classification algorithms such as Random Forest and Logistic Regression to increase accuracy.
Extracted insights and trends from the data.
Performed exploratory data analysis (EDA) to identify patterns.
Created visualizations and reports to help communicate my findings to stakeholders.
Worked on a SAS-to-Python conversion model.
Experience in supporting and maintaining Azure Data Factory (ADF) pipelines and workflows.
Proficient in using ADF to perform data movement, data transformation, and data integration tasks.
Experience in data manipulation using Databricks.
Strong understanding of Azure Data Lake Storage and Azure Blob Storage for data storage and retrieval.
Implemented Power BI's Dataflow feature to streamline data preparation and ETL operations for reporting and analysis.
Applied query optimization, data modeling, and visualization design to improve Power BI performance, resulting in faster report generation.
Integrated SAP HANA databases into Power BI, establishing connections and data pipelines for real-time analytics and decision-making based on up-to-date data.
Sciffer Analytics, Pune
Intern, Data Science, Feb. 2021 to July 2021
Conducted research and collected information using assigned methods to acquire data from primary or secondary data sources, and maintained databases.
Interpreted data, analyzed results using statistical techniques, and provided ongoing reports.
Identified, analyzed, and interpreted trends or patterns in complex data sets.
Performed data cleaning, exploratory data analysis, and feature engineering to provide preliminary insights into the data.

Education
Praxis Business School, Jan. 2020 to Feb. 2021
Post Graduation Program in Data Science
Birla Institute of Technology, Mesra, July 2011 to May 2015
BE Civil Engineering

Skills
DATA ANALYSIS TECHNIQUES
Machine Learning
Linear Regression
Statistical Models
Classification Models
PROGRAMMING LANGUAGES
Python
SQL
ANALYSIS TOOLS
MS Excel
SQL
Power BI
Google Analytics

Portfolio
Customer Segmentation
Machine Learning models: K-Means Clustering
Problem Statement: To identify, through RFM segmentation, customers who will generate higher revenue in the future through customized marketing actions
Build a 5- or 6-segment solution (using K-Means), whichever is more meaningful and actionable
Define each of these segments and select the two most important segments for this business to act upon
Identify appropriate marketing actions to target and derive more business from these two segments
Telecom Customer Churn Prediction, Telecommunication Industry
Machine Learning models: Logistic Regression, Random Forest & XGBoost
The Telecom Customer Churn Prediction dataset contains 22 features and 2938 rows
Customer churn is predicted using a machine learning classification model
Accuracy of the best model = 0.89 after balancing the dependent variable; F1 score: churn = 0.90, not churn = 0.89
This analysis is effective at focusing customer retention marketing programs on the subset of the customer base most vulnerable to churn
Tool used: Python
Statistical Analysis on Factors Influencing Life Expectancy
Machine Learning models: Multiple Linear Regression
This study focuses on immunization, mortality, economic, social, and other health-related factors
For the best features: R-squared = 0.843, Adj. R-squared = 0.841
Tool used: Python