Rahul Godara
Contact: 9540393029 (M); Email: rahulg3990@gmail.com
LinkedIn Profile
LEAD DATA SCIENTIST
SUMMARY
Data Scientist with 9+ years of experience in processing and analysing data across a variety of
industries, including Healthcare, Banking, CPG, Research, and Energy. Leverages mathematical,
statistical, and machine learning tools to collaboratively synthesize business insights and drive
innovative solutions for productivity, efficiency, and revenue.
Extensive experience in 3rd-party cloud resources: AWS, Google Cloud, and Azure.
Experience working with and querying large data sets from big data stores, including Hadoop data
lakes, data warehouses, AWS Redshift, and NoSQL databases.
Ensemble algorithm techniques, including Bagging, Boosting, and Stacking; knowledge of
Natural Language Processing (NLP) methods, in particular BERT, GPT, word2vec for embeddings,
sentiment analysis, Named Entity Recognition, and Topic Modelling; Time Series Analysis with
ARIMA, SARIMA, LSTM, RNN, and Prophet.
Designed and built data products covering data security, insider risk management, communication
compliance, and data lifecycle management, supported by Copilot for Security innovations.
Experience in the entire data science project life cycle and actively involved in all the phases,
including data extraction, data cleaning, statistical modeling, and data visualization with large data sets
of structured and unstructured data.
Demonstrated excellence in using various packages in Python and R like Pandas, NumPy, SciPy,
Matplotlib, Seaborn, TensorFlow, Scikit-Learn, and ggplot2.
Skilled in statistical analysis programming languages such as R and Python (including Big Data
technologies such as Spark, Hadoop, Hive, HDFS, and MapReduce).
Understanding of applying Naïve Bayes, Regression, and Classification techniques as well as
Neural Networks, Deep Neural Networks, Decision Trees, and Random Forests.
Performing EDA to find patterns in business data and communicate findings to the business using
visualization tools such as Matplotlib, Seaborn, and Plotly.
Experience in ECL modelling and its data governance; evaluated portfolios for potential sale
to ARCs (Asset Reconstruction Companies).
Adept at applying statistical analysis and machine learning techniques to live data streams from
big data sources using PySpark and batch processing techniques.
Leading teams to produce statistical or machine learning models and create APIs or data pipelines
for the benefit of business leaders and product managers.
Experience in tracking defects using bug-tracking tools and managing code with version control tools like Git.
Strong experience in interacting with stakeholders/customers, gathering requirements through
interviews, workshops, and existing system documentation or procedures, defining business processes,
and identifying and analyzing risks using appropriate templates and analysis tools.
Good knowledge of creating visualizations, interactive dashboards, reports, and data stories using
Tableau and Power BI.
Excellent communication, interpersonal, intuitive, analytical, and leadership skills; a quick starter
with the ability to master and apply new concepts.
Large Language Model fine-tuning and training: extensive hands-on experience with PaLM,
OpenAI Davinci, and GPT-2, GPT-3, GPT-3.5, and GPT-4.
TECHNICAL SKILLS
Analytic Development - Python, R, Spark, SQL, PySpark
Python Packages - NumPy, Pandas, Scikit-learn, TensorFlow, Keras, PyTorch, Fastai, SciPy, Matplotlib,
Seaborn, Numba
Artificial Intelligence - Classification and Regression Trees (CART), Support Vector Machine, Random
Forest, Gradient Boosting Machine (GBM), TensorFlow, PCA, Regression, Bayesian Models.
Natural Language Processing - Text analysis, classification, chatbots, NER, NLTK
Deep Learning - Machine Perception, Data Mining, Machine Learning, Neural Networks, TensorFlow, Keras,
PyTorch, Transfer Learning.
LLMs – GPT, BERT, BART.
Infrastructure as Code (IaC) – Terraform, Puppet, Ansible.
Programming Tools and Skills - Jupyter, RStudio, GitHub, Git, APIs, Docker, XML, Kubernetes, Back-End,
Databases, Flask, HTML, MS Azure, AWS, GCP, Azure Databricks, AWS SageMaker
Data Modeling - Bayesian Analysis, Statistical Inference, Predictive Modeling, Stochastic Modeling, Linear
Modeling, Behavioral Modeling, Probabilistic Modeling, Time-Series Analysis
Machine Learning - Natural Language Processing and Understanding, Machine Intelligence, Machine
Learning algorithms.
Analysis Methods - Forecasting, Multivariate Analysis, Sampling Methods, Clustering, Predictive, Statistical,
Exploratory, Sentiment, and Bayesian Analysis; Regression Analysis, Linear Models.
Applied Data Science - Natural Language Processing, Predictive Maintenance, Chatbots, Machine Learning,
Social Analytics, Interactive Dashboards.
PROFESSIONAL EXPERIENCE
Lead Data Scientist
Enhance IT, Mumbai
Since June 2023
As a dedicated ML Engineer, I applied my skills to computational biology at Genentech. I led the
development of automated literature search and biological sequence analysis pipelines. Collaborating with the
Head of ML & CTO, we developed scalable methods for automated literature analysis. I also worked on
creating predictive DNA and protein language models, among other sequence-based prediction methods.
Responsibilities:
Defined customer segments based on customer behaviour and demographics through clustering
algorithms and performed analysis to understand purchase patterns; provided actionable insights
that boosted sales by 11%.
Took ECL Model ownership and managed it to ensure its accuracy and timely refresh, leading to
precise financial insights.
Demonstrated proficiency in asset classification, maintaining a track record of error-free processes to
uphold the company’s financial integrity.
Liaised effectively with the Finance department, providing valuable insights related to ECL,
contributing to informed decision-making.
Spearheaded the development of a robust Data Governance operating model, taking charge of related
projects to enhance data quality and compliance.
Effectively collaborated with external entities such as Stat Auditors, RBI, and CAG, ensuring
successful audits and regulatory compliance.
Addressed data queries pertaining to asset quality and ECL promptly and accurately, fostering a data-
driven decision-making environment.
Evaluated portfolios for potential sale to ARC (Asset Reconstruction Company), demonstrating
strategic thinking and financial acumen.
Developed a model for fault detection in deployed devices, using logistic regression, SVM, Random
Forest, XGBoost, and AdaBoost to achieve 92% accuracy on test data.
Accessed the production SQL database to extract data for validation with third-party data.
Validated data between SQL servers and third-party systems.
Worked with large datasets (10M+ observations of text data).
Cleaned text data using a variety of techniques.
Integrated with the AWS platform environment.
Utilized cloud computing resources to optimize models, tune hyperparameters, and cross-validate
statistical data science models.
Used Python libraries Pandas, NumPy, Seaborn, Matplotlib, and SciKit-learn to develop various
machine learning models, including logistic regression, random forest, gradient boost decision tree, and
neural network.
Built and analysed datasets using Python and R.
Applied linear regression in Python and SAS to understand the relationships between dataset
attributes and the causal links among them.
Performed exploratory data analysis (EDA) on datasets to summarize their main characteristics,
using techniques such as bag-of-words, K-means, and DBSCAN.
Utilized Git for version control on GitHub to collaborate with team members.
Built solutions using LLMs (as an individual contributor) within Transformers, BERT (Encoder), GPT
(Decoder) and BART (Encoder-Decoder).
Spend Optimizer – Designed a penetration attribution model using LGBM to determine the impact of
different spends, such as media and CP, and help the client with ROI optimization.
Perfect Store Analysis – Developed perfect store analysis framework using Bayesian models to
determine the contribution of different perfect store KPIs into overall sales for a store and identify the
top KPIs driving sales.
PnP Modelling – Developed regression models for different products (SKUs) to generate the
elasticities for pricing and promotion KPIs.
Implemented models to predict previously identified Key Performance Indicators (KPIs) among all
attributes.
Key outcomes - Increased customer retention by 52% by using precision as the metric for retention
prediction.
Developed several ready-to-use templates of machine learning models based on given specifications
and assigned clear descriptions of purpose and variables to be given as input into the model.
ML Ops Engineer
Shell
Jan 2022 – May 2023
As an MLOps Engineer, I was responsible for designing and building cloud solutions to support the activities
of machine learning engineers across the company. I built tools and APIs to support ML teams in every stage of
their ML workflows and championed automation of the entire ML lifecycle, covering data ingestion, model
development, model training, model management, deployment, serving, and monitoring. Daily, I prototyped
tools and APIs to allow ML engineers to access cloud-based infrastructure developed by my MLOps
colleagues, iterating towards a production-ready service. In developing solutions, I carefully considered the
skillset of the end user and designed and documented tools accordingly. I also set up batch inference,
model monitoring, and retraining pipelines using AWS SageMaker and the MLflow model registry.
Responsibilities:
Delivered day-ahead power price forecasting for Shell Energy Singapore (SES), increasing revenues
by SGD 700k weekly. Received the Shell AI Innovator award for achieving business outcomes through
a potential increase in value capture with better tolling decisions.
Assisted in capturing sharp market movements and creating synergies with other optimization efforts
by the SES team.
SEPH Model – Used XGBoost to empower the business to forecast power prices in the Philippines for
the next 3 months, enhancing the revenue bucket by PHP 18 million.
SK Modelling – Built regression models for daily, monthly, and yearly LNG forecasts in South Korea.
Deployed the models on the SLMT platform on Azure.
Integrated ML models with Kubernetes-based infrastructure, leveraging EKS for efficient deployment
and scaling of models in production environments.
Designed and implemented a comprehensive ML workflow leveraging MLflow, AWS Batch, Docker,
and Kubernetes, resulting in streamlined and scalable model training and deployment processes.
Developed custom Docker containers for ML model packaging, ensuring consistent and reproducible
environments across different stages of the ML lifecycle.
Utilized MLflow to track and manage experiments, enabling easy comparison of different models and
hyperparameters, and facilitating collaboration among team members.
Implemented automated model training pipelines using AWS Batch, enabling efficient parallel
processing of large datasets and reducing training time.
Conducted comprehensive training sessions for ML team members, equipping them with the necessary
skills to proficiently utilize MLOps tools and technologies for efficient ML model management and
deployment.
Developed and maintained documentation and best practices for ML operations, ensuring knowledge
sharing and smooth onboarding of new team members.
Designed and implemented continuous integration and continuous deployment (CI/CD) pipelines using
Jenkins and GitLab, enabling automated testing, building, and deployment of ML models. This
streamlined the development process and ensured consistent and reliable delivery of models.
Automated model monitoring using MLflow and Weights & Biases.
Utilized AWS SageMaker and Lambda to conduct thorough performance tuning and optimization of
ML models, resulting in significant improvements in inference speed and cost efficiency.
Designed and implemented a containerized ML model deployment solution using Docker and
Kubernetes, ensuring efficient resource utilization and seamless scalability for handling high-volume
inference requests.
Architected and implemented a robust and scalable infrastructure leveraging AWS Elastic Kubernetes
Service (EKS) and Docker containers to efficiently orchestrate and manage the deployment of machine
learning models. Incorporated fault-tolerant mechanisms to ensure continuous availability and
optimized resource allocation for optimal performance.
Collaborated with DevOps teams to integrate ML workflows into existing CI/CD pipelines, enabling
seamless deployment and version control of ML models.
Designed and implemented scalable data pipelines using AWS Glue and Athena to facilitate seamless
data ingestion, transformation, and storage. Collaborated with cross-functional teams to ensure efficient
and reliable data processing and analysis, resulting in improved data-driven decision-making.
Sr Data Scientist
Coforge
Aug 2019 – Jan 2022
As a data scientist, I worked with a team of data scientists, data engineers, and MLOps engineers to create
deployable machine learning models to detect fraudulent claims and identify anomalies in medicine use. The
tools I employed were anomaly detection algorithms such as convolutional autoencoders and isolation forests.
The goal of the project was to reduce payouts on fraudulent claims and to identify possible medication abuse
early through careful examination of historical records.
Responsibilities:
Resolved optimization problems in OTP, crew scheduling, and gate optimization.
Performed parameter tuning using grid search, feature selection and model evaluation in different
scenarios.
Ran rapid data science experiments leading to products on GB/TB-scale datasets. Improved merchant
prediction by 2% by developing new solutions adapting open-source software.
Improved transaction-level granular geo-detail population by 11% by developing a data-driven
module.
The merchant prediction module processes ~1 million transactions daily and the geo module ~8.8 million.
Reduced the cost of the 'Store Id Identification' module by ~4x, resulting in savings of 60 USD per day,
or 30k USD per annum.
Utilized clustering-based outlier detection algorithms like CBLOF and Angle-Based Outlier Detectors
to identify anomalies in medicine use patterns.
Developed and maintained data pipelines to ensure the timely and accurate ingestion of data for
anomaly detection.
Implemented proactive monitoring and maintenance protocols to ensure optimal performance and
effectiveness of deployed machine learning models.
Designed and implemented machine learning models utilizing advanced anomaly detection algorithms,
including Isolation Forest, Local Outlier Factor, and One-class Support Vector Machine, to detect
fraudulent claims and identify anomalies in medicine use.
Conducted comprehensive model evaluation and validation, utilizing performance metrics such as
precision, recall, and F1-score, to ensure the robustness and effectiveness of the implemented anomaly
detection algorithms.
Engaged in extensive collaboration with subject matter experts to gain a deep understanding of the
business requirements and effectively integrate domain knowledge into the development and
implementation of anomaly detection models.
Conducted extensive feature engineering to extract meaningful features from the medical claims data
for improved model performance.
Leveraged the Histogram-based Outlier Score algorithm to develop a robust system for accurately
detecting and flagging fraudulent claims, as well as identifying early signs of medication abuse.
Leveraged convolutional autoencoders, a deep learning technique, for accurate and efficient anomaly
detection in the context of fraudulent claims and medication abuse identification.
Data Scientist
SunLife
Mar 2016 – July 2018
Worked as a model development engineer - built churn analysis models as well as market segmentation and
customer lifetime value estimation models. Earlier, as a junior Data Analyst, I extracted insights from existing
datasets and prepared data for further analysis as part of a team.
Responsibilities:
Conducted sentiment analysis on customer feedback data to identify key drivers of customer
satisfaction and implemented targeted improvement initiatives.
Developed and deployed cutting-edge customer segmentation algorithms leveraging advanced data
analytics techniques to optimize the allocation of the marketing budget and enhance precision in
targeting, resulting in a substantial 20% reduction in marketing expenditures.
Designed and implemented robust data pipelines and databases, leveraging SQL, Python, and Hadoop
technologies, to ensure the integrity and reliability of data for analysis purposes.
Designed and executed A/B testing experiments to assess the impact of marketing campaigns and
optimize conversion rates, leading to a notable 15% improvement in campaign return on investment
(ROI).
Collaborated with cross-functional teams to develop and deploy recommendation systems, improving
personalized customer experiences and increasing upsell opportunities.
Led a cross-functional team in designing and executing customer segmentation analysis, resulting in
targeted marketing campaigns and a 25% increase in customer engagement.
Led cross-functional teams in defining project objectives, gathering data requirements, and developing
analytical solutions for market research and customer lifetime value analysis.
Stayed up to date with the latest advancements in data science and machine learning technologies to
continuously improve analytical capabilities.
Implemented data visualization techniques to present complex findings clearly and concisely to
stakeholders.
Applied natural language processing techniques to analyse customer feedback and sentiment analysis
for product improvement.
Designed and executed market segmentation analysis to identify distinct customer segments based on
demographic, behavioural, and psychographic characteristics.
Delivered comprehensive reports and presentations to senior executives, highlighting key findings and
actionable recommendations based on data analysis.
Utilized A/B testing methodologies to assess the impact of marketing campaigns on customer
behaviour and provided data-driven recommendations for optimizing future initiatives.
Collaborated with cross-functional teams to define project objectives, gather data requirements, and
develop analytical solutions.
Built customer lifetime value estimation models to predict future revenue potential and inform
customer acquisition and retention efforts.
Data Analyst
American Express
July 2014 – Mar 2016
Worked as a data analyst - Built scorecards for Underwriters and extracted insights from the existing
datasets and prepared data for further analysis as part of a team.
Responsibilities:
Built Underwriting Scorecard using SAS EG and published weekly reports through Tableau and SAS.
Performed spend analytics for GSM, delivering dashboards, CBA, and other visuals through Tableau and Access.
Data Analyst
SMC Global
Jan 2014 – July 2014
Worked as a data analyst – prepared daily and weekly margin reports for HNI clients.
Responsibilities:
Delivered weekly MIS for HNI clients in NBFCs using Excel and SQL.
Created daily reports to find margin values and net worth of existing clients in capital market.
Data Analyst
Genpact
Feb 2012 – Aug 2012
Worked as a data analyst – invoice processing and dashboard maintenance for Amex global data.
Responsibilities:
Processed invoices and maintained dashboards for Amex global data using SQL and Excel.
ACADEMIC CREDENTIALS
Master of Business Administration – Data Science
NIIT University
Bachelor of Commerce – Accounting, Economics & Statistics
Delhi University
LICENSES & CERTIFICATIONS
Microsoft Certified: Data Analyst Associate
Microsoft Certified: Power Platform Associate
Microsoft Certified: Azure Data Fundamentals
Microsoft Certified: Azure Data Engineer