Soumyajit Das
Data Science : AI ML | NLP
Kolkata WB - India Mob : +91 8240203702
Email: samuraijack0092@gmail.com
Linkedin : https://in.linkedin.com/in/soumyajit-das-852a4261
A Data Science & Advanced Analytics professional with more than 8 years of experience in providing
data-driven, action-oriented solutions to challenging problems in Natural Language Processing,
Sensor Analytics & Computer Vision using data mining, statistical modeling, AI/machine learning
& deep learning techniques.
A Computer Science Engineering graduate offering 8+ years of experience in Core Data Science
with Financial and Manufacturing/ Sensor Analytics domain solutions.
Solutions built on statistical and AI-ML models for data mining, information extraction, retrieval,
recommendation, classification, scoring, ranking and optimization.
A keen analyst with knowledge and exposure in driving quality process improvements in business
units through the application of various AI ML Data Science techniques.
A team player with quick learning ability and exceptional analytical and problem-solving skills.
# Accomplishments Across the Career:
Received the IP Award in Aug’16 for a new solution in AI-ML & NLP at
Tata Consultancy Services
Patent: SYSTEMS AND METHODS FOR PREDICTING GENDER AND AGE OF USERS BASED ON SOCIAL
MEDIA DATA. { Pub Number : US20170372206A1 | US15630659 USPTO }
Patent : ADVERSARIAL BANDIT CONTROL LEARNING FRAMEWORK FOR SYSTEM AND PROCESS OPTIMIZATION,
SEGMENTATION, DIAGNOSTICS AND ANOMALY TRACKING
{ WIPO : WO2022003733 | USPTO : 17/635,819 | EPO : 21834552.8 }
Received Spot Award at Technosoft Corporation
# Core Competencies: AI-Machine Learning ● Data Science - Advanced Analytics Solutions ●
Predictive Maintenance & Modelling ● Statistical Modelling & Testing ● Analytics on Sensor ● Natural
Language Processing - NLP/NLU ● Deep Learning ● Training & Leadership ● Quick Learning Ability ●
Cross-functional Coordination ● Data Visualization & Reporting ● Client Interactions
TOP Keywords:
● Machine Learning, Deep Learning - CNN, AutoEncoder,
● Supervised learning - Regression, Classification,
● Unsupervised learning - Clustering/Segmentation, Scoring/Ranking, Anomaly/Outlier Detection, Time
Series Forecasting & Analysis
● Feature Extraction, Factor Analysis, Statistical Modeling & Analysis, Hypothesis Testing.
● Predictive Modeling, Predictive Maintenance, Analytics on IOT/Sensor Data.
● Natural Language Processing - NLP/NLU, Text Classification, Topic Modeling (LDA/LSI), Keyword
Extraction, Content Extraction, WordNet, TopicNet, Ontology, Taxonomy Extraction,
Recommendation Network, Knowledge Graph Mining, Graph Clustering, Record Linkage, Attribute
Linkage.
WORK EXPERIENCE :
An Initiative for Startup Building : Edge Analytics 24 7 ai {2021-2024}
Business Development Specialist, IP Monetization, Independent Consultant.
Held business meetings with clients to understand business requirements.
Prepared multiple business proposals. Team building & business meetings.
Applied Materials [AM] - AMAT Inc - Consultant Data Science Pharma Healthcare IOT
Yield Optimization for chemical/drug products, Insight Extraction - Pharmaceutical companies;
Duration: Nov 2019 – Apr 2021 (Sensor | ML use case)
Process monitoring and prediction of drug quality to minimize wastage and improve the production cycle.
Recommendations for yield improvement.
Relation extraction between raw materials and properties of chemical products. System anomaly detection and
prevention. Improved the structural flow of production with phase-level best-configuration recommendations.
ML and deep learning techniques for early prediction as a supervised learning process. Broke down complex
hypotheses to generate rules for recommendation.
Used anomalies as indicators of wastage, minimized by setting the optimal / best-fit device and conditions for the
process.
Crystal Wafer Processing - Assets Diagnostics and recommendation Client: Semiconductor Equipment
manufacturers; Duration: Feb 2020 till May 2020 (Sensor | ML use case)
Improved the process flow with phase-level best-configuration recommendations
Applied ML and deep learning techniques for early prediction as a supervised
learning process, and for system anomaly detection and prevention
Broke down complex hypotheses to generate rules for recommendation
Ticket Summarization and Classification model; Duration Apr 2020 till Oct 2020
A tool for ticket allocation and ticket solution recommendation for Engineering and ITES
teams based on document and topic similarity, with ticket allocation to specific groups based
on taxonomy. Highlights key points and recommends similar solutions to process
engineers. [ML NLP use case]
Cryogenic Pump – Assets Diagnostics and recommendation Client: Applied Materials; Duration: Aug 2020
till Jan 2021 (Sensor | ML use case)
Improved material production with phase-level best-configuration recommendations
Used ML and deep learning techniques for early prediction as a supervised learning process, and for system
anomaly detection and prevention
Broke down complex hypotheses to generate rules for recommendation
Z & A Infotek group {Zen & Art}: Data Scientist : AI ML NLP
Entity Resolution tool for MDM data, Insight Extraction - Zettascence & Zetta Mesh
CRM data MDM :
Nov 2018 - May 2019
Modeling : Entity resolver on different data types such as Customer and Institution data
Clustering records based on Supervised Graph/Edge Learning using ML/statistical models
Reliability scoring on the Record Linkage/Deduplication model
Schema mapping on string data types using NLP & ML models
Flutura Decision Science & Analytics - Consultant - Data Scientist - AI ML
Crystal Quality (Yield) Optimization, Prediction [Client :Hitachi Hightech partner Kyocera]
Dec 2017 - Oct 2018 (Sensor | IOT & ML use case)
Early prediction of crystal quality to minimize wastage and production cycle [Yield / Time].
Recommendations for yield improvement. Relation extraction between raw materials and properties of the crystal.
System anomaly detection and prevention. Improved the structural flow of production with phase-level best-configuration
recommendations. ML and deep learning techniques for early prediction as a supervised learning
process. Broke down complex hypotheses to generate rules for recommendation.
Used anomalies as indicators of wastage, minimized by setting the optimal / best-fit device and conditions for the
process.
Pressure pumping - Assets Diagnostics and recommendation Client: Patterson UTI; Duration: Dec
2017 till Oct 2018 (Sensor | IOT & ML use case)
Improved the oil flow with phase-level best-configuration recommendations
Applied ML and deep learning techniques for early prediction as a supervised
learning process, and for system anomaly detection and prevention
Broke down complex hypotheses to generate rules for recommendation
Assistant Manager : Data Science
CitiBank Global Decision Management, NextGen Data Science Team, CoE
Response model and price sensitivity analysis for Loan on Credit Card
Duration: 8 months
Role: Asst. Manager - Data Science/ Machine Learning & Statistical Modeller
Response model with price sensitivity analysis for Indonesia, Korea and Thailand for Credit Card Loan
on Phone. Recommended the model with maximum stability, price sensitivity and good accuracy.
Analyzed pricing sensitivity in the response model and created a pricing recommendation tool with basic
visualization for EMI, EPP & LOP on Citi Credit Card.
Tools/ Technique : R, Python, SQL - Logistic Regression, GBM, Univariate/Multivariate
Analysis, Statistical Hypothesis Testing, PCA-Factor Analysis [Statistical modeling, ML use case]
Merchant Name Extraction (Keyword Extraction) Citi Co-Card
Duration : 4 Months
Tools/ Technique : R, Python, SQL - Dtree, GBM, Naive Bayes
Data : Manually Tagged Short Text SMS [Transaction Update]
Probability scoring at keyword level to identify and extract the right merchant name from the data.
Removed duplicate merchants from the merchant list based on merchant category. Merchant
categorization/segmentation from merchant description data. [ML NLP use case]
Tata Consultancy Services {CTO Labs A&I } : Data Scientist - AI ML | Text | Advanced Analytics
Duration : Jan 2016 - Mar 2017, Bangalore
Research in NLP/Text mining, Contextual Text classification, Extractive text
summarization, Topic Modeling, Taxonomy extraction, Domain specific Lemmatizer
extraction, Social Media Data analysis, Social Network Analysis.
Computer Vision using Deep Learning and Machine Learning Algorithm.
Patent: SYSTEMS AND METHODS FOR PREDICTING GENDER AND AGE OF USERS BASED ON SOCIAL
MEDIA DATA. { Pub Number : US20170372206A1 / US15630659 , Date : 28/12/2017 (US) }
Projects & POC :
A tool for Social Media data and Network Analysis with functionalities such as a user
demographic prediction model, network visualization, sarcasm detection model, social
buzz summarization, favourite topic identification and influential market leader
identification from the social network. [ML NLP use case]
A tool for technical article recommendation and ticket solution recommendation for the ITES
team with document and topic similarity, document retrieval and summarization model,
key point highlighting and a similar-solution recommendation network. [GE HealthCare -
UltraSound] [ML NLP use case]
Pump engine failure prediction using ML: failure detection, RCA and visualization
tool for part fault indication & anomaly detection - process & asset level diagnosis for an oil
and gas company. [Shell - Oil Gas IOT Sensor] (Sensor & ML use case)
Technosoft Corporation : Data Scientist - ML, NLP
Duration : Sep 2014 - Dec 2015, Bangalore
Developed a Spotfire-based interactive dashboard for a US health care client - Consencio Health - for
Revenue Cycle Management with Actionable Intelligence. Used site-based yearly revenue prediction.
Developed a tool for system anomaly detection and Root Cause Analysis for the fluctuating revenue cycle.
Customer satisfaction prediction model using a decision tree. [ML, Statistical modeling, Anomaly
detection use case]
[Target Corporation] - September 2015 to December 2015
Improved the business process to reduce the out-of-stock rate at item-location level.
Guided the simulation process based on time and cost constraints and recommended feasible
business levers to reduce the out-of-stock rate. Predicted out-of-stock duration based on business process
info at item-location level. [Retail - replenishment & pick, ML & Optimization]
[Dell International] - May 2015 to September 2015
Developed an application to predict order fallout from different application logs. Measured the overall time
taken by each process, with sequence-based fallout prediction and fallout path detection using a
probabilistic graphical model. Used statistical models and decision trees for fallout prediction. [Statistical
modeling & ML use case]
Analytics - Data Science Trainee
Duration: Mar 2014 - Jul 2014, Kolkata WB
FIFA, EPL, ECL Football League [EPL, ECL, Bundesliga, La Liga, FIFA]
Full Time/Half Time score - WIN/DRAW/LOSS rank/prediction and Team Index Reporting [feature-wise]
using ML and statistical models in R, Python, H2O.
Techniques and Packages used :
Text : Information Retrieval, Text Classification, Taxonomy Extraction,
Recommendation engine based on Contextual Mapping.
Extractive Summarization, Domain specific Content Extraction.
NLP Packages/Tools : NLTK, Gensim, TextBlob, RTextTools, Pattern, spaCy, GraphML
Machine Learning :
Supervised Learning - Classification, Regression with Regularization,
Tree - Rpart, Chaid
Ensemble model - RF, ExtraTrees, Boosting - GBM, XGBoost
Perceptron - MLP, Kernel- SVM
Unsupervised Learning - K-means, Dendrogram, DBSCAN, t-SNE
Temporal Classification : Survival Model KM, CoxPH, XGBoost
Deep Learning – CNN, ResNet, LSTM, AutoEncoder
ML Packages/Tools : H2O, MLR, MXNet, TensorFlow, Keras, Sklearn
Statistical Modeling : Linear and Logistic regression with regularization - ElasticNet / GlmNet, Time
Series, Multivariate and Bivariate Analysis / Hypothesis Testing, PCA - Factor Analysis / Feature
Clustering, PLS regression.
Sensor Analytics : Yield / Quality - Optimization, Optimal Parameter Configuration,
Failure Detection,
System Anomaly & Outlier Detection and Warning System, Stable Operating Mode recommendation,
RCA, Robust Anomaly detection based on Mob, LOF, Isolation Forests.
Python: Pandas, Statsmodels, Sksurv, Skimage, Skopt, Matplotlib, Seaborn,
XGBoost, FactoMiner, SKfeature, Hyperopt.
R: Rattle, Caret, mlbench, Hmisc, Datatable, Dplyr, tidyr, Mass, H2O
Graph analysis packages : Igraph, Gephi.
Java : Lucene, Tika.
SQL Server, PostgreSQL
Data Visualization : Tibco Spotfire, Tableau
Education & Training :
• Analytics Specialization - Ivy Pro School [2013- 2014]
• ML & DL Certification - Coursera
• 2009 to 2013 – B.Tech in Computer Sc. Engineering (DSCSDEC [JIS]) [W.B.U.T -
Kolkata]
• 2009 – H.S (Dr. S.P.M.I) [ W.B.C.H.S.E ] (Science)
• 2007 – Class 10 (Dr. S.P.M.I ) [ W.B.B.S.E ]