Solutions | Analytics | Consulting
Data Scientist – NLP
Company Profile Solytics Partners provides products and services to BFSI and healthcare firms. We use
AI/ML and cutting-edge technology to develop next-generation solutions and provide
efficient services. We have a strong team of PhDs in AI/ML and experts in the BFSI and
healthcare industries. Our regulatory-compliant solutions and services enable leading
corporations and institutions to create and sustain competitive advantage.
Job Title Data Scientist – NLP
Location Pune (eventually), Remote to start with
Experience 3-6 Years
Education Qualification Bachelor's or Master's in Computer Science Engineering or a related discipline is required.
Data Science certifications would be a plus.
Role Type Permanent
Job Description We are looking for a Data Scientist with strong NLP expertise. In this role, you will lead a
range of data analytics efforts, and monitor and improve the performance of our Machine
Learning and Natural Language Understanding (NLU) models.
Responsibilities In-depth data analysis: Extract data to manipulate/calculate/format/combine into
presentable reports, charts, and graphs. Analyze and interpret data to find outliers,
understand root cause, business impact, correlations/discrepancies, and propose
changes/alternate solutions
Discover patterns/root causes, and generate insights to drive product enhancements
Bring together disparate data sources to create a complete analysis.
Analyze and evaluate the quality of data used for model training and testing
Create and present proposals and results in an intuitive, data-backed manner, along
with actionable insights and recommendations to drive business decisions
Collaborate with other data scientists and engineers on data collection and feature
design efforts across teams
Communicate results to diverse audiences through effective writing and data
visualizations (BI reports and Dashboards)
Desired Skills Mandatory
Solid experience with Natural Language Processing (NLP)
Text Extraction from various sources (MS Word, plain text files, PDF files,
HTML pages, etc.), Text Cleaning, Text Pre-Processing, Tokenization, POS
tagging, NER, Dependency Parsing, Coreference Resolution, Feature Vector
Generation (binary, count, TF-IDF, etc.), word2vec, doc2vec, GloVe, RAKE,
document similarity (Cosine, Jaccard, etc.), fuzzy text matching, Lexical and
Semantic Information Extraction
Understanding of various NL constructs like Parts of Speech, Sentence
structures, Subject – Verb – Object relationships, word dependencies
(ROOT, compound, etc.)
Strong expertise in Python
Expert-level skills with packages like NLTK, spaCy, gensim, Pattern,
TextBlob, Vocabulary, and Stanford CoreNLP Python wrappers; text extraction
tools like PDFMiner, Apache Tika with Python, PyPDF2, etc.; and pandas,
sklearn, numpy, xgboost, matplotlib, keras, etc.
Expertise in Command Line usage (e.g., Bash), and SQL
Robust knowledge of statistical modelling and machine learning techniques
Techniques: text clustering (k-means, DBSCAN, etc.), text classification
(Naïve Bayes, MaxEnt, SVM, Tree-Based models, other ML & Deep
Learning models)
Experience in analyzing and quantifying data collected through crowdsourcing
protocols
Strong experience with descriptive statistics and visualization tools
Experience with data selection methods: identifying which data to use for which
experimental setups
Excellent communication and organizational skills with significant attention to detail
Demonstrable track record of dealing well with ambiguity, prioritizing needs, and
delivering results in an agile, dynamic environment
Good-to-have
Experience with big data tools (Hive, Pig) and familiarity/experience with the AWS
technology stack (S3, Redshift)
Experience with Deep Learning techniques and methodologies
Experience working with multilingual data and an understanding of the nuances of
working with different language scripts in NLP