
KOHISHA ARUGANTI

Washington, DC | +1 703-579-7117 | varuganti.kohisha@gmail.com | LinkedIn | GitHub

Professional Summary
• Data Engineer with over 5 years of hands-on experience in developing and optimizing data solutions across various industries.
• Expertise in data modeling, warehousing, and ETL pipeline development to drive business growth and efficiency in large-scale, fast-paced environments.
• Experienced in AWS solutions such as EC2, DynamoDB, S3, and Redshift, with strong hands-on skills in data visualization using Tableau
and QuickSight.
• Skilled in leveraging SQL, Python, and data mining techniques to analyze complex datasets and generate actionable insights for business
planning and decision-making.
• Proficient in infrastructure automation using Terraform and CI/CD pipelines in Azure DevOps, ensuring seamless infrastructure deployment and continuous integration workflows.
• Adept at collaborating with stakeholders to align data strategies with business objectives, driving impactful results.

Work Experience
• Big Data Engineer | Celebal Technology Solutions | Washington, DC Mar 2024 - Present
◦ Integrated data from multiple sources into AWS S3 via Fivetran, managing JSON files and optimizing the pipeline using
Medallion Architecture, improving data processing efficiency by 30%.
◦ Automated the provisioning of AWS infrastructure and Databricks resources using Terraform, reducing manual effort by 40%
and accelerating deployment time by 50%.
◦ Developed and maintained CI/CD pipelines using Azure DevOps, automating infrastructure deployment and
version-controlled code management, increasing deployment efficiency by 35%.
◦ Created and deployed Databricks Asset Bundles, replicating notebooks across environments with version control, ensuring 100% consistency across development and staging environments.
◦ Utilized Serverless SQL to reduce infrastructure costs by 30%, and optimized data processing workflows to ensure cost-effective
resource usage.
◦ Enhanced decision-making capabilities and operational efficiency, leading to a 40% increase in campaign success and a 20%
reduction in excess inventory.
◦ Managed data governance with AWS Secrets Manager and VPC, ensuring compliance and protecting against data breaches.
◦ Developed data transformation scripts and used Databricks to harmonize data formats and resolve schema inconsistencies, ensuring seamless integration.
◦ Addressed additional client needs by integrating new reporting features and refining data processing based on feedback.
• Data Analyst | George Washington University | Washington, DC Jun 2022 - Dec 2023
◦ Developed a machine learning-based legal document categorization system using Python and advanced NLP techniques, aimed at
enhancing legal research efficiency.
◦ Processed and standardized diverse legal document data with the Python libraries NLTK and spaCy, preparing high-quality inputs for model training.
◦ Engineered features using TF-IDF and word embeddings, applying Scikit-Learn for effective feature selection and model
development.
◦ Built and optimized classification models, including Naive Bayes, SVM, LSTM, and BERT, utilizing TensorFlow, Keras, and
Scikit-Learn.
◦ Achieved 92% classification accuracy, addressing complex legal text challenges and setting a new benchmark for document
categorization.
◦ Designed Tableau dashboards for visualizing results and compiled detailed reports, offering actionable insights to improve
document management.
◦ Automated data ingestion and model updates, establishing a scalable system capable of continuous learning and integration with
legal databases.
◦ Enhanced operational efficiency by 30%, significantly improving document retrieval speed and accuracy, and providing a valuable
tool for future legal research.
• Program Analyst | Cognizant Technology Solutions | Bangalore, India Nov 2019 - Jan 2022
◦ Processed and analyzed large-scale healthcare data for the Sanofi Group to generate daily reports for internal teams and external
stakeholders, ensuring data accuracy and timely delivery.
◦ Implemented real-time data ingestion using Apache Kafka, reducing data processing latency by 30% and improving data
streaming efficiency.
◦ Built custom batch data processing pipelines with AWS Glue, tailored to meet project-specific needs, improving data processing
speed and efficiency.
◦ Orchestrated and monitored data workflows using Apache Airflow, incorporating alert systems to reduce job failure resolution
time by 25%.
◦ Leveraged Apache Spark for high-performance data processing and transformation, achieving a 20% reduction in processing time,
with secure data storage in Amazon S3 and Snowflake.
◦ Built Power BI dashboards for operational reporting, resulting in a 15% increase in decision-making accuracy for the client.
Certifications
• Azure Data Engineer Associate
• AWS Certified Solutions Architect Associate
• Azure Data Fundamentals
• Databricks Accredited Apache Spark Programming
• Databricks Accredited AWS Platform Architect
• Databricks Accredited Platform Administrator

Skills
Version Control: Git, Azure Repos, GitHub
CI/CD: Azure DevOps (Pipelines, Repos)
Infrastructure as Code: Terraform, ARM Templates, CloudFormation
Programming Languages: Python, PySpark, SQL, R, MATLAB
AI & Data Science Libraries: Pandas, SciPy, NumPy, Scikit-Learn, Matplotlib, Plotly, Seaborn, Keras, TensorFlow, PyTorch
ETL Tools: Apache Airflow, AWS Glue
Data Visualization: Tableau, Power BI, AWS QuickSight
Big Data Ecosystem: Spark, Kafka
Cloud Environment: Azure, Amazon Web Services (S3, Redshift, DynamoDB)
Azure: Databricks, Data Lake, Blob Storage, Azure Data Factory, SQL Data Warehouse
Databases: Redshift, SQL Server, NoSQL (DynamoDB)

Projects
Hate Speech Analysis

Python, NLP, Deep Learning, Machine Learning
◦ Identified the growing problem of hate speech on online platforms, which spreads negativity and causes harm to individuals and
communities.
◦ Acknowledged the challenges faced by existing solutions in accurately detecting and understanding the context of hate speech.
◦ Implemented robust preprocessing, including data cleaning, normalization, and feature engineering.
◦ Utilized Word2Vec, GloVe, and TF-IDF for effective word embedding, enhancing the model’s ability to understand context and
semantics.
◦ Employed a combination of CNN-LSTM and BERT to further enhance classification performance.
◦ Developed a dynamic data visualization tool with Streamlit, allowing users to interactively explore and analyze the model’s results.
◦ Enhanced model interpretability using the LIME Algorithm, providing clear insights into the decision-making process.

Netflix Data Analysis

AWS, Python, SQL, S3
◦ Collected and scraped Netflix data through APIs, creating a comprehensive dataset and automating data integration with AWS
and SQLite, enhancing operational efficiency.
◦ Reduced data processing time by 15% and identified viewer trends through exploratory analysis, leading to targeted content
recommendations.
◦ Developed an interactive Tableau dashboard for stakeholders to analyze key metrics and track performance indicators effectively.

Education
George Washington University Washington, DC

Master of Science in Data Science Dec 2023
Sri Venkateshwara Engineering College India

Bachelor of Technology in Computer Science Sep 2020
