KOHISHA ARUGANTI
Washington, DC | +1 703-579-7117 | varuganti.kohisha@gmail.com | LinkedIn | GitHub
Professional Summary
• Data Engineer with over 5 years of hands-on experience in developing and optimizing data solutions across various industries.
• Expertise in data modeling, data warehousing, and ETL pipeline development to drive business growth and efficiency in large-scale, fast-paced environments.
• Experienced in AWS solutions such as EC2, DynamoDB, S3, and Redshift, with strong hands-on skills in data visualization using Tableau
and QuickSight.
• Skilled in leveraging SQL, Python, and data mining techniques to analyze complex datasets and generate actionable insights for business
planning and decision-making.
• Proficient in infrastructure automation using Terraform and CI/CD pipelines in Azure DevOps, ensuring seamless infrastructure deployment and continuous integration workflows.
• Adept at collaborating with stakeholders to align data strategies with business objectives, driving impactful results.
Work Experience
• Big Data Engineer | Celebal Technology Solutions | Washington, DC Mar 2024 - Present
◦ Integrated data from multiple sources into AWS S3 via Fivetran, managing JSON files and optimizing the pipeline with a Medallion architecture, improving data processing efficiency by 30%.
◦ Automated the provisioning of AWS infrastructure and Databricks resources using Terraform, reducing manual effort by 40%
and accelerating deployment time by 50%.
◦ Developed and maintained CI/CD pipelines using Azure DevOps, automating infrastructure deployment and
version-controlled code management, increasing deployment efficiency by 35%.
◦ Created and deployed Databricks Asset Bundles, replicating notebooks across environments with version control and ensuring 100% consistency between development and staging environments.
◦ Utilized Serverless SQL to reduce infrastructure costs by 30% and optimized data processing workflows to ensure cost-effective resource usage.
◦ Delivered analytics that enhanced decision-making and operational efficiency, contributing to a 40% increase in campaign success and a 20% reduction in excess inventory.
◦ Managed data governance with AWS Secrets Manager and VPC, ensuring compliance and protecting against data breaches.
◦ Developed data transformation scripts and used Databricks to harmonize data formats and resolve schema inconsistencies, ensuring seamless integration.
◦ Addressed additional client needs by integrating new reporting features and refining data processing based on feedback.
• Data Analyst | George Washington University | Washington, DC Jun 2022 - Dec 2023
◦ Developed a machine learning-based legal document categorization system using Python and advanced NLP techniques, aimed at
enhancing legal research efficiency.
◦ Processed and standardized diverse legal document data with the Python libraries NLTK and spaCy, preparing high-quality inputs for model training.
◦ Engineered features using TF-IDF and word embeddings, applying Scikit-Learn for effective feature selection and model
development.
◦ Built and optimized classification models, including Naive Bayes, SVM, LSTM, and BERT, utilizing TensorFlow, Keras, and
Scikit-Learn.
◦ Achieved 92% classification accuracy, addressing complex legal text challenges and setting a new benchmark for document
categorization.
◦ Designed Tableau dashboards for visualizing results and compiled detailed reports, offering actionable insights to improve
document management.
◦ Automated data ingestion and model updates, establishing a scalable system capable of continuous learning and integration with
legal databases.
◦ Enhanced operational efficiency by 30%, significantly improving document retrieval speed and accuracy, and providing a valuable
tool for future legal research.
• Program Analyst | Cognizant Technology Solutions | Bangalore, India Nov 2019 - Jan 2022
◦ Processed and analyzed large-scale healthcare data for the Sanofi Group to generate daily reports for internal teams and external
stakeholders, ensuring data accuracy and timely delivery.
◦ Implemented real-time data ingestion using Apache Kafka, reducing data processing latency by 30% and improving data
streaming efficiency.
◦ Built custom batch data processing pipelines with AWS Glue, tailored to meet project-specific needs, improving data processing
speed and efficiency.
◦ Orchestrated and monitored data workflows using Apache Airflow, incorporating alert systems to reduce job failure resolution
time by 25%.
◦ Leveraged Apache Spark for high-performance data processing and transformation, achieving a 20% reduction in processing time,
with secure data storage in Amazon S3 and Snowflake.
◦ Built Power BI dashboards for operational reporting, resulting in a 15% increase in decision-making accuracy for the client.
Certifications
• Azure Data Engineer Associate
• AWS Certified Solutions Architect Associate
• Azure Data Fundamentals
• Databricks Accredited Apache Spark Programming
• Databricks Accredited AWS Platform Architect
• Databricks Accredited Platform Administrator
Skills
Version Control: Git, Azure Repos, GitHub
CI/CD: Azure DevOps (Pipelines, Repos)
Infrastructure as Code: Terraform, ARM Templates, CloudFormation
Programming Languages: Python, PySpark, SQL, R, MATLAB
AI & Data Science Libraries: Pandas, SciPy, NumPy, Scikit-Learn, Matplotlib, Plotly, Seaborn, Keras, TensorFlow, PyTorch
ETL Tools: Apache Airflow, AWS Glue
Data Visualization: Tableau, Power BI, Amazon QuickSight
Big Data Ecosystem: Spark, Kafka
Cloud Environment: Azure, Amazon Web Services (S3, Redshift, DynamoDB)
Azure: Databricks, Data Lake, Blob Storage, Azure Data Factory, SQL Data Warehouse
Databases: Amazon Redshift, SQL Server, DynamoDB (NoSQL)
Projects
• Hate Speech Analysis | Python, NLP, Deep Learning, Machine Learning
◦ Identified the growing problem of hate speech on online platforms, which spreads negativity and causes harm to individuals and
communities.
◦ Acknowledged the challenges faced by existing solutions in accurately detecting and understanding the context of hate speech.
◦ Implemented robust preprocessing, including data cleaning, normalization, and feature engineering.
◦ Utilized Word2Vec and GloVe word embeddings alongside TF-IDF features, enhancing the model's ability to capture context and semantics.
◦ Employed a combination of CNN-LSTM and BERT to further enhance classification performance.
◦ Developed a dynamic data visualization tool with Streamlit, allowing users to interactively explore and analyze the model’s results.
◦ Enhanced model interpretability using the LIME algorithm, providing clear insights into the decision-making process.
• Netflix Data Analysis | AWS, Python, SQL, S3
◦ Collected and scraped Netflix data through APIs, creating a comprehensive dataset and automating data integration with AWS
and SQLite, enhancing operational efficiency.
◦ Reduced data processing time by 15% and identified viewer trends through exploratory analysis, leading to targeted content
recommendations.
◦ Developed an interactive Tableau dashboard for stakeholders to analyze key metrics and track performance indicators effectively.
Education
• Master of Science in Data Science | George Washington University | Washington, DC Dec 2023
• Bachelor of Technology in Computer Science | Sri Venkateshwara Engineering College | India Sep 2020