Susmitha Basineni
+1 (484)-744-1227 | basinenisushmita@gmail.com | LinkedIn
SUMMARY
Data Engineer with 6+ years of experience building scalable data pipelines, real-time streaming solutions, and cloud-native architectures. Proficient in
Python, SQL, Spark, and Kafka, with hands-on expertise in AWS, Azure, and data lake solutions. Strong background in ETL development, data modeling,
and orchestration tools like Airflow. Known for improving data reliability, optimizing workflows, and enabling analytics teams with clean, high-quality
data.
SKILLS
Programming & Scripting: Python (NumPy, Pandas, PySpark), Spark SQL, Shell Scripting
Big Data & Processing Frameworks: Apache Spark, PySpark, Hadoop
ETL & Orchestration: Alteryx, Airflow, AWS Glue
Cloud Platforms: AWS (S3, EC2, Lambda, Redshift, EMR), Azure (Data Factory, Synapse)
Databases & Query Languages: Microsoft SQL Server, PostgreSQL, MySQL, MongoDB (NoSQL), PL/SQL, T-SQL
Data Warehousing & Storage: Snowflake, Amazon Redshift, Azure Synapse, Data Lakes
Data Visualization & Analytics: Tableau, Power BI, Plotly
Statistics & Analytics: Regression, Classification, Clustering, Hypothesis Testing, Descriptive Analysis
DevOps & Infrastructure: Docker, Kubernetes, Git, Jenkins, Terraform, CI/CD Pipelines
Collaboration & Tools: Jira, Confluence, GitHub, Bitbucket, Postman, Swagger, Agile/Scrum Methodologies
Soft Skills: Problem-Solving, Analytical Thinking, Data Storytelling, Communication, Stakeholder Collaboration, Adaptability, Decision-Making,
Teamwork
PROFESSIONAL EXPERIENCE
Data Engineer, Splunk Sep 2024 - Present | USA
Engineered high-performance data pipelines using PySpark and Apache Spark, reducing data processing latency by 42%, enabling faster decision-
making for business-critical operations.
Designed and automated complex ETL workflows with Apache Airflow and AWS Glue, improving data consistency and availability, which directly
supported operational efficiency across the organization.
Led the development of cloud-native data platforms on AWS (S3, Lambda, Redshift, EMR), cutting infrastructure costs by 23% while ensuring high
performance and scalability for large-scale analytics workloads.
Developed efficient data models in Snowflake, improving query performance by 54%, enabling business users to access insights with greater ease
and speed, and reducing dependency on engineering teams.
Streamlined deployment processes for ETL applications using Docker, Jenkins, and Terraform, accelerating release cycles and improving overall
deployment reliability across environments.
Integrated Hadoop Distributed File System (HDFS) with Splunk’s data lake, enhancing storage capabilities and ensuring secure and reliable handling
of large datasets for in-depth analytics.
Data Engineer, Amazon Aug 2021 - Jul 2023 | India
Built robust, automated data pipelines with Python, SQL, and Alteryx, eliminating manual data processing tasks and saving the team over 500 hours
annually, freeing it to focus on higher-value work.
Optimized query performance in Amazon Redshift, cutting execution times threefold and delivering faster, more efficient data access for business
stakeholders across the organization.
Developed and managed automated workflows in Azure Data Factory, reducing data refresh cycles from 12 hours to 2 hours and significantly
improving the timeliness and accuracy of reporting.
Integrated diverse data sources, including MongoDB, PostgreSQL, and MySQL, ensuring a seamless data pipeline that improved reporting
consistency and helped achieve 100% accuracy in customer metrics.
Created interactive dashboards with Tableau, providing real-time visibility into key performance metrics and increasing the effectiveness of
marketing campaigns.
Played a key role in documenting and streamlining data flows and system architecture using Jira and Confluence, reducing troubleshooting time
and improving team collaboration.
Data Engineer, Cognizant Jan 2018 - Jul 2021 | India
Designed and implemented high-throughput ETL pipelines using Hadoop, Apache Spark, and Hive, processing over 5TB of data daily, enabling more
efficient and faster access to critical business data.
Improved overall system responsiveness for clients by optimizing complex PL/SQL, T-SQL, and NoSQL queries, reducing report generation
times.
Led the migration of legacy ETL systems to AWS Glue and EMR, improving data processing efficiency by 37% and ensuring that the system could
handle ever-increasing data volumes with ease.
Employed Docker and Kubernetes to containerize and orchestrate ETL services, ensuring consistent performance and 99.99% uptime even during
high-demand periods.
Implemented rigorous data validation processes that reduced data discrepancies by 32%, leading to more reliable reports and better decision-
making for business teams.
Automated time-consuming tasks using Python and Pandas, cutting manual effort by over 120 hours per quarter and significantly increasing the
overall productivity of the data engineering team.
Collaborated with business teams to design tailored data solutions, directly contributing to the success of client-facing products and increasing
customer satisfaction with timely, data-driven insights.
EDUCATION
Master's in Information Technology
Union Commonwealth University, Barbourville, KY Aug 2023 - May 2025
Bachelor's in Electrical and Electronics Engineering
Sri Venkateswara Engineering College for Women, Andhra Pradesh, India Aug 2015 - May 2019
CERTIFICATIONS
AWS Certified Data Engineer – Associate
PROJECTS
Scalable Real-Time Data Pipeline
Tools Used: AWS Glue, AWS EMR (Spark SQL), Apache Hadoop, AWS Redshift, AWS Lambda, Apache Spark, SQL (PostgreSQL, MySQL).
Built a serverless ETL pipeline using AWS Glue to automate data extraction, transformation, and loading into AWS Redshift, optimizing analytics
workflows.
Utilized AWS EMR (Spark SQL), Apache Hadoop, and Apache Spark for large-scale distributed data processing, improving query efficiency and
handling high-volume datasets.
Automated workflow execution with AWS Lambda, triggering data processing tasks and ensuring real-time data updates.
Integrated SQL (PostgreSQL, MySQL) for structured data storage and optimized querying performance.
Automated Log Analysis & Security Monitoring
Tools Used: Splunk SPL, Log Analysis, Apache Hadoop, Data Parsing, Indexing, SQL (PostgreSQL, MySQL).
Developed a real-time log analysis system using Splunk SPL, enabling proactive threat detection and faster incident response.
Created custom dashboards and alerts in Splunk, providing actionable insights into security events and system anomalies.
Implemented log parsing, indexing, and leveraged Apache Hadoop to manage and process large volumes of log data from diverse sources.
Conducted risk assessment and compliance reporting using SQL (PostgreSQL, MySQL), ensuring adherence to security policies and regulatory
standards.