Vaidehi Patole
617-987-4806
vaidehi2551@gmail.com
Summary:
• Sr. Data Engineer with 9 years of IT experience in data warehousing and business intelligence across the Banking, Healthcare, and E-commerce domains.
• Expertise in designing and developing scalable Big Data solutions and data warehouses on large-scale distributed data, performing a wide variety of analytics to measure service performance.
• Experience working with multiple cloud providers – AWS, Azure, GCP. Good experience with Agile and
Waterfall environments.
• Excellent knowledge of waterfall and spiral methodologies of Software Development Life Cycle (SDLC).
• Developed data pipelines using Sqoop and MapReduce to ingest data into HDFS for analysis.
• Development experience in Microsoft Azure, providing data movement and scheduling capabilities to cloud-based technologies including Azure Blob Storage and Azure SQL Database.
• Experience with AWS cloud services (Amazon Redshift and Data Pipeline).
• Python scripting experience in Scheduling and Process Automation.
• Experience with Hadoop for ingestion, storage, querying, processing, and analysis of massive data sets.
• Excellent understanding of migrating servers, databases, and applications from on-premise to AWS and Google Cloud Platform.
• Extensive knowledge of Big Data, Hadoop, MapReduce, Hive, and other emerging technologies.
• Experience performing analytics on data using Hive queries and operations.
• Experience implementing real-time streaming and analytics using technologies such as Spark Streaming and Kafka.
• Advanced knowledge of stream processing technologies like Kafka, Spark Streaming, and Flink, allowing me
to build real-time data processing pipelines with minimal latency and high throughput.
• Experience in database design, writing complex SQL queries and stored procedures using PL/SQL.
• Experience with the Oozie scheduler for setting up workflow jobs combining MapReduce and Pig jobs.
• Experience with the architecture and functionality of NoSQL databases such as HBase and Cassandra.
• Experience designing DB2 architecture for data warehouse modeling using tools such as Erwin, PowerDesigner, and ER/Studio.
• Experience with Teradata utilities such as FastLoad, MultiLoad, and Teradata SQL Assistant.
• Knowledge of star schema and snowflake modeling, fact and dimension tables, and physical and logical modeling.
• Excellent technical and analytical skills, with the know-how to design ER models for OLTP and dimensional models for OLAP.
• Experience writing and executing unit, system, integration, and UAT scripts in data warehouse projects.
• Expert in producing on-demand and scheduled reports for business analysis and management decision-making using Power BI.
• Experience creating Logic Apps with various triggers and connectors to integrate data from Workday into different destinations.
• Experience with data transformations using SnowSQL in Snowflake (see the sketch at the end of this summary).
• Experience developing ETL frameworks using Talend for extracting and processing data.
• Sound knowledge of data analysis, data validation, data cleansing, data verification, and identifying data mismatches.
• Experience with Tableau for analysis and creation of dashboards and stories.
• Experience using MS Excel and MS Access to dump and analyze data based on business needs.
• Ability to learn and adapt quickly to emerging technologies.
• Strong hands-on experience with Teradata utilities (BTEQ, FastLoad, MultiLoad, FastExport, TPump) and Unix shell scripting.
• Experience in data modeling using dimensional modeling techniques such as star schema and snowflake modeling.
• Possess in-depth knowledge of Database Concepts, Design of algorithms, SDLC, OLAP, OLTP, Data marts and
Data Lake.
• Experience in all stages of Software Development Lifecycle (SDLC) – Agile, Scrum, Waterfall methodologies,
right from Requirement analysis to development, testing and deployment.
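The following is a minimal, illustrative sketch of the kind of SnowSQL-style transformation in Snowflake referenced above, driven from Python with the snowflake-connector-python package; the account, credentials, warehouse, and table names are placeholder assumptions rather than details from any engagement.

# Illustrative only: connection parameters and table names are placeholders.
import snowflake.connector

conn = snowflake.connector.connect(
    account="xy12345",        # placeholder account locator
    user="etl_user",          # placeholder credentials
    password="change_me",
    warehouse="ETL_WH",
    database="ANALYTICS",
    schema="STAGING",
)
cur = conn.cursor()
try:
    # Hypothetical transformation: de-duplicate a staging table into a reporting table.
    cur.execute(
        """
        INSERT INTO REPORTING.ORDERS_CLEAN (order_id, customer_id, order_total, order_date)
        SELECT DISTINCT order_id, customer_id, order_total, order_date
        FROM STAGING.ORDERS_RAW
        WHERE order_total IS NOT NULL
        """
    )
finally:
    cur.close()
    conn.close()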
Technical Skills:
Operating Systems UNIX, MAC OS X, Windows
Programming Languages Python, Java, SQL, PLSQL, Shell scripting
Databases Oracle, SQL Server, Teradata, PostgreSQL, MySQL, MongoDB, Cassandra,
HBase, Redshift, DynamoDB, Elasticsearch, Snowflake
Big Data Eco System Hadoop HDFS, Pig, Hive, Sqoop, Zookeeper, Yarn, Spark, Storm, Impala,
Flume, Kafka, HBase, PySpark, Airflow
Cloud Platforms AWS, EC2, S3, Redshift, EMR, DynamoDB, Athena, AWS Glue, ECS,
Microsoft Azure, Azure Storage, Data Lake, HDInsight, Databricks, SQL
Database, Snowflake, SnowSQL, Snowpipe.
ETL Tools Informatica Power Center, IDQ, MDM, Pentaho
Version Control & Tracking Git, Jira
Reporting Tools Power BI, Tableau
Professional Experience:
Client: Elevance Health, VA, Remote Jan 2023 – Present
Role: Sr Data Engineer
Responsibilities:
• Worked closely with managers and end users to gather and evaluate business requirements and deliver effective business intelligence and reporting solutions.
• Determined operational objectives by studying business rules, gathering information, evaluating outputs, and clarifying requirements and formats.
• Reviewed documentation with developers, clarified questions and issues, and presented it to managers for approval and implementation.
• Used T-SQL in SQL Server Management Studio (SSMS) to develop stored procedures, user-defined functions (UDFs), and views.
• Designed and configured Azure relational servers and databases based on analysis of the current environment and business requirements.
• Developed pipeline jobs, scheduling triggers, and mapping data flows using Azure Synapse Analytics, and used Key Vault to store credentials.
• Created tabular models on Azure Analysis Services to meet business reporting requirements.
• Design and perform administration functions for Enterprise data integration environments including
Informatica PowerCenter, Master Data Management (MDM), and Informatica Data Quality (IDQ).
• Developed complex big data ingestion jobs in Talend for relational, big data, streaming, IoT, flat file, JSON, API, and many other data sources.
• Worked with Azure Blob and Data Lake Storage and loaded data into Azure Synapse Analytics.
• Designed Synapse views using SQL Server Management Studio (SSMS) to extract data from numerous sources and load it into a SQL Server database for further data analysis and reporting, applying multiple transformations.
• Created correlated and non-correlated subqueries to answer complex business questions involving multiple tables from different databases.
• Perform analysis on data quality and apply business rules in all the layers of data extraction, transformation
and loading process.
• Imported and exported databases using SQL Server Integration Services (SSIS) and Data Transformation Services (DTS) packages.
• Specialized in transforming data into user-facing visualizations that give business users a complete view of their business using Power BI.
• Created and executed multi-threaded PySpark jobs with UDFs on a multi-node Spark cluster (see the sketch at the end of this section).
• Pulled data into Power BI from various sources, including SQL Server and the Azure cloud.
• Developed several solution-driven views and dashboards using chart types such as pie charts, bar charts, tree maps, circle views, line charts, area charts, and scatter plots in Power BI.
• Connected Power BI Desktop to various data sources and worked with a range of visualizations.
• Used Dataproc and BigQuery to develop and maintain GCP cloud-based solutions.
• Developed and published reports and dashboards using Power BI and wrote effective DAX formulas and expressions.
• Implemented and optimized distributed computing frameworks like Apache Spark to process large-scale
datasets and accelerate machine learning model training.
• Evaluate and recommend DBA tools and new DBMS technologies/versions.
• Experience with ETL tools such as Pentaho Kettle, Informatica, Talend, Open Refine.
• Performed incremental load with several Data Flow tasks and Control Flow Tasks using SSIS.
• Utilized Power Query in Power BI to pivot and unpivot data for cleaning and data massaging.
• Proficient in commonly used ETL and data integration tools such as Microsoft SSIS, Informatica, SnapLogic, and PL/SQL, as well as open-source tools.
• Created several user roles and groups for end users and provided row-level security for them.
• Created Power BI reports using tabular SSAS models as the data source in Power BI Desktop and published the reports to the Power BI service.
• Worked on uploading custom visualizations.
• Created Power BI reports by joining multiple tables from multiple databases using complex SQL queries.
• Designed a Power BI data model with multiple fact tables and dimensions based on the business requirements.
• Experience creating custom visuals and groups in Power BI.
• Experience scheduling Power BI report refreshes, both hourly and on demand.
• Managed SnapLogic servers, pipelines, and scheduled/triggered tasks.
• Work with DBA team to implement relational models and assist in performance tuning and release
implementation.
• Applied business rules to data coming from multiple operating companies, and maintained the rules in a table for visibility to all required processes when developing Power BI reports and effective dashboards.
• Troubleshot and resolved Power BI issues to ensure optimal functionality.
• Worked on change requests to improve the accuracy and usefulness of Power BI reports.
• Established expertise in developing SQL, DDL, DML, and vendor-specific data programming languages such as PL/SQL or Transact-SQL.
• Update existing stored procedures, SSIS packages, etc. related to application feature updates.
• Successfully migrated data from various on-prem SQL servers to Azure cloud using copy activity in Synapse
pipelines.
• Configured linked services in Synapse to other servers for seamless data migration.
• Migrated data from APIs to Azure Cloud using bearer tokens for secure and reliable data transfer.
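A minimal sketch of the UDF portion of the PySpark jobs referenced above; the storage paths, column names, and cleanup rule are hypothetical illustrations, not the production logic.

# Illustrative PySpark sketch: paths and column names are hypothetical.
from pyspark.sql import SparkSession
from pyspark.sql import functions as F
from pyspark.sql.types import StringType

spark = SparkSession.builder.appName("claims_cleanup_example").getOrCreate()

@F.udf(returnType=StringType())
def normalize_plan_code(code):
    # Trim and upper-case free-text plan codes; return None for empty values.
    if code is None:
        return None
    cleaned = code.strip().upper()
    return cleaned or None

# Hypothetical ADLS Gen2 paths; the real pipelines read from Azure storage via Synapse/ADF.
claims = spark.read.parquet("abfss://curated@examplelake.dfs.core.windows.net/claims/")

cleaned = (
    claims
    .withColumn("plan_code", normalize_plan_code(F.col("plan_code")))
    .filter(F.col("claim_amount") > 0)
)

cleaned.write.mode("overwrite").parquet(
    "abfss://reporting@examplelake.dfs.core.windows.net/claims_clean/"
)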
Environment: PL/SQL, Python, Azure Synapse Analytics, Azure Databricks, SSIS, SSRS, SSAS, Azure Data Factory, Azure Blob Storage, Azure Table Storage, DML, Azure SQL Server, SQL Server Management Studio (SSMS), Power BI.
Client: USAA, San Antonio, TX Aug 2021 – Dec 2022
Role: Sr Data Engineer
Responsibilities:
• Designed and developed ETL data pipelines in the Hadoop ecosystem for several use cases.
• Set up an AWS Lambda function that runs every 15 minutes to check for repository changes and publishes a notification to an Amazon SNS topic (see the first sketch at the end of this section).
• Integrated services such as Bitbucket, AWS CodePipeline, and AWS Elastic Beanstalk to create a deployment pipeline.
• Created S3 buckets in the AWS environment to store files, some of which are used to serve static content for a web application.
• Configured S3 buckets with lifecycle rules to archive data and transition it to appropriate storage classes based on access requirements.
• Built data pipelines using Apache Airflow in a GCP Cloud Composer environment with various Airflow operators, including the Bash operator, Hadoop operators, Python callables, and branching operators (see the second sketch at the end of this section).
• Created and launched EC2 instances using Linux, Ubuntu, RHEL, and Windows AMIs, and wrote shell scripts to bootstrap the instances.
• Used IAM to create roles, users, and groups, and implemented MFA to provide additional security for the AWS account and its resources; used AWS ECS and EKS for Docker image storage and deployment.
• Used Bamboo pipelines to push all microservice builds to the Docker registry and then deploy to Kubernetes; created and managed pods with Kubernetes.
• Designed an ELK system to monitor and search enterprise alerts; installed, configured, and managed the ELK Stack for log management on EC2, with an Elastic Load Balancer in front of Elasticsearch.
• Developed test environments of different applications by provisioning Kubernetes clusters on AWS using
Docker, Ansible, and Terraform.
• Worked on deployment automation for all the microservices to pull images from the private Docker registry and deploy them to a Docker Swarm cluster using Ansible.
• Set up a local Docker registry with Ansible for uploading and downloading Docker images, in addition to pulling from Docker Hub.
• Created and maintained scalable, fault-tolerant multi-tier AWS and Azure environments spanning multiple availability zones using Terraform and CloudFormation; maintained monitoring and alerting of production and corporate servers using the CloudWatch service.
• Worked on a scalable, distributed data system using the Hadoop ecosystem on AWS EMR.
• Migrated the on-premise database schema to the Amazon Redshift data warehouse.
• Wrote numerous data normalization jobs for new data ingested into Redshift.
• Wrote scripts and indexing strategies for the migration to Amazon Redshift from SQL Server and MySQL databases.
• Ingested data into the application using Hadoop tools such as Pig and Hive.
• Worked on AWS Data Pipeline to configure data loads from S3 into Redshift.
• Used a JSON schema to define the table and column mapping from S3 data to Redshift.
• Built an on-demand, secure EMR launcher with custom Spark submit steps using S3 events, SNS, KMS, and Lambda functions.
• Created EBS volumes for storing files and attached them to EC2 instances.
• Created RDS instances to serve data for request-handling servers; automated routine tasks with Python code and leveraged Lambda functions wherever required.
• Knowledge of containerization management and setup tools such as Kubernetes and ECS.
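First sketch: a minimal, hypothetical version of the scheduled Lambda-to-SNS notification described above; the topic ARN comes from an environment variable and the repository-change check is reduced to a placeholder flag.

# Illustrative Lambda handler: the change-detection logic and topic ARN are placeholders.
import json
import os

import boto3

sns = boto3.client("sns")

def lambda_handler(event, context):
    # A scheduled EventBridge/CloudWatch Events rule invokes this handler every 15 minutes.
    # The real job compared the repository against its last known state; a flag stands in here.
    repository_changed = True  # placeholder result of the change check

    if repository_changed:
        sns.publish(
            TopicArn=os.environ["TOPIC_ARN"],  # placeholder, injected by the deployment
            Subject="Repository change detected",
            Message=json.dumps({"detail": "repository updated", "source": "scheduled-check"}),
        )
    return {"changed": repository_changed}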
Environment: AWS (EC2, S3, EBS, ELB, RDS, SNS, SQS, VPC, CloudFormation, CloudWatch, ELK Stack), Bitbucket, Ansible, Python, Shell Scripting, PowerShell, ETL, AWS Glue, Jira, JBoss, Bamboo, Docker, WebLogic, Maven, WebSphere, Unix/Linux, AWS X-Ray, DynamoDB, Kinesis, CodeDeploy, Splunk.
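Second sketch: a minimal Cloud Composer (Airflow 2.x) DAG in the shape described above, with a Bash task, a Python callable, and a branching operator; the DAG id, task names, and branching rule are illustrative assumptions, not the production pipeline.

# Illustrative Airflow 2.x DAG; on older 2.x releases, DummyOperator replaces EmptyOperator.
from datetime import datetime

from airflow import DAG
from airflow.operators.bash import BashOperator
from airflow.operators.empty import EmptyOperator
from airflow.operators.python import BranchPythonOperator, PythonOperator


def _validate_source(**context):
    # Placeholder validation step; a real task would check file counts, schemas, etc.
    print("source files validated")


def _choose_load_path(**context):
    # Illustrative rule: full load on Mondays, incremental load otherwise.
    return "full_load" if context["logical_date"].weekday() == 0 else "incremental_load"


with DAG(
    dag_id="example_ingest_pipeline",   # hypothetical DAG id
    start_date=datetime(2022, 1, 1),
    schedule_interval="@daily",
    catchup=False,
) as dag:
    extract = BashOperator(task_id="extract", bash_command="echo 'pull files from source'")
    validate = PythonOperator(task_id="validate", python_callable=_validate_source)
    branch = BranchPythonOperator(task_id="choose_load_path", python_callable=_choose_load_path)
    full_load = EmptyOperator(task_id="full_load")
    incremental_load = EmptyOperator(task_id="incremental_load")

    extract >> validate >> branch >> [full_load, incremental_load]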
Client: Amalgamated Bank, Chicago, IL Jan 2019 – Jul 2021
Role: Data Engineer
Responsibilities:
• Installing, configuring and maintaining Data Pipelines.
• Transforming business problems into Big Data solutions and defining Big Data strategy and Roadmap.
• Designing the business requirement collection approach based on the project scope and SDLC methodology.
• Authoring Python (PySpark) scripts for custom UDFs for row/column manipulations, merges, aggregations, stacking, data labeling, and all cleaning and conforming tasks.
• Writing Pig Scripts to generate MapReduce jobs and perform ETL procedures on the data in HDFS.
• Develop solutions to leverage ETL tools and identify opportunities for process improvements using
Informatica and Python.
• Make recommendations for continuous improvement of the data processing environment.
• Developed a data platform from scratch and took part in the requirement gathering and analysis phase of the project, documenting the business requirements.
• Design and implement multiple ETL solutions with various data sources by extensive SQL Scripting, ETL tools,
Python, Shell Scripting, and scheduling tools.
• Used Sqoop to move data between RDBMS sources and HDFS.
• Developed Spark applications using PySpark and Spark SQL for data extraction, transformation, and aggregation across multiple file formats.
• Used SSIS to build automated multi-dimensional cubes.
• Used Spark Streaming to receive real-time data from Kafka and store the stream data in HDFS using Python and NoSQL databases such as HBase and Cassandra (see the sketch at the end of this section).
• Collected data with Spark Streaming from an AWS S3 bucket in near real time and performed the necessary transformations and aggregations on the fly to build the common learner data model, persisting the data in HDFS.
• Validated the test data in DB2 tables on Mainframes and Teradata using SQL queries.
• Automated and scheduled recurring reporting processes using UNIX shell scripting and Teradata utilities such
as MLOAD, BTEQ, and Fast Load.
• Worked on Dimensional and Relational Data Modelling using Star and Snowflake Schemas, OLTP/OLAP
system, Conceptual, Logical, and Physical data modeling.
• Used Oozie to automate data loading into the Hadoop Distributed File System (HDFS).
• Developed automated regression scripts in Python to validate the ETL process across multiple databases, including AWS Redshift, Oracle, MongoDB, and SQL Server (T-SQL).
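A minimal sketch of the Kafka-to-HDFS flow described above, written with PySpark Structured Streaming as a stand-in for the streaming job; the broker, topic, schema, and HDFS paths are hypothetical, and the spark-sql-kafka package is assumed to be on the classpath.

# Illustrative streaming job: broker, topic, schema, and paths are placeholders.
from pyspark.sql import SparkSession
from pyspark.sql import functions as F
from pyspark.sql.types import DoubleType, StringType, StructField, StructType

spark = SparkSession.builder.appName("kafka_to_hdfs_example").getOrCreate()

# Assumed JSON payload schema for the incoming events.
event_schema = StructType([
    StructField("event_id", StringType()),
    StructField("account_id", StringType()),
    StructField("amount", DoubleType()),
    StructField("event_ts", StringType()),
])

raw = (
    spark.readStream
    .format("kafka")
    .option("kafka.bootstrap.servers", "broker1:9092")  # hypothetical broker
    .option("subscribe", "transactions")                # hypothetical topic
    .option("startingOffsets", "latest")
    .load()
)

events = (
    raw.select(F.from_json(F.col("value").cast("string"), event_schema).alias("e"))
    .select("e.*")
)

query = (
    events.writeStream
    .format("parquet")
    .option("path", "hdfs:///data/transactions/")              # hypothetical HDFS path
    .option("checkpointLocation", "hdfs:///checkpoints/transactions/")
    .outputMode("append")
    .start()
)
query.awaitTermination()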
Environment: Cloudera Manager (CDH5), Hadoop, PySpark, HDFS, NiFi, Pig, Hive, S3, Kafka, Scrum, Git, Sqoop, Oozie, Informatica, Tableau, OLTP, OLAP, HBase, Cassandra, SQL Server, Python, Shell Scripting, XML, Unix.
Client: Capital One, NY Feb 2018 – Jan 2019
Role: Data Engineer
Responsibilities:
• Worked on Azure Data Factory to combine data from both on-prem (MySQL, Cassandra) and cloud (Blob Storage, Azure SQL DB) sources and applied transformations to load the results back into Azure Synapse.
• Managed, configured, and scheduled resources across the cluster using Azure Kubernetes Service.
• Involved in designing data ingestion pipelines on an Azure HDInsight Spark cluster using Azure Data Factory and Spark SQL; also worked with Cosmos DB (SQL API and Mongo API).
• Developed dashboards and visualizations to help business users analyze data and to present data insights to management, with a focus on Microsoft products such as SQL Server Reporting Services (SSRS) and Power BI.
• Developed SQL to implement business logic and optimized query performance using T-SQL.
• Ensured compliance with enterprise architectural requirements for data integration services, including business-to-business application services.
• Created and owned the strategic roadmap for data integration throughout the enterprise.
• Performed the migration of big data sets to Databricks (Spark); created and administered clusters, loaded data, configured data pipelines, and loaded data from ADLS Gen2 into Databricks using ADF pipelines (see the sketch at the end of this section).
• Created Databricks notebooks to streamline and curate data for diverse business use cases and to store it on Databricks.
• Provided DBA support for databases in various environments, including development, QA, performance, and pre-production, covering capacity planning, trend analysis, and predictive maintenance.
• Utilized Azure Logic Apps to build workflows that schedule and automate batch jobs by integrating apps, ADF pipelines, and other services such as HTTP requests and email triggers.
• Worked extensively on Azure Data Factory, including data transformations, integration runtimes, Azure Key Vault, and triggers, and migrated Data Factory pipelines to higher environments using ARM templates.
• Ingested data in mini-batches and performed RDD transformations on those mini-batches using Spark Streaming to carry out streaming analytics in Databricks.
• Migrated on-premise data (Oracle, SQL Server, DB2, MongoDB) to Azure Data Lake Storage (ADLS) using Azure Data Factory (ADF V1/V2).
• Developed a detailed project plan and helped manage the data conversion.
• Performed ETL testing activities such as running jobs, extracting data from the database with the necessary queries, transforming it, and loading it into the data warehouse servers.
• Used JIRA for bug tracking and CVS for version control.
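A minimal, illustrative sketch of the Databricks-side curation of data landed in ADLS Gen2 by the ADF pipelines described above; the storage account, container, and column names are placeholder assumptions.

# Illustrative Databricks-style PySpark job: storage paths and columns are placeholders.
from pyspark.sql import SparkSession
from pyspark.sql import functions as F

spark = SparkSession.builder.appName("curate_adls_landing_example").getOrCreate()

landing_path = "abfss://landing@examplestorage.dfs.core.windows.net/customers/"  # hypothetical
curated_path = "abfss://curated@examplestorage.dfs.core.windows.net/customers/"  # hypothetical

customers = spark.read.option("header", True).csv(landing_path)

curated = (
    customers
    .filter(F.col("customer_id").isNotNull())
    .dropDuplicates(["customer_id"])
    .withColumn("load_date", F.current_date())
)

curated.write.mode("overwrite").parquet(curated_path)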
Environment: PL/SQL, Python, Azure Data Factory, Azure Blob Storage, Azure Table Storage, Azure SQL Server, Apache Hive, Apache Spark, MDM, Netezza, Teradata, Oracle 12c, SQL Server
Client: IOPEX Technologies, India Jun 2015 – Aug 2017
Role: Data Analyst
Responsibilities:
• Identified and defined the datasets for the report generation by writing queries and stored procedures
• Developed drill through, drill down, linked, sub and parameterized reports for better business analysis using
SSRS.
• Extracted large volume of data from various data sources using SSIS packages
• Designed different types of reports using Report Designer 2008 R2 for financial analysis.
• Worked on Statistical Analysis of data for purchasing of materials and equipment.
• Handled inventory management by maintaining appropriate safety stock levels.
• Created Auto invoice reports (shipment and backlog reports) and Yearly IT budget reports using Pivot Tables
and Slicers in MS Excel by connecting to SQL server database.
• Wrote Python modules to extract and load asset data from the MySQL source database (see the sketch at the end of this section).
• Wrote and executed various MySQL database queries from Python using the Python MySQL connector and MySQL database packages.
• Understood the requirements and clarified conflicts with the business team.
• Analyzed the existing system, business requirements and functional Specifications.
• Followed coding standards, code versioning and quality process techniques to reduce rework from reviews.
• Tested the application, fixing the defects, and documenting the required information.
• Ensured all quality related activities are logged and shared.
• Communicated the work progress to the onshore team.
• Participated in design discussions and assured functional specifications are delivered in all phases of SDLC in
an Agile Environment
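A minimal sketch of the kind of Python/MySQL extraction referenced above, using the mysql-connector-python package; the database, table, and credentials are placeholders.

# Illustrative only: connection details and the assets table are hypothetical.
import mysql.connector

conn = mysql.connector.connect(
    host="localhost",
    user="report_user",       # placeholder credentials
    password="change_me",
    database="assets_db",     # hypothetical database
)
try:
    cursor = conn.cursor(dictionary=True)
    cursor.execute(
        "SELECT asset_id, asset_name, purchase_date FROM assets WHERE status = %s",
        ("ACTIVE",),
    )
    for row in cursor.fetchall():
        print(row["asset_id"], row["asset_name"], row["purchase_date"])
finally:
    conn.close()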
Environment: SSIS Packages, MS Excel, SQL Server, MySQL, Python, Agile
Education:
Master's Degree, Northeastern University, 2023