Azure Data Engineer with 5+ years of experience in designing and optimizing data-intensive
applications. Proficient in Spark, Python, and Azure services, with expertise in both on-premises
and cloud architectures. Specialized in pharmaceutical data management, ensuring compliance
and high data quality. Passionate about solving complex design challenges and adapting to new
technologies.
Phone No. +91 999 | Email ID – 999@gmail.com | LinkedIn – www.linkedin.com/in/ravi-9999-999/
TECHNICAL SKILLS
• Programming Languages: PySpark | SQL | Python | SparkSQL
• Azure Services: Azure Databricks | Azure Data Factory | Azure Key Vault | Logic Apps | Azure
Synapse Analytics
• Azure Data Storage: Azure Data Lake Gen2 | Azure Blob Storage | Azure Delta Lake
• Others: CI/CD | GitHub | Jira | REST API | Excel | Apache Airflow
• File Formats: Delta | Parquet | CSV | Excel
ACADEMIC QUALIFICATIONS
Year Degree Institute CGPA
2019 MCA Banaras Hindu University, Varanasi 7.0 / 10.0
WORK EXPERIENCE
Data Engineer at Tiger Analytics Dec 2024 – Present
Project – Skill Upliftment
Key Responsibilities:
• Focused on enhancing technical skills in Azure Data Factory, Azure Databricks, and PySpark.
• Explored Unity Catalog for data governance and access control in Databricks.
• Attended internal learning sessions and self-paced courses to stay aligned with industry
trends and tools in data engineering.
• Explored and documented optimization techniques for improving the performance of PySpark
jobs and ADF pipelines.
Lead Associate Data Engineer at WNS May 2023-Dec 2024
Project – Healthcare Data
Key Responsibilities:
• Developed scalable ETL workflows using Azure Data Factory (ADF) to ingest data from
diverse sources into a centralized Data Lake. Applied Apache Spark in Azure Databricks for
data transformation, optimizing performance and ensuring data consistency with Delta
Lake (ACID-compliant) storage.
• Developed optimized SQL scripts to generate Key Performance Indicators (KPIs) from
collected data.
• Cleaned and transformed data in Azure Databricks based on business needs.
• Worked closely with business analysts and stakeholders to understand and implement
data processing requirements.
• Implemented processes to ensure data accuracy and consistency in Azure Data Lake
Storage Gen2.
• Automated Excel file ingestion and processing, streamlining data integration workflows.
• Specialized in handling pharmaceutical data, ensuring compliance with industry
standards and data integrity.
• Created Python automation scripts to replace manual tasks, improving efficiency.
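The file-ingestion automation described above can be sketched in plain Python. This is an illustrative example, not the actual WNS implementation: the folder names, the `stage_new_files` helper, and the rule of staging only Excel/CSV files are all assumptions.

```python
# Hypothetical sketch of a Python automation script that stages incoming
# Excel/CSV drops into a landing folder for downstream ingestion.
# Folder layout and the extension filter are illustrative assumptions.
from pathlib import Path
import shutil

ALLOWED_SUFFIXES = {".xlsx", ".csv"}

def stage_new_files(drop_dir: Path, landing_dir: Path) -> list:
    """Move supported files from a drop folder into a landing zone,
    skipping files with unsupported extensions. Returns staged names."""
    landing_dir.mkdir(parents=True, exist_ok=True)
    staged = []
    for path in sorted(drop_dir.iterdir()):
        if path.is_file() and path.suffix.lower() in ALLOWED_SUFFIXES:
            shutil.move(str(path), str(landing_dir / path.name))
            staged.append(path.name)
    return staged
```

In practice a script like this would run on a schedule (or be triggered by ADF) so that analysts no longer move files by hand.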
Data Engineer at Fragma Data Systems June 2019 – April 2023
Project: Enterprise Data Lake Solution
Key Responsibilities:
• Built a dynamic ingestion framework for 20+ sources into the Data Lake, orchestrated with
ADF and transformed via Databricks / Synapse Spark, with data stored in ACID-compliant
Delta format (Snappy Parquet).
• Ingested 2,000+ tables daily at varying frequencies into a data lake of ~1 petabyte; wrote
Databricks code using Delta tables to merge incremental data.
• Applied data cleansing, transformation, and business logic in Databricks for accurate and
consistent data processing.
• Implemented data pipeline optimization techniques, including handling Data Drift in
incremental ingestion with automated detection and resolution.
• Developed a Data Lake Ingestion Framework to ingest data in near real-time, processing
updates every 15 minutes.
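The incremental-merge step above can be illustrated in plain Python: at scale, this upsert is what a Delta Lake `MERGE INTO` on the table's business key performs (the `id` key and record shape here are illustrative assumptions, not the actual table schema).

```python
# Plain-Python sketch of the upsert semantics a Delta Lake MERGE INTO
# applies to an incremental batch: update rows whose key already exists,
# insert rows whose key does not. The "id" key is an assumed example.
def merge_incremental(target: dict, batch: list, key: str = "id") -> dict:
    """Upsert each incoming record into a copy of the target table,
    matched on `key`. The original target is left unchanged."""
    merged = dict(target)
    for record in batch:
        # WHEN MATCHED THEN UPDATE / WHEN NOT MATCHED THEN INSERT
        merged[record[key]] = record
    return merged
```

In Spark SQL the equivalent statement would look like `MERGE INTO target USING updates ON target.id = updates.id WHEN MATCHED THEN UPDATE SET * WHEN NOT MATCHED THEN INSERT *`.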
Project Name: ETL Reengineering Using Job Cluster
Key Responsibilities:
• Migrated Data Lake ingestion pipelines onto job clusters with the same efficiency while
making them more cost-effective.
• Reduced cost by 40% per pipeline after migration.
• Migrated more than 70 ingestion pipelines.
Project Name: Bureau Datamart
Key Responsibilities:
• Migrated SSIS packages to the cloud to reduce overall runtime.
• Orchestrated the overall migration.
• Made pipelines robust and efficient.
• Ensured data was populated correctly.
Project Name: Metrics Calculation
Key Responsibilities:
• Extracted unstructured data and parsed it using a JSON schema.
• Performed data processing and metrics calculation using the Spark framework.
• Loaded the transformed data into Hive and SQL Server databases.
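The parse-then-compute flow above can be sketched with the standard library. This is a minimal illustration, not the production pipeline: the field names (`status`, `amount`) and the example metric are assumptions.

```python
# Minimal sketch of parsing semi-structured JSON records and computing a
# metric before loading; field names and the metric are assumed examples.
import json

def parse_records(raw_lines: list) -> list:
    """Parse newline-delimited JSON, skipping lines that fail to parse
    (mirroring a permissive read of unstructured input)."""
    records = []
    for line in raw_lines:
        try:
            records.append(json.loads(line))
        except json.JSONDecodeError:
            continue
    return records

def completed_total(records: list) -> float:
    """Example metric: total amount across records marked completed."""
    return sum(r.get("amount", 0) for r in records
               if r.get("status") == "completed")
```

In the Spark version, the same schema would be supplied to `from_json` and the aggregation expressed as a DataFrame `groupBy`/`agg` before writing to Hive and SQL Server.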
PERSONAL PROJECTS Jan 2019 – June 2019
Project – Home Automation Using Arduino Uno
• Home automation is a step toward what is referred to as the “Internet of Things,” in which
everything has an assigned IP address, and can be monitored and accessed remotely.