
SAHITHI DEVI

+1(313)687-4486 || sahithiredz@gmail.com

PROFESSIONAL SUMMARY:

 Over 10 years of experience in Big Data ecosystems using Hadoop, Pig, Hive, HDFS, HBase, MapReduce,
Sqoop, Storm, Spark, Scala, Airflow, NiFi, Snowflake, Flume, Kafka, YARN, Oozie, and ZooKeeper.
 Experience in installing, configuring, managing, supporting, and monitoring Hadoop clusters using
distributions such as Apache Hadoop, Cloudera, and Hortonworks.
 Experience with Apache Spark using Spark Core, SparkContext, Spark SQL, Spark MLlib, DataFrames, and RDDs.
 Experience in developing Spark Streaming jobs built on RDDs (Resilient Distributed Datasets) using
Scala, PySpark, and the Spark shell.
 Extensive experience in Amazon Web Services (AWS), including EC2, ECS, S3, VPC, ELB, IAM, DynamoDB,
CloudFront, CloudWatch, Route 53, Elastic Beanstalk, Auto Scaling, Security Groups, CodeBuild, CodeDeploy,
Redshift, CloudFormation, CloudTrail, OpsWorks, Kinesis, SQS, SNS, and SES.
 Experienced in ingesting data into HDFS from relational databases such as MySQL, Oracle, DB2, Teradata, SQL
Server, and PostgreSQL using Sqoop. Experience with Hadoop file formats such as Parquet, ORC, and Avro.
 Experience in working with NoSQL Databases like HBase, DynamoDB, Cassandra and MongoDB.
 Analyzed data using HiveQL and Pig Latin, and extended Hive and Pig core functionality with custom UDFs.
 Experience in working with CI/CD pipeline using tools like Jenkins and Chef.
 Hands-on experience in setting up workflows using Apache Airflow and the Oozie workflow engine for managing
and scheduling Hadoop jobs. Experience in data warehousing concepts and ETL processes.
 Experience in building Data Models and Dimensional Modeling with 3NF, Star and Snowflake schemas for
OLAP and Operational data store (ODS) applications.
 Experience in designing ETL workflows on Tableau.
 Experience in job/workflow scheduling and monitoring tools like Oozie, AWS Data Pipeline, and Autosys.
 Participated at varying levels of responsibility in building and supporting ETL processes.
 Worked with Cloudera and Hortonworks distributions.
 Experience with design, coding, debugging, reporting, and data analysis of web applications using Python.
 Experienced with the Spark Streaming API to ingest data into the Spark engine from Kafka.
 Hands-on experience in GCP: BigQuery, GCS buckets, Cloud Functions, Google Cloud Composer, Cloud
Dataflow, Pub/Sub, Cloud Shell, gsutil, the bq command-line utility, Dataproc, and Stackdriver.
 Experience in project management services like JIRA for tracking issues and bugs related to code, and
GitHub for code reviews. Worked with version control tools such as CVS, Git, and SVN.
 Experienced in using IDEs and tools like Eclipse, NetBeans, GitHub, Jenkins, Maven, and IntelliJ.
 Experience in shell scripting, SQL Server, UNIX, Linux, and OpenStack, with expertise in Python scripting.
 Strong experience in writing scripts using the Python, PySpark, and Spark APIs for analyzing data (an
illustrative PySpark snippet follows this summary).
 Extensively used Python Libraries PySpark, Pytest, PyExcel, Boto3, embedPy, NumPy and Beautiful Soup.
 Experience in migrating SQL databases to Azure Data Lake, Azure Data Lake Analytics, Azure SQL
Database, Azure Databricks, and Azure SQL Data Warehouse, controlling and granting database access, and
migrating on-premises databases to Azure Data Lake Store using Azure Data Factory.
 Knowledge of the OpenShift platform for managing Docker containers using Docker and Kubernetes clusters.
 Strong experience in working with UNIX/LINUX environments, writing shell scripts.
 Experienced in working within the SDLC using Agile and Waterfall methodologies.
 Experience working with GitHub, Jenkins, and Maven.
 Conducted comprehensive analysis and optimization of SAP tables, creating efficient CDS views that
streamlined data access and reduced redundancy, leading to more accurate and timely business insights.
 Integrated Collibra DGC via Collibra Connect (Mule ESB) with third-party tools such as Ataccama, IBM IGC,
and Tableau to apply DQ rules, import technical lineage, and create reports using the metadata in Collibra
DGC.
 Integrated Ataccama with Collibra using the Mule ESB connector and published DQ rule results to Collibra via
REST API calls.
 Successfully integrated ABAP CDS views with SAP Fiori applications, enabling real-time data visualization
and interactive reporting for end users, which improved decision-making processes and user satisfaction.
 Implemented complex ABAP Core Data Services (CDS) views to enhance data modeling and reporting
capabilities.
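
A minimal PySpark sketch of the analysis scripting referenced in the summary above; the file paths and column names are hypothetical:

    from pyspark.sql import SparkSession
    from pyspark.sql import functions as F

    # Hypothetical example: summarize sales records stored as Parquet
    spark = SparkSession.builder.appName("sales_analysis").getOrCreate()

    sales = spark.read.parquet("/data/warehouse/sales")

    summary = (sales
               .filter(F.col("amount") > 0)
               .groupBy("region")
               .agg(F.sum("amount").alias("total_amount"),
                    F.countDistinct("customer_id").alias("customers"))
               .orderBy(F.desc("total_amount")))

    summary.write.mode("overwrite").parquet("/data/warehouse/sales_summary")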

TECHNICAL SKILLS:

 Big Data Ecosystem: HDFS, MapReduce, YARN, Spark, Hive, Impala, StreamSets, Sqoop, HBase, Pig, Oozie,
ZooKeeper, Azure, Amazon Web Services (AWS), EMR.
 Hadoop Distributions: Apache Hadoop 2.x/1.x, Cloudera CDP, Hortonworks HDP
 Programming Languages: Python, Scala, Java, R, JavaScript, Shell Scripting, Pig Latin, HiveQL.
 NoSQL Database: Cassandra, MongoDB.
 Database: MySQL, Oracle, MS SQL SERVER, PostgreSQL, DB2.
 Cloud Technologies: AWS (EMR, EC2, RDS, S3, Athena), Microsoft Azure, GCP.
 ETL/BI: Informatica, SSIS, SSRS, SSAS, Tableau, Power BI.
 Web Development: Spring, J2EE, JDBC, .Net MVC, Tomcat, JavaScript, Node.js, HTML, CSS.
 Operating systems: Linux (Ubuntu), Windows (XP/7/8/10)
 IDE: IntelliJ, Eclipse, Spyder, Jupyter
 Others: Machine Learning, Spring Boot, Jupyter Notebook, Jira, ServiceNow

PROFESSIONAL WORK EXPERIENCE:

Verizon, TX May 2023 – Present


Sr. Big Data Engineer

Key Responsibilities:

● Designed and executed an extensive data migration plan to shift data from Hadoop to Azure, utilizing Azure
Data Factory for streamlined and automated data transfer operations.
● Utilized Azure Data Lake Storage Gen2 as the designated repository for the migrated data, ensuring scalability,
data security, and robust availability to meet the demands of large-scale data storage requirements.
● Integrated Azure Databricks into the migration process to handle data transformation and processing tasks,
capitalizing on its scalable computational resources and collaborative analytics environment.
● Utilized Azure Data Factory's Copy Activity functionality to coordinate the transfer of data from Hadoop
Distributed File System (HDFS) to Azure Data Lake Storage Gen2, ensuring integrity of data migration
process.
● Leveraged Azure Data Factory's Data Management Gateway to establish seamless connectivity and facilitate
data transfer between on-premises Hadoop clusters and Azure cloud services.
● Implemented Azure Data Factory's Data Flows to execute intricate data transformations and manipulations
during the migration, ensuring compatibility and optimization for Azure data storage solutions.
● Expert in using Databricks with Azure Data Factory (ADF) to process large volumes of data.

● Performed ETL operations in Azure Databricks by connecting to different relational database source systems
using ODBC connectors.
● Developed automated process in Azure cloud to ingest data daily from web service and load into Azure SQL
DB.
● Deployed data replication and synchronization mechanisms across Azure Cosmos DB to ensure continuous
availability, disaster recovery preparedness, and global data dissemination across multiple Azure regions.
● Developed streaming pipelines using Azure Event Hubs and Stream Analytics to analyze data for dealer
efficiency and open-table counts from IoT-enabled poker and other pit tables.
● Analyzed data where it lives by Mounting Azure Data Lake and Blob to Databricks.

● Used Logic Apps to take decision-based actions within workflows.

● Implemented Azure Databricks clusters, Python and PySpark notebooks, jobs, and autoscaling.

● Performed data cleansing and applied transformations using Databricks and Spark data analysis.

● Used Azure Synapse to manage processing workloads and served data for BI and prediction needs.

● Developed Spark Scala scripts for mining data and performed transformations on large datasets to provide real-
time insights and reports.
● Designed and automated Custom-built input adapters using Spark, Sqoop, and Oozie to ingest and analyze data
from RDBMS to Azure Data Lake.
● Developed automated workflows for daily incremental loads, moved data from RDBMS to Data Lake.

● Configured Spark Streaming to receive real-time data from Apache Kafka and store the stream to HDFS using
Scala (a PySpark analogue is sketched after this list).

● Involved in building an Enterprise Data Lake using Data Factory and Blob storage, enabling other teams to
work with more complex scenarios and ML solutions.
● Used Azure Data Factory, SQL API, and Mongo API and integrated data from MongoDB, MS SQL, and cloud
(Blob, Azure SQL DB).
● Designed the distribution strategy for SAP tables.

● Extensive knowledge of data transformations, mapping, cleansing, monitoring, debugging, performance
tuning, and troubleshooting of Hadoop clusters.
● Managed resources and scheduling across the cluster using Azure Kubernetes Service.

● Facilitated data for interactive Power BI dashboards and reporting purposes.
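
A minimal PySpark sketch of the Kafka-to-HDFS streaming job described above; the production job was written in Scala, and the broker addresses, topic name, and paths here are hypothetical (the spark-sql-kafka connector is assumed to be available):

    from pyspark.sql import SparkSession
    from pyspark.sql import functions as F

    spark = SparkSession.builder.appName("kafka_to_hdfs").getOrCreate()

    # Subscribe to a hypothetical Kafka topic
    events = (spark.readStream
              .format("kafka")
              .option("kafka.bootstrap.servers", "broker1:9092,broker2:9092")
              .option("subscribe", "events")
              .option("startingOffsets", "latest")
              .load())

    # Kafka delivers the payload as bytes; cast it to string before persisting
    decoded = events.select(F.col("value").cast("string").alias("payload"),
                            F.col("timestamp"))

    # Append the stream to HDFS as Parquet, with checkpointing for fault tolerance
    query = (decoded.writeStream
             .format("parquet")
             .option("path", "hdfs:///data/streams/events")
             .option("checkpointLocation", "hdfs:///checkpoints/events")
             .outputMode("append")
             .start())

    query.awaitTermination()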


Environment: Azure (HDInsight, Databricks, Data Lake, Blob Storage, Data Factory, SQL DB, SQL DWH, AD,
AKS), Scala, Python, PySpark, Hadoop 2.x, Spark v2.0.2, NLP, Airflow v1.8.2, Hive v2.0.1, Sqoop v1.4.6, HBase,
Oozie, Talend, Cosmos DB, MS SQL, MongoDB, Apache Kafka, AWS, Ambari, Power BI, Azure DevOps.

Bank of America, NC Apr 2021 – May 2023


Senior Big Data Engineer

Key Responsibilities:

● Conducted extensive data preprocessing on AWS, encompassing tasks such as feature scaling, normalization,
and handling missing values to prepare datasets for model training and assessment.
● Orchestrated end-to-end machine learning pipelines on AWS, involving data ingestion from sources like
Amazon S3, data processing using AWS Glue, model training with Amazon SageMaker, and model
deployment via AWS Lambda and Amazon ECS.
● Integrated PySpark with machine learning libraries such as scikit-learn and TensorFlow to execute sophisticated
data transformations and feature engineering tasks for predictive modeling purposes.
● Utilized PySpark's DataFrame API to execute intricate data transformations, including joins, aggregations, and
window functions, thereby ensuring the integrity and accuracy of the data.
● Implemented custom User Defined Functions (UDFs) in PySpark to address specific business logic
requirements, thereby enhancing the adaptability and scalability of ETL processes (a brief example follows this
list).
● Developed specialized data ingestion connectors for Snowflake, leveraging Snowpark and the Snowflake
Connector for Python to expand data integration capabilities and handle data from various sources and APIs.
● Automated data loading into Snowflake from AWS S3 by configuring Snowpipe auto-ingestion, eliminating
manual intervention and streamlining the process.
● Integrated Snowflake with data orchestration tools such as Apache Airflow and dbt to automate data workflows,
reducing manual intervention and improving operational efficiency.
● Executed Hadoop jobs on EMR clusters performing Spark, Hive, and MapReduce Jobs for tasks including
building recommendation engines, transactional fraud analytics, and behavioral insights.
● Migrated Hive and MapReduce jobs to EMR to automate workflows using Airflow, streamlining processes, and
improving efficiency.
● Utilized PySpark and Scala on AWS Databricks for data transformation and enhancement tasks, customizing
transformations and ensuring efficient resource utilization.
● Leveraged the Kafka Controller API for efficient resource utilization of Kafka brokers under changing workloads.

● Combined Kafka with Apache NiFi for data ingestion into Hadoop clusters, utilizing NiFi's capabilities for data
routing and transformation to efficiently handle and transmit data streams to Kafka topics.
● Managed file movements between HDFS, AWS S3, utilizing S3 buckets in AWS for data storage and retrieval.

● Automated data loading into the Hadoop Distributed File System using Oozie, enabling speedy reviews and
first-mover advantages, while leveraging Pig for data preprocessing.
● Employed Python libraries like Pandas and NumPy within PySpark workflows for data manipulation and
statistical analysis, resulting in improved data quality and generation of insights.
● Integrated AWS Glue with other AWS services such as Amazon Athena, Amazon Redshift, and Amazon EMR
to develop end-to-end data processing and analytics solutions, enabling timely insights and decision-making.
● Designed and optimized PySpark jobs for data ingestion, cleansing, transformation, and loading (ETL)
operations, ensuring high performance and scalability across large-scale distributed data processing
environments.
● Implemented Spark using Python and Scala along with Spark SQL for faster testing and processing of data,
improving overall efficiency and scalability of data processing tasks.
● Created and implemented feature engineering pipelines on AWS SageMaker and AWS Databricks,
preprocessing raw data, extracting pertinent features, and transforming data to facilitate machine learning model
training.
● Played a role in establishing and documenting feature-ops guidelines and best practices for AWS SageMaker
and AWS Databricks, ensuring uniformity and effectiveness in feature engineering across various teams and
projects.
● Used Git for version control and Jira for project management, tracking issues and bugs.
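
A minimal sketch of a custom PySpark UDF of the kind described above, assuming a hypothetical risk-tier business rule and S3 paths:

    from pyspark.sql import SparkSession
    from pyspark.sql import functions as F
    from pyspark.sql.types import StringType

    spark = SparkSession.builder.appName("udf_example").getOrCreate()

    # Hypothetical business rule: bucket transaction amounts into risk tiers
    def risk_tier(amount):
        if amount is None:
            return "unknown"
        return "high" if amount >= 10000 else "standard"

    risk_tier_udf = F.udf(risk_tier, StringType())

    txns = spark.read.parquet("s3://example-bucket/transactions/")
    scored = txns.withColumn("risk_tier", risk_tier_udf(F.col("amount")))
    scored.write.mode("overwrite").parquet("s3://example-bucket/transactions_scored/")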

Environment: AWS, EC2, S3, Athena, Lambda, Glue, Elasticsearch, RDS, DynamoDB, Redshift, ECS, Hadoop 2.x,
Hive v2.3.1, Spark v2.1.3, Databricks, Python, PySpark, Java, Scala, SQL, Sqoop v1.4.6, Kafka, Airflow v1.9.0,
HBase, Oracle, Cassandra, MLlib, QuickSight, Tableau, Maven, Git, Jira, Azure DevOps.

United Airlines, IL Jun 2019 – Mar 2021


Snowflake Developer

Key Responsibilities:

● Analyzed data quality issues with SnowSQL, constructing an analytical warehouse on Snowflake for analysis.

● Created and implemented data processing functions and procedures using Snowpark within Snowflake,
enabling sophisticated data transformation and manipulation operations using Scala and Java.
● Used Snowpark to execute Spark code directly within Snowflake, facilitating the seamless integration of Spark
functionalities with Snowflake data warehouses to enhance data processing and analytical capabilities.
● Proficiently utilized Snowflake's SnowSQL and Azure Blob Storage SDKs for seamless data integration and
orchestration tasks, facilitating smooth data movement and transformation.
● Actively engaged in troubleshooting and resolving integration challenges between Azure Blob Storage and
Snowflake, ensuring uninterrupted data exchange and processing.
● Created Snowflake procedures to facilitate branching and looping during execution.

● Employed Azure Key Vault to securely manage Snowflake credentials and access keys, ensuring robust data
security and adherence to regulatory standards in Azure-Snowflake integrations.
● Orchestrated the transfer of data between Snowflake and Azure Cosmos DB using Azure Data Factory, enabling
bidirectional data synchronization, and facilitating real-time data analysis across both platforms.
● Integrated Snowflake with Azure Databricks to conduct data processing & analytics, leveraging Azure
Databricks' scalable computing capabilities and collaborative analytics environment for various data
engineering tasks.
● Utilized Azure Event Hubs to stream data in real-time into Snowflake, enabling continuous ingestion of
streaming data for immediate analytics and informed decision-making within Snowflake data warehouses.
● Linked Snowflake with Azure Kubernetes Service (AKS) to deploy and manage Snowflake workloads in
containers, ensuring scalable and efficient execution of data processing tasks in Azure environments.
● Implemented Azure Active Directory (Azure AD) integration with Snowflake to centralize identity and access
management, facilitating seamless authentication and authorization for users accessing Snowflake data
warehouses from Azure environments.
● Utilized Snowpark to develop custom user-defined functions (UDFs) in Snowflake, expanding the capabilities
of Snowflake data warehouses to address specific business needs and analytical requirements.
● Engineered and deployed scalable data processing solutions using Snowpark within Snowflake, enhancing
performance and efficiency to handle large datasets with optimal resource utilization.
● Developed and fine-tuned SnowSQL queries for intricate data retrieval and manipulation tasks in Snowflake,
ensuring the effective execution and leveraging of Snowflake's data processing capabilities.
● Designed and implemented robust data pipelines using Azure Data Factory to efficiently transfer data between
Snowflake and Azure Data Lake Storage, ensuring optimized performance and reliability.
● Proficiently utilized Snowflake's SnowSQL and Snowpark functionalities for data integration and orchestration
tasks, enabling seamless data movement and transformation within Snowflake data warehouses.
● Actively engaged in troubleshooting and resolving challenges related to Snowflake SQL queries and Snowpark
Spark code execution, ensuring smooth operation of data processing workflows.
● Fine-tuned Snowpipe configurations, adjusting parameters such as batch size, concurrency, and notification
polling intervals for optimal throughput and latency in data ingestion.
● Orchestrated data transformations using Snowflake's SnowSQL and Python libraries for data quality and consistency.

● Implemented Snowflake's Snowpark to execute Spark code directly within Snowflake, enabling advanced data
processing and transformation tasks using languages such as Scala and Java.
● Created data quality scripts using SQL and Hive to validate successful data loads and overall data quality, and
created various types of data visualizations using Python and Tableau (a short validation sketch follows this list).
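
A minimal sketch of such a data quality check, assuming the Snowflake Connector for Python and hypothetical connection parameters, table names, and checks:

    import snowflake.connector

    # Hypothetical connection parameters; in practice sourced from a secrets store
    conn = snowflake.connector.connect(
        account="example_account",
        user="etl_service",
        password="***",
        warehouse="ANALYTICS_WH",
        database="ANALYTICS",
        schema="STAGING",
    )

    # Simple validation queries run after a load completes
    checks = {
        "row_count": "SELECT COUNT(*) FROM ORDERS_STG",
        "null_order_ids": "SELECT COUNT(*) FROM ORDERS_STG WHERE ORDER_ID IS NULL",
    }

    try:
        cur = conn.cursor()
        for name, sql in checks.items():
            cur.execute(sql)
            print(name, cur.fetchone()[0])
    finally:
        conn.close()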

Environment: Hadoop, Azure, MapReduce, Spark, Spark MLlib, Java, Tableau, Azure DevOps, SQL, Excel,
VBA, SAS, MATLAB, SPSS, Cassandra, Oracle, MongoDB, DB2, T-SQL, PL/SQL, XML.

LPL Financial, San Diego, CA Dec 2017 – May 2019


Sr Data Engineer/ Hadoop Developer

Key Responsibilities:

● Engaged in collaborative efforts with various Hadoop ecosystem components such as HBase, Pig, Sqoop, and
Oozie, contributing to diverse data processing and workflow orchestration tasks.
● Implemented robust data ingestion pipelines leveraging Apache Flume and Apache Kafka, facilitating seamless
streaming data ingestion into Hadoop for real-time processing needs.
● Played a pivotal role in the design and fine-tuning of HiveQL queries and data warehouse schemas to support
ad-hoc querying and data analysis functionalities within the Hadoop environment.
● Improved the efficiency of PySpark jobs by optimizing Spark configurations and employing partitioning
strategies, leading to a reduction in both resource consumption and execution time (an example configuration
sketch follows this list).
● Utilized hands-on experience to write and optimize MapReduce jobs using Java, enabling the processing and
analysis of both structured and unstructured data stored within the Hadoop Distributed File System (HDFS).
● Led the design of fault-tolerant Hadoop cluster architectures, ensuring high availability and resilience in data
storage and processing operations.
● Collaborated closely with system administrators to monitor and maintain Hadoop clusters, addressing issues
related to HDFS storage capacity, performance, and data integrity.
● Actively contributed to capacity planning and scalability initiatives for Hadoop clusters, forecasting storage and
compute requirements to accommodate expanding data volumes and processing workloads.
● Developed and fine-tuned Scala-based MapReduce algorithms to execute intricate data transformations and
aggregations, optimizing computational efficiency and reducing processing durations.
● Leveraged AWS Lambda for executing serverless data processing and event-triggered migration tasks, enabling
seamless integration with other AWS services and reducing operational complexities during migration.
● Utilized Amazon EMR (Elastic MapReduce) for the processing and analysis of extensive datasets during
migration, leveraging its scalable and cost-efficient capabilities for data transformation needs.
● Used Amazon Athena for on-demand querying and analysis of data stored in AWS S3, allowing interactive and
economical exploration of migrated data for informed decision-making.
● Utilized AWS CloudFormation to automate the deployment and administration of AWS infrastructure resources
necessary for data migration, ensuring standardized and reproducible infrastructure setup.
● Implemented AWS Batch for the batch processing of data during migration, facilitating efficient and scalable
execution of data processing tasks within AWS environments.
● Performed performance evaluations of PySpark jobs using the Spark UI and profiling tools, pinpointing
bottlenecks and implementing enhancements to optimize job efficiency and resource utilization.
● Integrated PySpark with machine learning libraries such as scikit-learn and TensorFlow to execute sophisticated
data transformations and feature engineering tasks for predictive modeling purposes.
● Developed reusable objects such as PL/SQL program units and libraries, database procedures, functions, and
database triggers for use by the team, satisfying the business rules.
● Worked on bug tracking reports daily using Quality Center.

● Designed, developed, and tested a data mart prototype (SQL Server 2005), ETL process (SSIS), and OLAP cube (SSAS).
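
A minimal sketch of the configuration and partitioning tuning described above; the property values, paths, and partition counts are hypothetical examples, with real settings driven by profiling:

    from pyspark.sql import SparkSession

    # Example tuning knobs; real values were chosen from Spark UI profiling
    spark = (SparkSession.builder
             .appName("tuned_etl")
             .config("spark.sql.shuffle.partitions", "400")
             .config("spark.executor.memory", "8g")
             .config("spark.executor.cores", "4")
             .config("spark.sql.adaptive.enabled", "true")
             .getOrCreate())

    events = spark.read.parquet("hdfs:///data/raw/events")

    # Repartition on the aggregation key to reduce shuffle skew, then write
    # partitioned output so downstream jobs can prune by date
    (events.repartition(200, "event_date")
           .write.mode("overwrite")
           .partitionBy("event_date")
           .parquet("hdfs:///data/curated/events"))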

Environment: Hadoop, Kafka, Spark, Sqoop, Docker, Swamp, BigQuery, Spark SQL, TDD, Spark Streaming,
Hive, Scala, Pig, NoSQL, Impala, Oozie, Azure DevOps, HBase, Data Lake, ZooKeeper.

Loblaw, Canada Aug 2016 – Oct 2017


Data Engineer

Key Responsibilities:

● Constructed scalable distributed data solutions in Hadoop Cluster environments using Hortonworks distribution.

● Enhanced data processing time and network data transfer efficiency by converting raw data into efficient
serialized formats such as Avro and Parquet.
● Applied normalization and de-normalization to optimize performance in relational and dimensional database
settings.
● Developed, tested, and refined Extract Transform Load (ETL) applications handling various data sources.

● Optimized SQL queries in Hive and crafted files using Hue for improved efficiency.

● Utilized Spark to refine performance and enhance the efficiency of existing algorithms in Hadoop using
SparkContext, Spark SQL, DataFrames, and pair RDDs.
● Created custom PySpark functions to manage data cleansing activities like filling missing values, identifying
outliers, and removing duplicate data, resulting in enhanced data quality and uniformity (a minimal example
follows this list).
● Employed PySpark for data analysis, leveraging Spark libraries with Python scripting.

● Optimized data processing workflows to ensure seamless handling and processing of large volumes of
unstructured data stored in Amazon S3.
● Developed and deployed scalable PySpark ETL workflows utilizing Apache Airflow, streamlining data
pipelines, and automating processes for data ingestion, transformation, and loading.
● Incorporated data validation checks within PySpark transformations to ensure compliance with business rules
and regulatory standards, minimizing data inconsistencies and bolstering data integrity.
● Utilized AWS ecosystem, employing AWS S3 as central repository for storing and processing unstructured
data.
● Transformed HiveQL into Spark transformations using Spark RDD via Scala programming.

● Transformed unstructured data into structured formats suitable for analysis and storage in databases, employing
AWS Glue for efficient ETL processes.
● Created and utilized User Defined Functions (UDFs) and User Defined Aggregate Functions (UDAFs) in Pig and
Hive.
● Designed and implemented tailored ETL workflows in Spark/Hive to execute data cleaning and mapping tasks.
● Developed Kafka custom encoders to facilitate custom input formats for loading data into Kafka partitions.

● Aided with Kafka cluster topic management through Kafka Manager and automated resource management
using CloudFormation scripting.
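
A minimal sketch of a custom PySpark cleansing function of the kind described above, assuming hypothetical column names, fill values, and an outlier threshold:

    from pyspark.sql import SparkSession, DataFrame
    from pyspark.sql import functions as F

    spark = SparkSession.builder.appName("cleansing").getOrCreate()

    def clean_orders(df: DataFrame) -> DataFrame:
        """Fill missing values, drop duplicate orders, and flag outliers."""
        return (df
                .fillna({"quantity": 0, "store_id": "unknown"})
                .dropDuplicates(["order_id"])
                .withColumn("is_outlier", F.col("amount") > 10000))

    raw = spark.read.parquet("s3://example-bucket/raw/orders/")
    clean_orders(raw).write.mode("overwrite").parquet("s3://example-bucket/clean/orders/")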

Environment: .NET MVC, MS Excel, Data Quality, MS Access, SQL, Data Maintenance, PL/SQL,
SQL Plus, Metadata, Tableau, Data Analysis, SSIS, SSRS, SSAS.

BluJay Solutions, India May 2014 – Jul 2016


Python Developer

Key Responsibilities:

● Utilized Python libraries like Pandas, NumPy, SciPy to enhance efficiency in managing and analyzing data
tasks.
● Proficiently crafted and executed ETL (Extract, Transform, Load) pipelines in Python to automate data
workflows and uphold data integrity.
● Expertise in seamlessly integrating Python scripts with diverse databases such as PostgreSQL, MySQL, and
MongoDB, facilitating data retrieval, storage, and manipulation.
● Demonstrated a robust grasp of data warehousing principles, coupled with hands-on experience in crafting data
models to optimize the storage and retrieval of structured and unstructured data.
● Developed RESTful APIs leveraging Python frameworks like Flask and Django, enabling smooth
communication between data sources and applications.
● Utilized data visualization tools like Matplotlib, Seaborn to generate visualizations from datasets.

● Extensively worked in automating shell scripting duties using Bash scripting to streamline various tasks
including system administration, data processing, and task automation.
● Experienced in writing automated shell scripts for tasks such as file management, data parsing, and system
monitoring to improve operational efficiency.
● Used GIT as version control for development and managing code repositories for both Python and shell
scripting.
● Generated graphical reports using the Python packages NumPy and Matplotlib.

● Worked extensively in Python to manage unstructured data files such as CSV, JSON, XML, and log files,
emphasizing effective parsing, extraction, and data transformation (a brief parsing sketch follows this list).
● Employed personalized Python scripts and parsers to extract pertinent data from unstructured sources, readying
them for inclusion in databases.
● Utilized Python libraries such as Beautiful Soup and xml for the extraction and organization of structured data
from HTML pages and various web sources.
● Crafted and executed data transformation procedures in Python to convert unstructured data into a format
compatible with relational databases, ensuring seamless insertion.
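
A minimal Python sketch of the log parsing and transformation work described above, assuming a hypothetical log format and file names:

    import csv
    import json
    import re

    # Hypothetical log line format: "2016-03-01 12:00:01 ERROR payment timeout"
    LOG_PATTERN = re.compile(r"^(?P<date>\S+) (?P<time>\S+) (?P<level>\w+) (?P<message>.+)$")

    def parse_log(path):
        """Parse raw log lines into dictionaries, skipping lines that do not match."""
        records = []
        with open(path) as fh:
            for line in fh:
                match = LOG_PATTERN.match(line.strip())
                if match:
                    records.append(match.groupdict())
        return records

    def to_csv(records, path):
        """Flatten parsed records into a CSV suitable for loading into a database."""
        with open(path, "w", newline="") as fh:
            writer = csv.DictWriter(fh, fieldnames=["date", "time", "level", "message"])
            writer.writeheader()
            writer.writerows(records)

    if __name__ == "__main__":
        rows = parse_log("app.log")
        to_csv(rows, "app_log.csv")
        print(json.dumps(rows[:3], indent=2))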
Environment: Python, Oracle DB, Apache server, Pandas, Django, MySQL, Linux, JavaScript, Teradata, SQL Server.

EDUCATION:

● BE in Computer Science
