SCHOOL OF COMPUTING SCIENCE AND ENGINEERING
GREATER NOIDA, UTTAR PRADESH
2024 – 2025
INDUSTRY INTERNSHIP
SUMMARY REPORT
AWS Data Engineering Virtual Internship Report
BACHELOR OF TECHNOLOGY
in
COMPUTER SCIENCE AND ENGINEERING
Submitted by
Shamee K Sharma (22SCSE1012596)
V Semester, III Year
CERTIFICATE
I hereby certify that the work presented in the internship project report
entitled "AWS Data Engineering Virtual Internship Report", in partial
fulfillment of the requirements for the award of the degree of Bachelor of
Technology in the School of Computing Science and Engineering of Galgotias University,
Greater Noida, is an authentic record of my own work carried out in the industry.
To the best of my knowledge, the matter embodied in the project report has not been
submitted to any other University/Institute for the award of any Degree.
Shamee K Sharma (22SCSE1012596)
This is to certify that the above statement made by the candidate is correct and
true to the best of my knowledge.
Signature of Internship Reviewer Signature of Dean (SCSE)
TABLE OF CONTENTS
CHAPTER TITLE
        Abstract
        List of Figures
        List of Abbreviations
1       Introduction
        1.1 Objective of the Internship Project
        1.2 Problem Statement and Research Objectives
        1.3 Description of Internship Domain and Organization
2       Internship Activities
        2.1 Tasks and Responsibilities
        2.2 Daily/Weekly Progress
        2.3 Skills or Tools Used
3       Learning Outcomes
        3.1 Skills Acquired
        3.2 Knowledge Gained
        3.3 Challenges Faced and How They Were Addressed
4       Project/Work Deliverables
        4.1 Details of the Main Project(s) or Tasks Completed
        4.2 Outcomes or Results of the Work Done
        4.3 Links or Attachments to Work Products
5       Conclusion
        5.1 Reflections on the Overall Internship Experience
        5.2 Internship Certificate
ABSTRACT
This report details the experiences and outcomes of a two-month virtual internship focused
on data engineering using Amazon Web Services (AWS). The internship encompassed the
design and implementation of data pipelines, data modelling, and the utilization of various
AWS services to manage and process large datasets. Key deliverables included the
development of scalable data solutions and the application of best practices in data
engineering.
The primary goal of the internship was to design, implement, and optimize data pipelines
capable of handling large and complex datasets. This included tasks such as data ingestion,
transformation, and storage, which are essential for enabling data-driven decision-making
in modern organizations. Leveraging AWS services such as S3 for storage, Redshift for data
warehousing, Glue for ETL processes, and Lambda for automation, the internship
emphasized building scalable and efficient data solutions.
A key aspect of the program was understanding and applying data modelling techniques to
ensure data integrity and efficiency. Participants were introduced to industry-standard
practices, including schema design, data partitioning, and query optimization. These
practices were implemented to address real-world challenges such as performance
bottlenecks and data security concerns.
The internship also highlighted the importance of adopting best practices in data
engineering, such as using IAM roles for secure access, employing serverless computing for
cost-effectiveness, and optimizing Spark jobs for large-scale data processing. The
deliverables included functional data pipelines and documentation that showcased a deep
understanding of the AWS ecosystem and its applications in solving business challenges.
By the end of the internship, participants had gained not only technical proficiency in AWS
tools but also valuable insights into the broader domain of data engineering. This experience
equipped them with the skills to build reliable, scalable, and efficient data systems, making
significant contributions to the field of cloud-based data management. The report
summarizes this transformative journey, emphasizing the practical applications of AWS
technologies and the critical lessons learned during the program.
LIST OF FIGURES
S. No.  Fig. No.  Title
1       1         Tools and Technologies Used
2       2         Daily/Weekly Progress Summary
3       3         Skills Acquired During the Internship
4       4         Project Deliverables Overview
LIST OF ABBREVIATIONS
Abbreviation  Definition
AWS           Amazon Web Services
EMR           Elastic MapReduce
RDS           Relational Database Service
S3            Simple Storage Service
SQL           Structured Query Language
NoSQL         Not Only SQL
ETL           Extract, Transform, Load
BI            Business Intelligence
CHAPTER 1
INTRODUCTION
1.1 Objective of the Internship Project
The primary objective of this internship was to gain practical experience in data
engineering by designing and implementing data pipelines using AWS services. This
involved understanding data warehousing concepts, data modelling, and the
deployment of scalable data solutions in a cloud environment.
1.2 Problem Statement and Research Objectives
With the increasing volume of data generated by businesses, there is a pressing need
for efficient data processing and analysis tools. The internship aimed to address this
challenge by developing data pipelines capable of handling large datasets, ensuring
data integrity, and enabling data-driven decision-making.
1.3 Description of Internship Domain and Organization
The internship was conducted under the AWS Data Engineering Virtual Internship
program, facilitated by EduSkills Foundation in collaboration with AICTE. The
program focused on cloud-based data engineering, providing exposure to AWS tools
and services essential for building data infrastructure.
CHAPTER 2
INTERNSHIP ACTIVITIES
2.1 Tasks and Responsibilities
- Designed and implemented analytical data platform solutions to facilitate data-driven decisions and insights.
- Developed data schemas and managed internal data warehouses and SQL/NoSQL database systems.
- Collaborated with cross-functional teams to extract, transform, and load data from diverse sources using AWS big data technologies.
- Engaged in data model design, architecture discussions, and optimizations to enhance data processing efficiency.
- Explored and utilized AWS services such as S3, Redshift, Lambda, and Glue to build and maintain data pipelines.
- Participated in mentoring sessions conducted by industry experts to gain insights into real-world data engineering challenges.
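The extract-transform-load pattern behind these tasks can be sketched in plain Python. The records, field names, and aggregation below are hypothetical stand-ins for the S3 sources and Glue jobs used in the actual pipelines; this is a minimal illustration of the pattern, not the internship code.

```python
import csv
import io

# Hypothetical raw sales records, standing in for data extracted from S3.
RAW_CSV = """order_id,amount,country
1001,250.00,IN
1002,99.50,US
1003,,IN
"""

def extract(raw):
    """Extract: parse raw CSV text into a list of row dicts."""
    return list(csv.DictReader(io.StringIO(raw)))

def transform(rows):
    """Transform: drop rows with missing amounts and cast types."""
    cleaned = []
    for row in rows:
        if not row["amount"]:
            continue  # skip incomplete records
        cleaned.append({"order_id": int(row["order_id"]),
                        "amount": float(row["amount"]),
                        "country": row["country"]})
    return cleaned

def load(rows):
    """Load: aggregate per country, as a warehouse table might store it."""
    totals = {}
    for row in rows:
        totals[row["country"]] = totals.get(row["country"], 0.0) + row["amount"]
    return totals

print(load(transform(extract(RAW_CSV))))  # {'IN': 250.0, 'US': 99.5}
```

In a production pipeline each stage would be a separate, independently retryable step; keeping the stages as pure functions, as here, is what makes that separation possible.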
2.2 Daily/Weekly Progress
One module was completed each week so that the expected output was delivered on
time. Weekly progress was logged and reviewed so that gaps could be corrected
before the next module began.
2.3 Skills or Tools Used
- Programming languages: Python, SQL
- AWS services: S3, Redshift, EMR, RDS, Lambda, Glue
- Data processing frameworks: Apache Spark, Hive
- Data modelling tools: ERD tools
- Version control: Git
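As a small illustration of the schema design and SQL work listed above, the following sketch uses Python's built-in sqlite3 module in place of Redshift. The orders table, its columns, and the sample rows are hypothetical.

```python
import sqlite3

# In-memory database standing in for a warehouse; the schema is illustrative.
conn = sqlite3.connect(":memory:")
conn.execute("""CREATE TABLE orders (
    order_id   INTEGER PRIMARY KEY,
    customer   TEXT NOT NULL,
    amount     REAL NOT NULL,
    order_date TEXT NOT NULL
)""")
conn.executemany(
    "INSERT INTO orders VALUES (?, ?, ?, ?)",
    [(1, "acme", 120.0, "2024-07-01"),
     (2, "acme", 80.0, "2024-07-02"),
     (3, "globex", 45.5, "2024-07-02")],
)

# A typical analytical query: total spend per customer.
rows = conn.execute(
    "SELECT customer, SUM(amount) FROM orders"
    " GROUP BY customer ORDER BY customer"
).fetchall()
print(rows)  # [('acme', 200.0), ('globex', 45.5)]
```

The same GROUP BY query would run unchanged on a Redshift cluster; only the connection and the data volume differ.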
CHAPTER 3
LEARNING OUTCOMES
3.1 Skills Acquired
- Proficiency in designing and implementing data pipelines using AWS services.
- Enhanced understanding of data warehousing concepts and data modelling techniques.
- Improved programming skills in Python and SQL for data processing tasks.
- Experience with big data technologies and frameworks such as Apache Spark and Hive.
- Development of soft skills including teamwork, communication, and problem-solving.
3.2 Knowledge Gained
- In-depth understanding of AWS cloud services and their applications in data engineering, including data warehousing and data modelling.
- Stronger working knowledge of SQL and Python for data engineering tasks.
- Insight into data lifecycle management, including ingestion, transformation, and storage.
- Practical experience in optimizing cloud-based data solutions for scalability.
CHAPTER 4
PROJECT/WORK DELIVERABLES
4.1 Details of the Main Project(s) or Tasks Completed
- Developed an API extraction system to pull data from a website at regular intervals.
- Built a robust system to authenticate, send requests, and parse the API response into structured formats (e.g., JSON, CSV).
- Automated the data extraction process and scheduled periodic API calls to update the data.
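The parsing step of such a system can be sketched with the standard library alone. The JSON payload, the "items" key, and the field names below are hypothetical; the live system would obtain the payload through an authenticated, scheduled HTTP request, which is omitted here.

```python
import csv
import io
import json

# Sample payload standing in for a live API response.
SAMPLE_RESPONSE = json.dumps({
    "items": [
        {"id": 1, "name": "alpha", "value": 10},
        {"id": 2, "name": "beta", "value": 20},
    ]
})

def response_to_csv(payload):
    """Parse a JSON API response and render its records as CSV text."""
    records = json.loads(payload)["items"]
    buf = io.StringIO()
    writer = csv.DictWriter(buf, fieldnames=["id", "name", "value"])
    writer.writeheader()
    writer.writerows(records)
    return buf.getvalue()

print(response_to_csv(SAMPLE_RESPONSE))
```

Separating fetching from parsing in this way lets the parser be tested against saved payloads without touching the network, which also makes the scheduled runs easier to monitor.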
4.2 Outcomes or Results of the Work Done
- Improved data retrieval efficiency, reducing manual effort and increasing the frequency of data updates.
- Delivered real-time insights from the extracted data to support decision-making processes.
- Scalable and reliable solution: the API extraction process was designed for scalability, ensuring that it can accommodate growth in the data volume and in the complexity of the website's API over time.
4.3 Links or Attachments to Work Products
- Documentation outlining the architecture, setup process, and data extraction methodology.
- Presentation: a concise presentation summarizing the project's objectives, implementation strategy, results, and future scalability potential, shared with stakeholders to demonstrate the value of the automated API extraction solution.
- Repository with API extraction scripts and configuration files (https://github.com/shamee12312/porject_aicte/tree/main)
CHAPTER 5
CONCLUSION
5.1 Reflections on the overall internship experience.
The AWS Data Engineering Virtual Internship provided a comprehensive
learning experience in cloud-based data engineering. It not only enhanced
technical proficiency in AWS tools but also fostered problem-solving and
analytical skills. The opportunity to work on real-world challenges has been
instrumental in preparing for a career in data engineering.
Technical Growth
The internship allowed hands-on exposure to various AWS services like S3,
Redshift, Glue, Lambda, and EMR, which are foundational for modern data
engineering workflows. The ability to work with tools like Apache Spark and
Python further enhanced my capacity to manage, process, and analyze large
datasets efficiently. Designing and optimizing ETL processes, a core part of
the program, helped me understand the intricacies of data ingestion,
transformation, and storage.
Industry Insights
Through this internship, I gained valuable insights into the data engineering
domain and the best practices followed in the industry. I learned about the
significance of data-driven decision-making and the role of robust data
pipelines in achieving business objectives. Understanding how large
organizations use cloud platforms to scale and secure their data infrastructure
was an eye-opener.
Overall Reflection
The AWS Data Engineering Virtual Internship was more than just a learning
opportunity—it was an experience that bridged the gap between academic
concepts and industry practices. By tackling real-world problems and
delivering tangible results, I have grown both professionally and personally.
This journey has solidified my interest in data engineering and affirmed my
commitment to contributing to the field.