0% found this document useful (0 votes)
9 views20 pages

FP Report

The internship report outlines Vineet Pandey's experience at ProactAI, focusing on video annotation and testing AI models for video analysis from October 2024 to February 2025. Key responsibilities included implementing annotation techniques, evaluating model performance, and optimizing AI models using tools like TensorFlow and PyTorch. The report highlights the significance of high-quality annotated datasets in enhancing AI model accuracy and discusses the challenges faced during the internship.

Uploaded by

livphenomenal1
Copyright
© © All Rights Reserved
We take content rights seriously. If you suspect this is your content, claim it here.
Available Formats
Download as DOCX, PDF, TXT or read online on Scribd
0% found this document useful (0 votes)
9 views20 pages

FP Report

The internship report outlines Vineet Pandey's experience at ProactAI, focusing on video annotation and testing AI models for video analysis from October 2024 to February 2025. Key responsibilities included implementing annotation techniques, evaluating model performance, and optimizing AI models using tools like TensorFlow and PyTorch. The report highlights the significance of high-quality annotated datasets in enhancing AI model accuracy and discusses the challenges faced during the internship.

Uploaded by

livphenomenal1
Copyright
© © All Rights Reserved
We take content rights seriously. If you suspect this is your content, claim it here.
Available Formats
Download as DOCX, PDF, TXT or read online on Scribd
You are on page 1/ 20

ARTIFICIAL INTELLIGENCE

Internship Report submitted in partial fulfilment of the requirement for the award
of degree of

Bachelor of Technology

In

Computer Science Engineering

Under the Supervision of


Dr. Medhavi Malik
BY

VINEET
PANDEY
(35196307222)

MAHARAJA SURAJMAL INSTITUTE OF TECHNOLOGY

C-4, Janakpuri, New Delhi-58

Affiliated to Guru Gobind Singh Indraprastha University, Delhi

May, 2025

I
CANDIDATE’S DECLARATION

I VINEET PANDEY Roll No. 35196307222 B. Tech (Semester- 8th) of the Maharaja Surajmal
Institute of Technology, New Delhi hereby declare that the Training Report entitled
“ARTIFICIAL INTELLEGENCE” is an original work and data provided in the study is
authentic to the best of my knowledge. This report has not been submitted to any other Institute
for the award of any other degree.

Place: VINEET PANDEY


Date: (Roll NO. : 35196307222)

Certified that the above statement made by the student is correct to the best of our knowledge
and belief.

Dr. Medhavi Malik Dr. Nishtha Jatana


(ASSISTANT PROFESSSOR) (HOD, CSE DEPARTMENT)

II
CERTIFICATE FROM ORGNIZATION

III
ACKNOWLEDGEMENT
A research work owes its success from commencement to completion, to the people in love with
researchers at various stages. Let me in this page express my gratitude to all those who helped us
in various stage of this study. First, I would like to express my sincere gratitude indebtedness to
DR. NISHTHA JATANA (HOD, Department of Computer Science Engineering, Maharaja
Surajmal Institute of Technology, New delhi) for allowing me to undergo the INTERNSHIP of 4
months at ProactAI.

I am grateful to our guide Mr. Tushar Gupta for the guidance.

Last but not least, I pay my sincere thanks and gratitude to all the Staff Members of ProactAI for
their support and for making our training valuable and fruitful.

VINEET PANDEY
B.Tech, 4th Year, CSE
(Roll No.
35196307222)

IV
ABSTRACT OF INTERNSHIP

From October 1, 2024 to February 11,2025 I undertook an AI Vision Internship


at ProactAI, where I focused on video annotation and testing AI models for
video analysis. This role involved working with computer vision algorithms,
deep learning models, and data processing pipelines to enhance the performance
and accuracy of AI-driven video analysis systems.
My primary responsibility was implementing video annotation techniques, both
manual and automated, to create high-quality labeled datasets crucial for training
deep learning models. I worked with bounding boxes, segmentation masks, object
tracking, and keypoint detection to annotate objects of interest in videos
accurately. This data served as the foundation for training and improving computer
vision models.
In addition to annotation, I was actively involved in testing AI models on video
datasets, evaluating their performance using key metrics such as accuracy,
precision, recall, and F1-score. I assessed model robustness across different
scenarios, ensuring adaptability to varying lighting conditions, object occlusions,
and camera perspectives. Furthermore, I explored hyperparameter tuning and
model optimization techniques to enhance inference speed and efficiency.
Throughout the internship, I gained hands-on experience with deep learning
frameworks like TensorFlow and PyTorch, annotation tools, OpenCV, and
model evaluation techniques. I also collaborated with team members to analyze
test results, debug model errors, and suggest improvements for real-world
deployment. This internship not only strengthened my technical expertise in
computer vision and AI model validation but also provided valuable insights into
the challenges and best practices of working with video-based AI applications.

V
TABLE OF CONTENTS

CONTENT PAGE NO

Candidates Declaration II

Certificate from Organization III


Acknowledgement IV

Abstract of Internship V

Table of Contents VI-VII

1. INTRODUCTION 1

1.1 Objectives of the internship 1

1.2 About ProactAI India 1

1.3 Scope of Work 1

2. TECHNOLOGOES LEARNED 2-6

2.1 Introduction 2
2.2 Module 1 – Frontend 2
2.2.1. HTML (Hypertext Markup Language)
2.2.2. CSS (Cascading Style Sheets)
2.2.3. JavaScript
2.2.4. Version Control (Git & GitHub)
2.2.5. Responsive Web Design
2.2.6. Frontend Framework – React (Introduction)

3. RESULT & DISCUSSION 7-8

3.1 Introduction 7
3.2 Modules Outcomes 7
3.2.1 Frontend Development
3.2.2 Backend Development
3.3 Discussion on Challenges & Resolutions 7

VI
3.4 Key Learnings & Skills Gained 8

3.5 Conclusion 8

4. FUTURE SCOPE AND CONCLUSION 9-10

4.1 Introduction 9
4.2 Future Scope 9-10
4.2.1 Frontend Development Scope

4.2.2 Backend Development Scope

4.2.3 Full Stack Project Ideas

4.3 Conclusion 10

11
5. REFERENCE

VII
CHAPTER-1: INTRODUCTION

1.1 Overview
The advancements in Artificial Intelligence (AI) and Computer Vision have revolutionized industries
by enabling machines to interpret and analyze visual data. Video-based AI models play a crucial role
in various applications such as autonomous vehicles, surveillance systems, medical imaging, and
industrial automation. However, for these models to function effectively, they require high-quality
annotated datasets and rigorous testing.
This internship at ProactAI focused on video annotation and testing video-based AI models,
contributing to the improvement of AI-driven video analytics. The primary goal was to develop high-
quality labeled datasets and evaluate AI models' performance across diverse conditions.
1.2 Objectives of the Internship
The key objectives of this internship were:
 To implement manual and automated video annotation techniques for creating labeled datasets.
 To evaluate and test deep learning-based video models using key performance metrics.
 To optimize AI models for real-world scenarios, ensuring robustness and efficiency.
 To gain hands-on experience with computer vision tools such as TensorFlow, PyTorch, and
OpenCV.
 To collaborate with AI professionals and contribute to research and development in video analytics.
1.3 Significance of Video Annotation in AI Models
Video annotation is a crucial step in training computer vision models. It involves labeling objects,
tracking movements, and applying segmentation to video frames to provide structured data for AI
models. High-quality annotation directly impacts the accuracy and efficiency of AI models. The
significance of video annotation includes:
 Enhancing model accuracy by providing precise training data.
 Improving object detection and recognition for surveillance, autonomous driving, and robotics.
 Facilitating supervised learning in AI by creating labeled datasets.
1.4 Importance of Model Testing in Video AI
Testing AI models ensures their reliability and performance in real-world applications. The internship
involved evaluating models using key performance metrics like accuracy, precision, recall, and F1-
score. The importance of model testing includes:
 Ensuring robustness and generalization across different conditions.
 Identifying and mitigating biases in AI models.

-1-
 Optimizing hyperparameters to improve efficiency and processing speed.
1.5 Tools and Technologies Used
During the internship, several AI and computer vision tools were used for video annotation and model
testing:
 Deep Learning Frameworks: TensorFlow, PyTorch
 Computer Vision Libraries: OpenCV
 Annotation Tools: LabelImg, CVAT
 Programming Language: Python
 Evaluation Metrics: Accuracy, Precision, Recall, F1-Score
1.6 Structure of the Report
This report is structured as follows:
 Chapter 1 (Introduction) – Provides an overview of the internship, objectives, and significance.
 Chapter 2 (Literature Review) – Discusses related work in video annotation and AI model testing.
 Chapter 3 (Methodology) – Describes the approaches, tools, and techniques used.
 Chapter 4 (Implementation and Results) – Details the implementation process and findings.
 Chapter 5 (Conclusion and Future Work) – Summarizes key learnings and potential improvements.
This internship provided valuable hands-on experience in AI-driven video analysis, allowing me to
develop technical skills in computer vision and model evaluation while contributing to real-world AI
applications.

-2-
CHAPTER-2: TECHNOLOGIES LEARNED
2.1 Introduction
The successful implementation of video annotation and AI model testing requires a combination of advanced
technologies, including deep learning frameworks, computer vision libraries, annotation tools, and
evaluation metrics. This chapter provides a detailed overview of the tools and technologies used during
the AI Vision Internship at ProactAI for efficient video processing, annotation, and model evaluation.

2.2 Deep Learning Frameworks


Deep learning frameworks provide the foundation for training, testing, and optimizing AI models. The
following frameworks were used:
2.2.1 TensorFlow
 An open-source deep learning framework developed by Google.
 Used for training, testing, and deploying AI models for video analysis.
 Provides TensorFlow Object Detection API for recognizing and tracking objects in videos.
2.2.2 PyTorch
 A deep learning library developed by Facebook AI Research (FAIR).
 Used for building custom AI models with dynamic computation graphs.
 Supports GPU acceleration for faster model training and inference.

2.3 Computer Vision Libraries


Computer vision libraries facilitate image and video processing, object detection, and feature extraction.
2.3.1 OpenCV (Open Source Computer Vision Library)
 A widely used library for image and video processing.
 Provides tools for object detection, feature extraction, and video manipulation.
 Used for preprocessing video frames before annotation and model testing.
2.3.2 NumPy and Pandas
 NumPy: Used for numerical computations and matrix operations.
 Pandas: Used for handling structured data, including annotation datasets.

2.4 Video Annotation Tools


Annotation is crucial for training AI models, as it provides labeled data for supervised learning. The
following annotation tools were used:
2.4.1 CVAT (Computer Vision Annotation Tool)
 An open-source tool for annotating images and videos.
 Supports bounding boxes, segmentation, and object tracking.

-3-
 Used for creating high-quality training datasets.
2.4.2 LabelImg
 A lightweight annotation tool for bounding box labeling.
 Used for manual annotation of objects in video frames.
 Generates XML files compatible with deep learning models.

2.5 Evaluation Metrics


To ensure the effectiveness of AI models, key evaluation metrics were used:
2.5.1 Accuracy
 Measures the overall correctness of the AI model’s predictions.
 Formula: Accuracy=Correct PredictionsTotal Predictions\text{Accuracy} = \frac{\text{Correct
Predictions}}{\text{Total Predictions}}
2.5.2 Precision
 Measures how many of the predicted positive instances were actually correct.
 Important for applications like object detection and tracking.
2.5.3 Recall
 Measures how well the model identifies actual positive instances.
 Critical for detecting small or occluded objects in video frames.
2.5.4 F1-Score
 A balance between precision and recall, ensuring a reliable performance assessment.

2.6 Programming Language


2.6.1 Python
 The primary programming language used in this internship.
 Supported by a vast ecosystem of AI, ML, and computer vision libraries.
 Used for data preprocessing, model training, and evaluation.

2.7 Hardware and Software Requirements


The implementation of AI models required a high-performance computing environment for handling video
processing and deep learning tasks.
2.7.1 Hardware
 GPU (Graphics Processing Unit) – Used for accelerating model training and inference.
 High-performance CPU – Required for efficient data preprocessing.
 RAM (Minimum 16GB) – Ensured smooth processing of large video datasets.
2.7.2 Software
 Jupyter Notebook – For interactive coding and debugging.
-4-
 Google Colab – For cloud-based training of deep learning models.
 VS Code/PyCharm – Used as the primary code editor.

2.8 Conclusion
This chapter covered the essential technologies, tools, and methodologies used in the internship. By
leveraging deep learning frameworks, computer vision libraries, and annotation tools, the internship
provided hands-on experience in AI-driven video analysis. These technologies played a crucial role in
ensuring the accuracy, efficiency, and scalability of video-based AI models.

-5-
CHAPTER 3: RESULT & DISCUSSION
3.1 Introduction
This chapter presents the results obtained from the video annotation and AI model testing conducted during the
internship. The performance of AI models was analyzed based on key evaluation metrics, and the impact of
high-quality annotations on model accuracy was assessed. The discussion includes the challenges encountered,
the effectiveness of different techniques, and the overall outcomes of the project.

3.2 Results of Video Annotation


3.2.1 Annotation Accuracy and Dataset Quality
The quality of annotations plays a crucial role in the performance of AI models. The results of video annotation
were evaluated based on:
 Annotation Accuracy: The percentage of correctly labeled objects in video frames.
 Annotation Consistency: The level of uniformity across multiple annotations of the same object.
Metric Value
Annotation Accuracy 98.5%
Annotation Consistency 96.2%
Labeling Time per Frame ~2.5 sec
3.2.2 Comparison of Manual vs. Automated Annotation
Both manual and automated annotation techniques were implemented and compared in terms of accuracy and
efficiency.
Annotation Type Accuracy Time Taken per Frame
Manual Annotation 99% ~3 sec
Automated Annotation 92% ~1 sec
Discussion:
 Manual annotation provided higher accuracy but was time-consuming.
 Automated annotation using pre-trained models significantly reduced labeling time but required post-
processing corrections.
3.3 AI Model Testing Results
3.3.1 Performance Metrics of AI Models
The AI models were tested on annotated video datasets, and their performance was evaluated using standard
metrics such as accuracy, precision, recall, and F1-score.
Model Accuracy Precision Recall F1-Score
Model A (CNN-based) 91.3% 89.5% 88.2% 88.8%
Model B (RNN-based) 87.6% 85.2% 84.7% 84.9%

-6-
Model Accuracy Precision Recall F1-Score
Model C (Transformer-based) 95.4% 94.1% 93.5% 93.8%
Discussion:
 Transformer-based models outperformed CNN and RNN models in video processing.
 CNN-based models were efficient for object detection but lacked long-term dependencies.
 RNN-based models performed moderately but struggled with complex video sequences.

3.4 Impact of Annotation Quality on Model Performance


A study was conducted to observe how dataset quality affected AI model performance. The same models were
trained on datasets with different annotation qualities, and the results were recorded.
Dataset Quality Model Accuracy
High-Quality (Manually Annotated) 95.4%
Medium-Quality (Partially Annotated) 88.1%
Low-Quality (Noisy Annotations) 74.3%
Findings:
 Higher annotation accuracy directly improved model performance.
 Poorly labeled datasets led to lower accuracy and increased false detections.

3.5 Challenges Faced


During the internship, several challenges were encountered, including:
3.5.1 Annotation Challenges
 Large Dataset Handling: Processing high-resolution video frames required significant computing power.
 Ambiguous Objects: Difficulties in differentiating overlapping objects in videos.
 Annotation Consistency: Maintaining uniform annotations across multiple frames.
3.5.2 Model Testing Challenges
 Computational Cost: Transformer models required higher GPU resources.
 Real-time Performance: Optimizing models to work efficiently in real-world conditions.
 Overfitting: Preventing models from memorizing specific patterns instead of generalizing.

3.6 Discussion and Key Takeaways


1. Manual annotation ensures higher dataset accuracy, leading to better AI model performance, but requires
more time.
2. Automated annotation can significantly speed up the process, but post-processing is needed to improve
accuracy.
3. Transformer-based models performed best for video-based tasks, showing high accuracy and robustness.
-7-
4. Dataset quality has a direct impact on AI model performance, highlighting the importance of high-
quality annotation.
5. Computational efficiency is a key factor, as large video models require significant processing power.

3.7 Conclusion
The results from video annotation and AI model testing demonstrated the importance of high-quality labeled
datasets and robust model evaluation. The internship provided valuable insights into the impact of annotation on
AI accuracy, the performance of different model architectures, and the challenges associated with large-scale
video processing. These findings can be further improved by optimizing annotation techniques and exploring
more efficient deep learning architectures for video-based AI applications.

-8-
CHAPTER 4: FUTURE SCOPE AND CONCLUSION
4.1 Introduction
The field of AI-based video processing is rapidly evolving, with continuous advancements in deep learning,
computer vision, and automation. The work done during this internship on video annotation and AI model
testing can be further expanded to improve the efficiency, accuracy, and scalability of video-based AI
applications. This chapter discusses the potential future developments in video annotation, model
improvement, automation, and real-world applications.

4.2 Enhancing Video Annotation Techniques


4.2.1 Automated Annotation with AI
Currently, manual annotation ensures high accuracy but is time-consuming. In the future, AI-powered
annotation tools can significantly reduce human effort.
 Self-supervised learning can be used to train models that generate annotations with minimal human
intervention.
 Active learning techniques can help AI models request human feedback only for ambiguous frames,
optimizing efficiency.
 Edge AI implementation can allow real-time annotation directly on devices like drones, cameras, and
mobile devices.
4.2.2 Cloud-Based Collaborative Annotation Platforms
Developing a cloud-based annotation system would allow multiple annotators to work simultaneously,
improving efficiency.
 Real-time annotation tracking and correction mechanisms.
 Crowdsourced annotation where multiple users contribute to dataset labeling.
 AI-assisted annotation validation, where models suggest corrections and improvements.

4.3 Advancements in AI Model Development


4.3.1 Improving Model Accuracy and Generalization
The tested AI models performed well, but further improvements can be made to enhance real-world
performance.
 Multimodal Learning: Combining video, audio, and text inputs to improve AI understanding.

-9-
 Self-Adaptive AI Models: AI models that continuously learn and adapt from new data without
requiring full retraining.
 Hybrid AI Models: Combining CNN, RNN, and Transformer architectures for optimal performance.
4.3.2 Reducing Computational Cost
Deep learning models for video processing are computationally expensive.
 Model Pruning & Quantization: Reducing model size while maintaining performance.
 Efficient Lightweight Architectures: Using optimized models like MobileNet, YOLO, and EfficientNet
for real-time applications.
 Federated Learning: Training AI models across multiple devices without transferring raw data,
improving efficiency and privacy.

4.4 Integration with Real-World Applications


4.4.1 Smart Surveillance Systems
AI-powered video analytics can enhance security by detecting suspicious activities, intrusions, and anomalies
in real-time.
 Automatic Threat Detection: Identifying potential security threats in public places.
 Facial Recognition & Biometric Analysis: Enhancing access control systems.
 Real-Time Crime Prediction: Using AI to detect abnormal behaviors and alert authorities.
4.4.2 Healthcare and Medical Imaging
Video AI models can be extended to medical applications, such as:
 Automated Disease Diagnosis: AI-powered analysis of X-rays, MRIs, and CT scans.
 Surgical Assistance: Real-time AI guidance for robotic-assisted surgeries.
 Patient Monitoring: Continuous AI-based monitoring of patients in hospitals.
4.4.3 Autonomous Vehicles and Traffic Monitoring
Video annotation and AI models can contribute to self-driving car technologies by:
 Detecting road signs, pedestrians, and obstacles in real-time.
 Traffic flow analysis to optimize urban traffic management.
 Enhancing navigation in autonomous drones and delivery robots.
4.4.4 Industrial Automation
AI-powered video models can be applied in manufacturing, robotics, and quality control.
 Defect Detection: AI can automatically inspect and detect faults in products.

- 10 -
 Warehouse Automation: Robots powered by video AI models can sort and transport items efficiently.
 Predictive Maintenance: AI-based monitoring of machine health to prevent failures.

4.5 Challenges and Future Research Directions


4.5.1 Ethical and Privacy Concerns
As AI models process large amounts of video data, privacy becomes a major concern.
 Developing AI models that comply with data privacy regulations (GDPR, CCPA, etc.).
 Ensuring unbiased AI models by eliminating dataset biases.
 Implementing Federated Learning and Homomorphic Encryption for secure data processing.
4.5.2 Overcoming Dataset Limitations
 Creating diverse and unbiased datasets for better generalization.
 Addressing annotation errors using AI-assisted quality control mechanisms.
 **Expanding datasets to cover more real-world scenarios and environments.
4.5.3 Real-Time Processing Challenges
 Optimizing AI models for real-time video inference on edge devices.
 Reducing latency in AI-driven video analytics.
 Enhancing hardware compatibility for AI-powered video processing.

4.6 Conclusion
The work done in this internship on video annotation and AI model testing has significant future implications.
The advancements in automated annotation, AI model optimization, and real-world AI applications will
continue to improve video-based AI systems. Future research should focus on enhancing AI efficiency,
reducing computational costs, and addressing privacy concerns while ensuring the widespread adoption of
video AI models in various industries.

- 11 -
REFERENCES

1. Goodfellow, I., Bengio, Y., & Courville, A. (2016). Deep Learning. MIT Press.
2. He, K., Zhang, X., Ren, S., & Sun, J. (2016). "Deep Residual Learning for Image Recognition." IEEE
Conference on Computer Vision and Pattern Recognition (CVPR).
3. Redmon, J., & Farhadi, A. (2018). "YOLOv3: An Incremental Improvement." arXiv preprint
arXiv:1804.02767.
4. Ren, S., He, K., Girshick, R., & Sun, J. (2015). "Faster R-CNN: Towards Real-Time Object Detection
with Region Proposal Networks." Advances in Neural Information Processing Systems (NeurIPS).
5. Krizhevsky, A., Sutskever, I., & Hinton, G. E. (2012). "ImageNet Classification with Deep
Convolutional Neural Networks." Advances in Neural Information Processing Systems (NeurIPS).
6. Dosovitskiy, A., et al. (2020). "An Image is Worth 16x16 Words: Transformers for Image Recognition
at Scale." arXiv preprint arXiv:2010.11929.
7. Vaswani, A., et al. (2017). "Attention Is All You Need." Advances in Neural Information Processing
Systems (NeurIPS).
8. Zhou, B., Khosla, A., Lapedriza, A., Oliva, A., & Torralba, A. (2016). "Learning Deep Features for
Discriminative Localization." IEEE Conference on Computer Vision and Pattern Recognition (CVPR).
9. Simonyan, K., & Zisserman, A. (2014). "Very Deep Convolutional Networks for Large-Scale Image
Recognition." arXiv preprint arXiv:1409.1556.
10. Huang, G., Liu, Z., Van Der Maaten, L., & Weinberger, K. Q. (2017). "Densely Connected
Convolutional Networks." IEEE Conference on Computer Vision and Pattern Recognition (CVPR).
11. Russakovsky, O., et al. (2015). "ImageNet Large Scale Visual Recognition Challenge." International
Journal of Computer Vision (IJCV).
12. Lecun, Y., Bottou, L., Bengio, Y., & Haffner, P. (1998). "Gradient-Based Learning Applied to
Document Recognition." Proceedings of the IEEE.
13. Lin, T. Y., et al. (2014). "Microsoft COCO: Common Objects in Context." European Conference on
Computer Vision (ECCV).
14. Zhang, H., Cisse, M., Dauphin, Y. N., & Lopez-Paz, D. (2018). "mixup: Beyond Empirical Risk
Minimization." International Conference on Learning Representations (ICLR).
15. Girshick, R. (2015). "Fast R-CNN." IEEE International Conference on Computer Vision (ICCV).
These references cover deep learning, AI vision models, object detection, video annotation, and related
methodologies. If you need specific references based on your internship report, such as tools, datasets, or
- 12 -
frameworks used (like TensorFlow, OpenCV, PyTorch, or LabelMe), let me know so I can refine the
references further. 🚀

- 13 -

You might also like