
UNDERWATER DRONE WITH IMAGE PROCESSING AND

OBJECT DETECTION USING CNN

Major Project Report


Submitted in partial fulfillment of the requirements for the award of the degree of

Bachelor of Technology
In
Electrical Engineering
Submitted by

Palak Makwana 201113001


Anshu Kumari 201113020
Sanjana Yadav 201113037
Anil Kumar 201113003
Ashmit Prasad 201113048

Under the Guidance of,

Dr. Mukesh Kumar Kirar

Department of Electrical Engineering

MAULANA AZAD NATIONAL INSTITUTE OF TECHNOLOGY


BHOPAL (M.P)

DECLARATION BY CANDIDATES
We hereby declare that the work which is being presented in this dissertation entitled
“UNDERWATER DRONE WITH IMAGE PROCESSING AND OBJECT DETECTION”,
submitted in partial fulfillment of the requirements for the award of the degree of Bachelor of
Technology in Electrical Engineering, has been carried out at Maulana Azad National Institute
of Technology, Bhopal, and is an authentic record of our own work carried out under the esteemed
guidance of Dr. Mukesh Kumar Kirar. The matter embodied in this dissertation, in part or in whole,
has not been presented or submitted by us for any purpose in any other institute or organization for
the award of any other degree.
We further declare that the facts mentioned above are true to the best of our knowledge. In the
unlikely event of any discrepancy, we take full responsibility.

S.No. Scholar No. Name of Student Signature of Student


1 201113001 Palak Makwana

2 201113003 Anil Kumar

3 201113020 Anshu Kumari

4 201113037 Sanjana Yadav

5 201113048 Ashmit Prasad


CERTIFICATE OF APPROVAL

This is to certify that the dissertation work entitled “Underwater Drone With Object Detection and
Image Processing” is a bonafide record of the work done by the following students and submitted
in partial fulfillment of the requirements for the award of the degree of Bachelor of Technology in
Electrical Engineering. To the best of my knowledge and belief, the dissertation embodies the work
of the following candidates. They have duly completed the work and fulfilled the requirements of the
ordinance relating to the Bachelor of Technology degree from the Department of Electrical
Engineering, Maulana Azad National Institute of Technology, Bhopal, India.

S.No. Scholar No. Name of Student Signature of Student


1 201113001 Palak Makwana

2 201113003 Anil Kumar

3 201113020 Anshu Kumari

4 201113037 Sanjana Yadav

5 201113048 Ashmit Prasad

GUIDE NAME: Dr. Mukesh Kumar Kirar


Date:
ACKNOWLEDGEMENT

We would like to acknowledge the support provided by Dr. Mukesh Kumar Kirar
in guiding and correcting us at all stages of the development of this project with the
utmost attention and care. We express our thanks to Dr. Sushma Gupta, Head of the
Department of Electrical Engineering, for extending her support and providing us the
necessary facilities. We would in particular give our regards to our guide, Dr.
Mukesh Kumar Kirar, for his insightful advice, invaluable guidance, help, and support
in the successful completion of this project, and for his consistent encouragement
throughout our project work. It was a great learning experience for us and we
sincerely thank him for giving us this opportunity.

Palak Makwana 201113001


Anil Kumar 201113003
Anshu Kumari 201113020
Sanjana Yadav 201113037
Ashmit Prasad 201113048
TABLE OF CONTENTS
LIST OF FIGURES
LIST OF ABBREVIATIONS
ABSTRACT
CHAPTER 1 : INTRODUCTION
1.1 : Background Study
1.2 : Objective
CHAPTER 2 : LITERATURE REVIEW
2.1 : Various Image Processing Techniques
2.2 : Object Detection Theories
2.3 : Research Gaps
2.4 : Preference of CNN over other methods
CHAPTER 3 : METHODOLOGY
3.1 : Block Diagram
3.2 : Hardware Components
3.3 : Software
3.4 : Image Processing
3.5 : Data Collection and Preprocessing
3.6 : CNN Architecture
CHAPTER 4 : RESULTS AND DISCUSSION
4.1 : Results and Testing
4.2 : Applications
CHAPTER 5 : Conclusion and Future Scope
REFERENCES
APPENDICES
ABSTRACT

The exploration of underwater environments presents unique challenges that necessitate
innovative technological solutions. This report describes the integration of an
underwater drone system equipped with advanced image processing and object detection
capabilities, using CNN techniques for image processing. The project
successfully demonstrates the integration of image processing algorithms and object
detection techniques to enhance the capabilities of underwater drones.
CHAPTER 1
INTRODUCTION

1.1 BACKGROUND STUDY


The world beneath the surface of our oceans, seas, lakes, and rivers has long remained one of the
most mysterious and least explored frontiers on Earth. Yet within these aquatic depths lies a realm of
immense scientific, economic, and environmental significance. Unlocking the secrets of underwater
ecosystems, monitoring the health of our planet's oceans, and supporting a range of industrial
activities have become increasingly vital tasks. In response to these challenges, underwater drones,
also known as remotely operated vehicles (ROVs) or autonomous underwater vehicles (AUVs), have
emerged as transformative technologies [1]. Underwater drones are a diverse family of submersible
vehicles designed to navigate and operate in aquatic environments, either remotely controlled by
humans or functioning autonomously based on programmed instructions. They have gained
significant attention for their potential in deep-sea exploration and monitoring, and are used for tasks
such as underwater archaeology, marine biology research [1], environmental monitoring, and
offshore infrastructure inspection. These drones enable scientists and researchers to explore the
depths of the ocean, conduct surveys in challenging underwater environments, study marine life,
map underwater terrain, and inspect submerged structures such as pipelines and oil rigs. They also
play a crucial role in search and rescue missions, helping locate and retrieve objects or individuals in
distress underwater.

Object detection and image processing are essential for these vehicles to make sense of the
underwater environment. By equipping underwater drones with image processing and object
detection, we can extend their functionality and make them more versatile tools for underwater
exploration and data collection. The integration of image processing and object detection using
CNNs adds a new dimension to their capabilities: image processing enables these vehicles to capture
high-quality visuals in challenging underwater conditions, while object detection allows them to
identify and classify objects or marine life with a high degree of accuracy. This advancement
empowers underwater drones to navigate autonomously, identify objects of interest, and make
real-time decisions, thereby expanding their utility in marine research, environmental monitoring,
and infrastructure inspection.
1.2 Objective

• Development of an Integrated System: To design and implement a fully integrated system
that combines underwater drones, image processing, and Convolutional Neural Networks
(CNNs) to enhance the capabilities of underwater exploration and data collection.
• Enhanced Image Capture: To improve the quality of underwater image and video capture by
developing image processing techniques that address challenges such as low-light conditions,
water turbidity, and color distortion.
• Object Detection and Classification: To develop CNN-based object detection models capable
of identifying and classifying underwater objects, including marine species, geological
formations, infrastructure, and debris, with a high degree of accuracy.
• Real-Time Decision-Making: To enable the underwater drone to make real-time decisions
based on the detected objects, which could include navigation adjustments, data collection
prioritization, or data transmission to operators or autonomous systems.
CHAPTER 2
LITERATURE REVIEW

Underwater environments often suffer from reduced visibility due to suspended particles and
turbidity. These conditions hinder traditional image processing and object detection techniques. Cui et
al. (2019) discuss the challenge of underwater image degradation and its impact on object
detection [2]. Water absorbs and scatters light differently from air, leading to a loss of color and
contrast in underwater images; compensating for these color shifts and achieving accurate color
representation is a significant challenge. Iqbal et al. (2018) review various techniques to address this
challenge [3].

Underwater scenes often contain complex backgrounds, including coral reefs, seaweed, and rocks.
Distinguishing objects of interest from the cluttered surroundings is challenging and requires
advanced object detection algorithms; this issue is highlighted in Ferrari et al.'s study (2018) on
oceanographic research [4]. Lighting conditions underwater can change rapidly with depth and time
of day. These variations affect the visibility of objects and complicate image processing. Kim et al.
(2015) discuss the importance of handling varying lighting conditions in underwater object
detection [5].

Building robust object detection models in underwater environments requires access to large and
well-annotated datasets; however, such datasets are often limited. Efforts are underway to create
benchmark datasets for underwater object detection, as seen in the work of Cui et al. (2019) [6].
Many underwater applications, such as autonomous navigation and infrastructure inspection,
require real-time object detection and decision-making; achieving low-latency processing while
maintaining accuracy is a technical challenge. Salvadeo et al. (2017) discuss this challenge in the
context of underwater search and rescue operations [7]. Underwater drones are also constrained by
limited computational resources and power, so developing efficient algorithms that can run on
resource-constrained hardware is essential. This challenge is discussed in Mennatullah et al.'s
research (2019) on underwater object tracking [8].

Addressing these challenges is essential for improving the capabilities of underwater drones in
object detection and image processing, enabling them to excel in applications ranging from scientific
research to industry and search and rescue operations.
2.1 Various Image Processing Techniques

Image processing for underwater drones is a crucial aspect of their operation, as it helps enhance the
quality of images and videos captured in challenging underwater environments. Several existing
techniques are used in image processing for underwater drones.

Histogram equalization redistributes pixel values in an image to enhance its contrast and improve
visibility [9]. Contrast stretching expands the range of pixel values to enhance the contrast of
underwater images [10]. Adaptive filters adjust filter parameters based on local image characteristics,
reducing noise and enhancing image details [11]. Gamma correction adjusts the brightness and
contrast of images, often to compensate for the nonlinear response of underwater cameras [12].
Retinex-based algorithms aim to separate the illumination and reflectance components of an image,
reducing haze and improving color accuracy [13]. The dark channel prior is used for underwater
dehazing by estimating and removing haze from images [14]; a minimal sketch of this idea is given
at the end of this subsection. Color correction algorithms, such as the Lee-Kuan filter, correct the
color distortion introduced by water absorption [15]. Wavelet-transform-based methods decompose
the image into different frequency components, allowing for localized enhancement [16]. Fusion
techniques combine images from different sensors or modalities to improve overall image
quality [17].

Forand et al. [18] designed and built the Laser Underwater Camera Image Enhancer (LUCIE)
system to improve the quality of underwater laser images; according to the authors, the range of the
LUCIE system is 3-5 times that of a traditional camera equipped with floodlights. Deep learning
models, such as convolutional neural networks (CNNs), can be trained to enhance underwater
images by learning from large datasets. Perez et al. [19] proposed a deep-learning-based underwater
image enhancement method, which built a training dataset consisting of pairs of degraded and
restored underwater images; the mapping between the degraded and recovered images was learned
from many training pairs and used to improve underwater image quality. Corrigan et al. [20]
presented a mosaicking method for underwater video that applies temporal smoothing to the motion
parameters, reducing noise in a maximum a posteriori homography estimation even for frames with
little texture detail. Gruev et al. [21] described two approaches to creating focal-plane polarization
imaging sensors: the first combines polymer polarization filters with an active CMOS pixel sensor
and computes polarization information at the focal plane, while the second describes preliminary
work on polarization filters using aluminum nanowires. Measurements from the first prototype
polarization image sensor are discussed in detail, and applications to material detection using
polarization techniques are described. The underwater polarization imaging technique is described
in detail by Li and Wang [22].
2.2 Object Detection Theories

Underwater detection relies mainly on digital cameras, and image processing is usually used to
improve quality and reduce noise; contour segmentation methods are commonly used to locate
objects. Several such methods have been proposed for object detection. For example, Chang et
al. [24] proposed a new image denoising filter based on the standard median filter, which detects
noisy pixels and replaces them with a new median value. Prabhakar et al. [25] proposed a noise
reduction method to remove additive noise from underwater images, in which homomorphic
filtering is used to correct uneven illumination and an anisotropic filter is used for smoothing. A new
approach to denoising that combines wavelet decomposition with a high-pass filter has been used to
enhance underwater images (Sun et al., 2011); both the low-frequency components of backscatter
noise and uncorrelated high-frequency noise can be effectively suppressed simultaneously, although
blurring of the processed image is significant with the wavelet-based method. Kocak et al. [26] used
an environmental filter to remove noise: image quality is improved by stretching the RGB color
planes, and the atmospheric light is obtained through the dark channel prior; this method is useful
for images with low noise. For noisy images, Zhang et al. [27] use multiresolution bilateral filtering;
the results are good, but the processing time is very long. Mäkitalo and Foi [28] give the exact
unbiased inverse of the generalized Anscombe transform; comparison shows that the method plays
an important role in ensuring an accurate denoising result.

The Laser Underwater Camera Image Enhancer system was designed and built by Forand et
al. [29] to improve underwater laser image quality, and it was demonstrated that the operating range
of the system is 3-5 times greater than that of a conventional floodlight camera. Yang and Peng [30]
proposed an underwater laser weak-target detection method based on the Gabor transform, which
converts the non-stationary signal of a complex underwater laser environment into an approximately
stationary signal; the triple correlation is then calculated from the Gabor transform coefficients,
which removes random interference and compresses the correlation of the target signal. Ouyang et
al. [31] investigated the application of light-field rendering (LFR) to images taken from a distributed
bistatic non-synchronous laser line scan imager, using both full-field and non-target line image
geometries to create a multi-perspective image of the underwater scene.

The above methods are based on wavelet decomposition, statistical methods, laser technology, or
color polarization theories. The results show that the methods are reasonable and effective, but their
common weakness is that the processing is very time-consuming, making real-time object detection
difficult to achieve at present.
2.3 Research Gaps

While the literature reflects significant progress in the integration of image processing and CNNs with
underwater drones, there are several research gaps that warrant further investigation:
• Robustness in Challenging Conditions: Underwater environments are highly variable, and
conditions can change rapidly. Existing research often focuses on controlled or relatively
stable conditions. To make this technology applicable to a wider range of scenarios, research is
needed on enhancing the robustness of image processing and CNN-based object detection in
adverse conditions, such as strong currents, low visibility, and varying water clarity.
• Real-Time Processing and Decision-Making: Real-time data analysis and decision-making
capabilities are crucial for applications like environmental monitoring and disaster response.
Researchers need to explore methods to optimize CNN models and image processing
algorithms for real-time operation on the limited computational resources of underwater
drones.
• Integration with Autonomous Navigation: The full potential of underwater drones can be
realized when they are capable of autonomous navigation. Future research should focus on
integrating image processing and CNN-based object detection with autonomous navigation
systems, enabling drones to make intelligent decisions based on detected objects.
• Multispecies Object Detection: While there has been considerable research on single-species
or single-object detection, there is limited work on the simultaneous detection and
classification of multiple underwater objects or species. A more comprehensive approach to
object detection could significantly benefit marine biologists and environmental researchers.
• User Interface and Operator Interaction: Developing user-friendly interfaces for operators
and scientists is essential to ensure that the technology is accessible and useful. Future
research should consider the design of intuitive control interfaces that allow non-experts to
operate and interpret data from these advanced systems.

2.4 Preference of CNN over other methods

CNNs (Convolutional Neural Networks) offer a distinct advantage in image classification by
obviating the necessity for manual feature extraction and filtering: the convolution operation
performs these tasks automatically, streamlining the process. As convolution deepens within the
network, it progressively generates more sophisticated semantic-level features conducive to better
classification performance.
CHAPTER 3
METHODOLOGY
3.1 Block Diagram

Fig : Block diagram of the system. A power supply feeds the laptop and the Raspberry Pi; the
Raspberry Pi camera observes the water body; the laptop runs CNN-based image processing, object
detection, and classification.

The block diagram depicts an object detection system for a water body. Here's a breakdown of its
functionality:
* Image Acquisition: A camera submerged in or near the water body captures an image.
* Image Transmission: The captured image is transmitted to a laptop through a LAN cable for
further processing.
* Image Preprocessing: Image processing algorithms enhance the image by magnifying it or
sharpening details to improve clarity.
* Object Detection: A Convolutional Neural Network (CNN) on the laptop analyzes the image to
detect objects of interest; CNNs are proficient at image recognition tasks.
* Power Supply: Both the laptop and the Raspberry Pi receive power through a USB cable.
A minimal sketch of the Pi-side capture-and-transmit loop is shown below.
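The following sketch (not taken from the report) shows one way the Raspberry Pi could JPEG-encode frames and stream them to the laptop over the LAN; the camera index 0 and the address 192.168.0.10:5000 are illustrative assumptions.

import socket
import struct

import cv2

cap = cv2.VideoCapture(0)                    # Pi camera exposed as /dev/video0 (assumed)
sock = socket.create_connection(("192.168.0.10", 5000))  # laptop address is illustrative

while True:
    ok, frame = cap.read()
    if not ok:
        break
    # JPEG-encode each frame to keep LAN bandwidth low
    _, buf = cv2.imencode(".jpg", frame, [cv2.IMWRITE_JPEG_QUALITY, 80])
    data = buf.tobytes()
    # length-prefix each payload so the receiver can re-frame the byte stream
    sock.sendall(struct.pack(">I", len(data)) + data)

On the laptop side, a matching loop would read the 4-byte length, then the payload, and decode it with cv2.imdecode before passing the frame to the CNN.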

Fig : Camera setup
3.2 Hardware Components

• Raspberry Pi 4 model B

The Raspberry Pi 4 Model B is a credit-card-sized computer that can be used for a variety of
purposes, including media streaming, gaming, and programming. It was released in 2019 and
features a quad-core ARM Cortex-A72 CPU, up to 8 GB of RAM, and a microSD slot for storage.
It has micro-HDMI, USB 3.0, and Gigabit Ethernet ports, and it runs the latest version of Raspberry
Pi OS, which is based on Debian Linux. It is also compatible with a number of expansion boards
and peripherals, making it a versatile platform.

Fig:
Specifications:
• Quad-core 64-bit SoC @ 1.8 GHz
• 8 GB LPDDR4-3200 SDRAM
• 2.4 GHz and 5.0 GHz wireless, Bluetooth 5.0
• Gigabit Ethernet
• 2 USB 3.0 ports; 2 USB 2.0 ports
• Raspberry Pi standard 40-pin GPIO header
• 2 × micro-HDMI ports (up to 4Kp60 supported)
• 2-lane MIPI DSI display port
• 2-lane MIPI CSI camera port
• 4-pole stereo audio and composite video port
• H.265 (4Kp60 decode), H.264 (1080p60 decode, 1080p30 encode)
• Micro-SD card slot for loading the operating system and data storage
• 5 V DC via USB-C connector
• 5 V DC via GPIO header
• Power over Ethernet (PoE) enabled
• Operating temperature: 0–50 °C ambient

• Raspberry pi camera module v2

The Raspberry Pi Camera Module v2 is a high-quality camera designed for the Raspberry Pi.
It features an 8 MP Sony IMX219 sensor, which captures high-resolution still images and
video at up to 1080p30. The module connects to the Raspberry Pi through the dedicated
CSI-2 camera port, making it simple to integrate into different projects. Overall, the
Raspberry Pi Camera Module v2 is a reliable and efficient tool for digital imaging and
video projects.

Fig :
• SanDisk microSD card:

A SanDisk microSD card is a type of removable flash storage card that can be used in a
variety of devices, including smartphones, cameras, and mobile drones. The 32 GB capacity
provides ample storage for photos, videos, and other files.
• HDMI Cable:
An HDMI cable is a high-definition multimedia interface cable used to connect devices
such as computers, smartphones, and televisions. It is designed to transmit high-quality audio
and video signals, including 4K Ultra HD, and can also carry control signals, such as those
used for remote controls.

Fig :
• Power supply Cable
• Jumper Wire

3.3 Software

Python
Python is a simple-to-learn, multipurpose, high-level programming language. Python's simple yet
elegant syntax makes it one of the ideal programming languages for rapid application development,
and it is widely considered a preferred language for developing machine learning and deep learning
algorithms. The libraries used include:
Tensorflow: It is an open-source end-to-end platform providing flexible libraries, tools, and
resources that help developers easily build and deploy machine learning applications.
Cython: It is a static compiler which makes writing C extensions for Python easy. It allows
developers to write Python code that can communicate with other programs written in C and
C++.
Pillow: It is a Python library which provides powerful image processing capabilities to the
Python interpreter and supports various file formats.
lxml: It is a simple Python library used to process HTML and XML in Python.
Jupyter: It is an open-source web application allowing developers and researchers to share
documents containing code, data visualizations, text, and equations. It is widely used for
data exploration, prototyping, and presenting results.
Matplotlib: It is a Python plotting library capable of generating plots, histograms, bar
charts, scatter plots, heat maps, etc.
Pandas: It is an open-source library providing easy-to-use data analysis tools for Python.
OpenCV: It is an open-source computer vision and machine learning library for Python. It
comes with many optimized algorithms which can be used to detect and identify objects,
track moving objects, and create bounding boxes; a short example of these libraries working
together follows.
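As a brief hedged illustration of how several of these libraries combine (the file name frame.jpg is a placeholder):

import cv2
from matplotlib import pyplot as plt

# read a captured frame with OpenCV (BGR order) and convert to RGB for display
img = cv2.imread("frame.jpg")
rgb = cv2.cvtColor(img, cv2.COLOR_BGR2RGB)

# simple enhancement: histogram equalization on the luminance channel only
yuv = cv2.cvtColor(img, cv2.COLOR_BGR2YUV)
yuv[:, :, 0] = cv2.equalizeHist(yuv[:, :, 0])
enhanced = cv2.cvtColor(yuv, cv2.COLOR_YUV2RGB)

# visualize the before/after pair with Matplotlib
plt.subplot(1, 2, 1); plt.imshow(rgb); plt.title("original")
plt.subplot(1, 2, 2); plt.imshow(enhanced); plt.title("equalized")
plt.show()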

Fig : Software flowchart
3.4 Image Processing
Image processing is the use of computer algorithms to analyze and manipulate digital images. It is a
form of signal processing in which the input is an image and the output is either an improved image
or useful information (characteristics or features) extracted from it. Image processing can improve
the quality of an image or extract details from it, and it serves as a preprocessing step for many
applications, including face recognition, object detection, and image compression.

Fig : Steps of image processing: image acquisition, image preprocessing, image enhancement,
image restoration, morphological processing, image segmentation, image recognition, and image
data compression.

The figure above represents the step-by-step process of image processing, explained here:
• Image Acquisition: Image acquisition is the process of capturing or obtaining images from
various sources, such as cameras, scanners, medical imaging devices, or satellites. It is the
initial step in the image processing pipeline and is crucial for subsequent analysis and
manipulation.
• Image Preprocessing: A crucial step in image processing that involves applying various
techniques to the acquired images before performing further analysis or manipulation.
Preprocessing aims to enhance the quality of images, remove noise, correct distortions, and
prepare them for subsequent tasks such as feature extraction, segmentation, or object
detection. Preprocessing includes various techniques to correct, enhance, or prepare images,
listed below; a code sketch combining several of them follows the list.

1. Image Resizing: Adjust the image dimensions to a specific size to make them more
manageable for processing and analysis; common techniques include cropping and scaling.
Pixel values are also normalized to a consistent range (e.g., 0 to 1 or -1 to 1) to ensure
uniformity in data representation.

2. Grayscale Conversion: Convert color images to grayscale if color information is not
needed, or to reduce data dimensionality.

3. Contrast Enhancement: Adjust the contrast of an image to make details more visible.
Common techniques include histogram equalization and adaptive contrast stretching.

4. Noise Reduction: Remove noise from images using techniques such as Gaussian
smoothing, median filtering, or bilateral filtering.

5. Image Sharpening: Enhance image details and edges using techniques like unsharp
masking or Laplacian sharpening.

6. Geometric Transformation: Correct geometric distortions in images, such as perspective
distortion or lens distortion, and adjust the orientation of an image by rotating or flipping it
as needed.

7. Color Correction: Correct color balance issues, such as white balance, to ensure accurate
color reproduction.

8. Image Registration: Align multiple images or frames to the same coordinate system,
which is important for tasks like object tracking or 3D reconstruction. The resolution of an
image can also be increased using techniques like bicubic interpolation or deep
learning-based approaches.

9. Data Augmentation: Generate augmented versions of images by applying random
transformations such as rotation, translation, and scaling. Data augmentation is commonly
used when training machine learning models to improve their generalization.
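A hedged sketch of several of the steps above using OpenCV (the file name and parameter values are illustrative choices, not taken from the report):

import cv2
import numpy as np

img = cv2.imread("raw_frame.jpg")                      # placeholder path

# step 1: resize to a fixed working resolution
img = cv2.resize(img, (1024, 1024))

# step 3: contrast enhancement with CLAHE on the L channel of the LAB color space
lab = cv2.cvtColor(img, cv2.COLOR_BGR2LAB)
l, a, b = cv2.split(lab)
clahe = cv2.createCLAHE(clipLimit=2.0, tileGridSize=(8, 8))
img = cv2.cvtColor(cv2.merge((clahe.apply(l), a, b)), cv2.COLOR_LAB2BGR)

# step 4: noise reduction with a small Gaussian blur
img = cv2.GaussianBlur(img, (3, 3), 0)

# step 5: sharpening via unsharp masking (boost the original, subtract a blur)
blur = cv2.GaussianBlur(img, (9, 9), 0)
img = cv2.addWeighted(img, 1.5, blur, -0.5, 0)

# step 1 (normalization): scale pixel values to [0, 1] before feeding the network
x = img.astype(np.float32) / 255.0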

• Image Enhancement: A set of techniques used to improve the visual quality of an image by
adjusting its appearance to make it more suitable for human perception or for further
processing by computer algorithms. The goal of image enhancement is to highlight certain
features, improve contrast, reduce noise, and make details more distinguishable.

• Image Restoration: A process aimed at improving the visual quality of an image by
removing or reducing various types of degradations, such as blur, noise, or artifacts. The
goal is to recover the original image as accurately as possible, considering the nature of the
degradation and the available information.

• Morphological Processing: Based on mathematical morphology, which deals with the
analysis and processing of geometric structures within images. These techniques are
primarily used for analyzing and manipulating binary or grayscale images to extract
meaningful information about the shapes, structures, and spatial relationships of objects in
the images.

• Image Segmentation: The process of partitioning an image into multiple segments or
regions based on certain criteria, such as pixel intensity, color, texture, or spatial
relationships. The goal of segmentation is to simplify the representation of an image,
making it easier to analyze and extract meaningful information from specific regions of
interest.

• Image Recognition: The process of automatically identifying and categorizing objects or
patterns within digital images. It is a fundamental task in computer vision and has numerous
applications across various domains.

• Image Data Compression: The process of reducing the size of digital images to minimize
storage space or transmission bandwidth while preserving as much visual quality as
possible. Compression techniques aim to remove redundant or irrelevant information from
the image data without significantly affecting the perceived visual quality.

3.5 Data collection and Preprocessing
Data collection in underwater drones with image processing and object detection involves
capturing images or videos underwater, then processing and analyzing the collected data to
detect and identify objects of interest. This is a common practice in various underwater
applications such as marine research, environmental monitoring, and infrastructure
inspection.
Description of the Dataset:
The dataset used for object detection is a YOLOv5-PyTorch-format underwater life dataset, a
publicly available, pre-annotated dataset obtained from [33].
Info : The dataset contains 7 classes of underwater creatures, with bounding-box locations provided
for every animal. The dataset is already split into train, validation, and test sets.
Data : It includes 638 images. Creatures are annotated in YOLO v5 PyTorch format.
Pre-Processing : The following pre-processing was applied to each image: auto-orientation of pixel
data (with EXIF-orientation stripping) and resizing to 1024x1024 (fit within).

classes = ['fish', 'jellyfish', 'penguin', 'puffin', 'shark', 'starfish', 'stingray']
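
To make the label format concrete, the hedged snippet below converts one hypothetical YOLO v5 label line into absolute pixel coordinates (the same conversion appears in the appendix training code):

# each YOLO v5 label line is "class x_center y_center width height", normalized to [0, 1]
line = "0 0.50 0.40 0.20 0.30"            # a hypothetical 'fish' annotation
img_w, img_h = 1024, 1024                 # images are resized to 1024x1024

cls, xc, yc, w, h = line.split()
x1 = (float(xc) - float(w) / 2) * img_w   # left
y1 = (float(yc) - float(h) / 2) * img_h   # top
x2 = (float(xc) + float(w) / 2) * img_w   # right
y2 = (float(yc) + float(h) / 2) * img_h   # bottom
print(int(cls), x1, y1, x2, y2)           # -> 0 409.6 256.0 614.4 563.2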


Data Sources: The dataset was obtained from a publicly available repository [33].
Data Annotation: Each object of interest is annotated with a bounding box in YOLO v5 PyTorch
format.
Data Diversity: The images vary in lighting conditions, water clarity, and object type; this
variability makes detection more challenging but helps the trained model generalize.
Data Splitting: The dataset is distributed already divided into training, validation, and test sets.


A diverse dataset of underwater images was collected from various underwater environments,
including oceans, lakes, and controlled conditions in aquariums. These images underwent
preprocessing to enhance quality, remove noise, and standardize the format. Data augmentation
techniques were applied to the dataset, making it more diverse and robust.

Data Preprocessing
Data preprocessing for underwater drones involves preparing the collected data,
including sensor readings, images, or other information, to make it suitable for
analysis, storage, and further processing.

1. Data Collection and Logging: Data from sensors, cameras, and other instruments is
collected on the underwater drone during its mission and logged with accurate timestamps,
GPS coordinates, and any relevant metadata.

2. Data Synchronization: Data from different sensors and sources are synchronized so they
correspond to the same time and location. This is crucial for multi-sensor fusion and
analysis.

3. Quality Control and Cleaning: Erroneous or noisy data points are identified and removed
or corrected. This can include outliers, missing values, or sensor calibration issues.

4. Data Compression: Data are compressed to reduce storage and transmission requirements
while preserving essential information. This is important for remote or resource-constrained
environments; a small example follows.
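As a small hedged example of step 4, OpenCV's JPEG encoder can trade quality for size (the file name and quality value are illustrative):

import cv2

img = cv2.imread("frame.jpg")        # placeholder path
# JPEG-compress at quality 80 to cut storage and transmission cost
ok, buf = cv2.imencode(".jpg", img, [cv2.IMWRITE_JPEG_QUALITY, 80])
print(f"raw: {img.nbytes} bytes, compressed: {buf.nbytes} bytes")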


3.6 CNN Architecture

A Convolutional Neural Network (CNN) is a deep learning architecture primarily designed for image
and video analysis. CNNs have revolutionized the field of computer vision and are widely used in
tasks such as image classification, object detection, facial recognition, and more. CNNs are also
applicable to various other domains, including natural language processing and audio analysis. The
CNN architecture employed for object detection was tailored to the underwater environment. The
network was designed to detect specific underwater objects such as marine life, shipwrecks, and
debris, while considering factors like low light conditions, varying water turbidity, and complex
backgrounds.
Layers in a CNN architecture:
Convolutional layers: Convolutional layers are the building blocks of a CNN. The first
convolutional layer extracts various features from the input image by sliding a filter
(kernel) over it.
Pooling layer: The primary aim of this layer is to decrease the size of the convolved feature
map and so reduce computational cost. This is performed by decreasing the connections
between layers and operating independently on each feature map. Depending on the method
used, there are several types of pooling operations, most commonly max pooling and
average pooling.
Fully-connected layer: The fully connected (FC) layer consists of weights and biases along
with the neurons, and connects the neurons between two different layers. These layers are
usually placed before the output layer and form the last few layers of a CNN architecture.
Dropout layer: Another typical component of CNNs is the dropout layer, a mask that
nullifies the contribution of some neurons towards the next layer and leaves the others
unmodified.
Activation function: An activation function decides whether a neuron should be activated
or not, i.e., whether the neuron's input to the network is important in the process of
prediction. Commonly used activation functions include ReLU, Softmax, tanh, and
Sigmoid, each with a specific usage. A minimal sketch wiring these layer types together
follows.
Hyperparameters:
Number of layers: The number of convolutional, pooling, and fully-connected layers can
significantly impact the model's complexity and performance.
Filter size: The size of the filters in convolutional layers determines the level of detail they
can capture.
Number of filters: The number of filters in each convolutional layer determines the number
of features extracted.
Pooling type: There are various pooling techniques, such as max pooling and average
pooling, each affecting how features are downsampled.
Fig : YOLOv5 Layers

Explanation of the chosen CNN architecture for object detection:
YOLOv5 stands for "You Only Look Once, version 5". It is a popular convolutional neural network
(CNN) architecture designed specifically for object detection tasks. YOLOv5 builds upon the
success of previous versions (YOLOv1, YOLOv2, YOLOv3) and introduces improvements in terms
of speed, accuracy, and efficiency.

Description of the layers and their functions:
Backbone Network (CSPDarknet53): YOLOv5 utilizes a CSPDarknet53 backbone network, a
variant of the Darknet architecture. CSPDarknet53 incorporates Cross-Stage Partial connections to
improve gradient flow and promote feature reuse, enhancing the network's representational
capacity.
Neck: YOLOv5 incorporates a series of additional layers after the backbone network to extract and
aggregate features from different scales. This typically includes the PANet (Path Aggregation
Network) module, which facilitates feature fusion across different levels of abstraction.
Detection Head: The detection head of YOLOv5 comprises multiple convolutional layers followed
by a final detection layer. This detection layer predicts bounding boxes, confidence scores, and class
probabilities for multiple objects simultaneously.
Justification for selecting the specific architecture:
YOLOv5 was chosen for its balance between speed and accuracy. Compared to previous versions,
YOLOv5 achieves better performance in terms of both accuracy and inference speed, making it
suitable for real-time applications. Additionally, YOLOv5 has a relatively simple and
straightforward architecture, which makes it easier to train and deploy compared to more complex
architectures. Furthermore, the improvements introduced in YOLOv5, such as the CSPDarknet53
backbone and PANet module, enhance its ability to capture semantic information and handle objects
of varying scales and aspect ratios effectively. Overall, YOLOv5 represents a state-of-the-art
solution for object detection tasks, offering a good balance between performance and efficiency.
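For reference (not part of the report's own pipeline), pretrained YOLOv5 models can be loaded through PyTorch Hub, the entry point documented by the ultralytics/yolov5 repository; the image path below is a placeholder, and the commented line shows how a custom-trained checkpoint such as best.pt would be loaded instead:

import torch

# load the small pretrained YOLOv5 model
model = torch.hub.load("ultralytics/yolov5", "yolov5s", pretrained=True)
# model = torch.hub.load("ultralytics/yolov5", "custom", path="best.pt")

results = model("underwater_frame.jpg")  # placeholder image path
results.print()                          # per-class counts and inference speed
boxes = results.xyxy[0]                  # tensor rows: x1, y1, x2, y2, conf, class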

CHAPTER 4
RESULTS AND DISCUSSION
CHAPTER 5
CONCLUSION AND FUTURE SCOPE

Conclusion

Developing and deploying an object detection system for underwater images or videos using a
Raspberry Pi involves a series of carefully orchestrated steps. By acquiring underwater footage with
the Raspberry Pi Camera Module v2, preprocessing the data, preparing a dataset of 638 images,
selecting and training a YOLO v5 object detection model, and optimizing it for deployment on the
Raspberry Pi, a robust system is created for real-time or offline detection of objects in underwater
environments.
Throughout this process, considerations such as image quality, computational efficiency, and model
performance are paramount. Techniques like color correction, noise reduction, and model
optimization ensure that the system can effectively detect objects despite the challenges posed by
underwater conditions and the Raspberry Pi's limited resources.
Once deployed, the object detection system can be evaluated on real underwater footage, with
metrics such as detection accuracy, speed, and resource usage providing insights into its
performance. Iterative refinement based on testing results and user feedback ensures that the system
remains effective and reliable in real-world underwater environments.
Ultimately, the development and deployment of an object detection system for underwater images or
videos using a Raspberry Pi open up numerous possibilities for applications in marine research,
underwater exploration, and environmental monitoring. With proper calibration and ongoing
maintenance, such systems can contribute significantly to our understanding of underwater
ecosystems and facilitate various scientific and practical endeavors.
Future scope
The project combining an underwater drone with image processing and object detection has a
promising future with applications in various fields. Here are some areas where this technology can
be further developed:
Enhanced autonomy: Machine learning algorithms can be refined to enable the drone to
autonomously navigate underwater environments, identify objects of interest, and make real-time
decisions.
Improved object recognition: By incorporating advanced machine learning techniques like deep
learning, the drone can be trained to recognize a wider variety of objects with higher accuracy, even
in poor lighting conditions.
3D mapping and modeling: By equipping the drone with additional sensors like LiDAR, it can
generate 3D maps and models of the underwater environment, useful for search and rescue
operations or infrastructure inspection.
Environmental monitoring: The drone can be deployed to monitor water quality, track marine life
populations, and detect pollution sources.
Underwater archaeology: The technology can aid archaeologists in exploring underwater ruins and
artifacts in a more efficient and non-invasive manner.
REFERENCES

[1] Mennatullah, Y., et al. (2019). Underwater object detection and tracking using deep
learning for autonomous underwater vehicles. IEEE Access, 7, 123872-123884.
[2] Cui, R., et al. (2019). Underwater object detection: A benchmark
dataset and evaluation. IEEE Robotics and Automation Letters, 4(4),
3753-3760.
[3] Iqbal, M. S., et al. (2018). A survey of underwater image enhancement
techniques. Journal of Sensors, 2018, 1-29.
[4] Ferrari, R., et al. (2018). Oceanography: Autonomous and Lagrangian
techniques for observing oceanic processes in coastal waters.
Oceanography, 31(1), 22-27
[5] Kim, Y., et al. (2015). Real-time object detection for underwater
robots: Comparative study. In 2015 IEEE/RSJ International
Conference on Intelligent Robots and Systems (IROS) (pp. 2090-
2095).
[6] Cui, R., et al. (2019). Underwater object detection: A benchmark
dataset and evaluation. IEEE Robotics and Automation Letters, 4(4),
3753-3760.
[7] Salvadeo, S., et al. (2017). Underwater search and rescue with AUVs.
In Proceedings of the European Conference on Mobile Robots
(ECMR) (pp. 1-6).
[8] Mennatullah, Y., et al. (2019). Underwater object detection and tracking using deep
learning for autonomous underwater vehicles. IEEE Access, 7, 123872-123884.
[9] Gonzalez, R.C., Woods, R.E., & Eddins, S.L. (2009). Digital Image Processing
Using MATLAB.
[10] Pratt, W.K. (2007). Digital Image Processing: PIKS Scientific Inside,
4th Edition.
[11] Gonzalez, R.C., & Woods, R.E. (2008). Digital Image Processing, 3rd Edition.
[12] Pizer, S.M., et al. (1987). Adaptive histogram equalization and its
variations. Computer Vision, Graphics, and ImageProcessing.
[13] Landini, G., & Randell, D.A. (2011). Digital Retinex. In Image
Processing for Computer Graphics and Vision
[14] He, K., Sun, J., & Tang, X. (2011). Single Image Haze Removal Using
Dark Channel Prior.
[15] Lee, Z., & Kuan, S. (2005). Correction of sun glint effects on marine
satellite imagery: I. theory. Journal of Geophysical Research: Oceans.
[16] Mallat, S. (1998). A Wavelet Tour of Signal Processing
[17] Hall, D.L., & Llinas, J. (2001). An Introduction to Multisensor Data
Fusion.
[18] J. L. Forand, G. R. Fournier, D. Bonnier, and P. Pace, “LUCIE: a Laser
Underwater Camera Image Enhancer,” in Proceedings of OCEANS '93, Victoria,
BC, Canada, Oct. 1993.
[19] B. Ouyang, F. Dalgleish, A. Vuorenkoski, W. Britton, B. Ramos, and B.
Metzger, “Visualization and image enhancement for multistatic
underwater laser line scan system using image-based rendering,”
IEEE Journal of Oceanic Engineering, vol. 38, no. 3, pp. 566–580,
2013.
[20] Corrigan, D., Sooknanan, K., Doyle, J., Lordan, C., & Kokaram, A. A
low-complexity mosaicing algorithm for stock assessment of seabed-burrowing
species.
[21] V. Gruev, J. V. D. Spiegel, and N. Engheta, “Advances in integrated
polarization image sensors,” in 2009 IEEE/NIH Life Science Systems
and Applications Workshop, pp. 62–65, Bethesda, MD, USA, Apr
2009.

[22] Y. Li and S. Wang, “Underwater polarization imaging technology,” in


2009 Conference on Lasers & Electro Optics & The Pacific Rim
Conference on Lasers and Electro-Optics, pp. 1-2, Shanghai, China,
Aug 2009.
Artificial Computation, pp. 183–192, Springer, Cham, 2017.
[24] C. C. Chang, J. Y. Hsiao, and C. P. Hsieh, “An Adaptive Median Filter
for Image Denoising,” in 2008 Second International Symposium on
Intelligent Information Technology Application, pp. 346–350, Shanghai,
China, Dec. 2008.
[25] C. J. Prabhakar and P. U. P. Kumar, “Underwater image denoising
using adaptive wavelet subband thresholding,” in 2010 International
Conference on Signal and Image Processing, pp. 322–327, Chennai,
India, December 2010.
[26] D. M. Kocak and F. M. Caimi, “The current art of underwater imaging –
with a glimpse of the past and vision of the future,” Marine Technology
Society Journal, vol. 39, no. 3, pp. 5–26, 2005.
[27] M. Zhang and B. K. Gunturk, “Multiresolution bilateral filtering for image
denoising,” IEEE Transactions on Image Processing A Publication of
the IEEE Signal Processing Society, vol. 17, no. 12, pp. 2324–2333,
2008.
[28] M. Mäkitalo and A. Foi, “Optimal inversion of the generalized Anscombe
transformation for Poisson-Gaussian noise,” IEEE Transactions on Image
Processing, vol. 22, no. 1, pp. 91–103, 2013.
[29] J. L. Forand, G. R. Fournier, D. Bonnier, and P. Pace, “LUCIE: a Laser
Underwater Camera Image Enhancer,” in Proceedings of OCEANS '93, Victoria,
BC, Canada, Oct. 1993.
[30] S. Yang and F. Peng, “Laser underwater target detection based on
Gabor transform,” in 2009 4th International Conference on Computer
Science & Education, pp. 95–97, Nanning, China, Jul 2009.
[31] B. Ouyang, F. Dalgleish, A. Vuorenkoski, W. Britton, B. Ramos, and B.
Metzger, “Visualization and image enhancement for multistatic
underwater laser line scan system using image-based rendering,”
IEEE Journal of Oceanic Engineering, vol. 38, no. 3, pp. 566–580,
2013.
[32] Zhou, Z., Siddiquee, M. M. R., Tajbakhsh, N., and Liang, J. 2018.
Unet++: A nested U-Net architecture for medical image segmentation. In
Deep Learning in Medical Image Analysis and Multimodal Learning for
Clinical Decision Support. 3-11. Springer, Cham. DOI=
http://dx.doi.org/10.1007/978-3-030-00889-5_1.
APPENDICES

Code:
The following is the training code only (using Detectron2).

In [ ]:
!python -m pip install 'git+https://github.com/facebookresearch/detectron2.git'
In [ ]:
# detectron2 imports
import detectron2
from detectron2.utils.logger import setup_logger
setup_logger()
from detectron2 import model_zoo
import cv2
import os
from detectron2.engine import DefaultTrainer
from detectron2.engine import DefaultPredictor
from detectron2.config import get_cfg
from detectron2.utils.visualizer import Visualizer
from detectron2.structures import BoxMode
from detectron2.data import DatasetCatalog, MetadataCatalog
from detectron2.data import build_detection_train_loader  # for the custom mapper below
from detectron2.data import detection_utils as utils      # used by custom_mapper
import detectron2.data.transforms as T                    # used by custom_mapper
from detectron2.evaluation import COCOEvaluator, inference_on_dataset, LVISEvaluator
from detectron2.data import build_detection_test_loader
from detectron2.utils.visualizer import ColorMode

# other libs (other necessary imports in Colab file to make the list shorter here)

import copy  # used by custom_mapper
import torch, torchvision

from pathlib import Path
import torchvision.transforms as transforms
In [ ]:
# coco data create and register
def create_data_pairs(input_path, detectron_img_path, detectron_annot_path, dir_type='train'):
    # use dir_type so the 'train' and 'valid' splits each load their own folders
    img_paths = Path(input_path + '/' + dir_type + '/images/').glob('*.jpg')

    pairs = []
    for img_path in img_paths:

        file_name_tmp = str(img_path).split('/')[-1].split('.')
        file_name_tmp.pop(-1)
        file_name = '.'.join((file_name_tmp))

        label_path = Path(input_path + '/' + dir_type + '/labels/' + file_name + '.txt')

        if label_path.is_file():

            line_img = detectron_img_path + '/' + dir_type + '/images/' + file_name + '.jpg'
            line_annot = detectron_annot_path + '/' + dir_type + '/labels/' + file_name + '.txt'
            pairs.append([line_img, line_annot])

    return pairs

input_path = '/kaggle/input/aquarium-data-cots/aquarium_pretrain'

detectron_img_path = '/kaggle/input/aquarium-data-cots/aquarium_pretrain'
detectron_annot_path = '/kaggle/input/aquarium-data-cots/aquarium_pretrain'

train = create_data_pairs(input_path, detectron_img_path, detectron_annot_path, 'train')

val = create_data_pairs(input_path, detectron_img_path, detectron_annot_path, 'valid')
In [ ]:
def create_coco_format(data_pairs):
    data_list = []

    for i, path in enumerate(data_pairs):

        filename = path[0]

        img_h, img_w = cv2.imread(filename).shape[:2]

        img_item = {}
        img_item['file_name'] = filename
        img_item['image_id'] = i
        img_item['height'] = img_h
        img_item['width'] = img_w

        annotations = []
        with open(path[1]) as annot_file:
            lines = annot_file.readlines()
            for line in lines:
                if line[-1] == "\n":
                    box = line[:-1].split(' ')
                else:
                    box = line.split(' ')

                # YOLO labels: class, x_center, y_center, width, height (normalized)
                class_id = box[0]
                x_c = float(box[1])
                y_c = float(box[2])
                width = float(box[3])
                height = float(box[4])

                # convert to absolute XYXY pixel coordinates for detectron2
                x1 = (x_c - (width / 2)) * img_w
                y1 = (y_c - (height / 2)) * img_h
                x2 = (x_c + (width / 2)) * img_w
                y2 = (y_c + (height / 2)) * img_h

                annotation = {
                    "bbox": list(map(float, [x1, y1, x2, y2])),
                    "bbox_mode": BoxMode.XYXY_ABS,
                    "category_id": int(class_id),
                    "iscrowd": 0
                }
                annotations.append(annotation)
        img_item["annotations"] = annotations
        data_list.append(img_item)
    return data_list

train_list = create_coco_format(train)
val_list = create_coco_format(val)
In [ ]:
for catalog_name, file_annots in [("train", train_list), ("val", val_list)]:
DatasetCatalog.register(catalog_name, lambda file_annots = file_annots: file_annots)
MetadataCatalog.get(catalog_name).set(thing_classes=['fish', 'jellyfish', 'penguin',
'puffin', 'shark', 'starfish', 'stingray'])

metadata = MetadataCatalog.get("train")
In [ ]:
# Custom augmentation function
def custom_mapper(dataset_dict):
    dataset_dict = copy.deepcopy(dataset_dict)
    image = utils.read_image(dataset_dict["file_name"], format="BGR")
    # random brightness and flips as train-time augmentation
    transform_list = [
        T.RandomBrightness(0.5, 1.2),
        T.RandomFlip(prob=0.5, horizontal=False, vertical=True),
        T.RandomFlip(prob=0.5, horizontal=True, vertical=False),
    ]
    image, transforms = T.apply_transform_gens(transform_list, image)

    dataset_dict["image"] = torch.as_tensor(
        image.transpose(2, 0, 1).astype("float32"))

    # apply the same geometric transforms to the box annotations
    annos = [
        utils.transform_instance_annotations(obj, transforms, image.shape[:2])
        for obj in dataset_dict.pop("annotations")
        if obj.get("iscrowd", 0) == 0
    ]
    instances = utils.annotations_to_instances(annos, image.shape[:2])

    dataset_dict["instances"] = utils.filter_empty_instances(instances)
    return dataset_dict
In [ ]:
# training
cfg = get_cfg()
cfg.merge_from_file(model_zoo.get_config_file("COCO-Detection/faster_rcnn_R_50_FPN_3x.yaml"))
cfg.DATASETS.TRAIN = ("train",)
cfg.DATALOADER.NUM_WORKERS = 2
cfg.MODEL.DEVICE = 'cuda'  # cuda
cfg.MODEL.WEIGHTS = "detectron2://COCO-Detection/faster_rcnn_R_50_FPN_3x/137849458/model_final_280758.pkl"
cfg.SOLVER.IMS_PER_BATCH = 2
cfg.SOLVER.CHECKPOINT_PERIOD = 750
cfg.SOLVER.WARMUP_ITERS = 500
cfg.SOLVER.GAMMA = 0.05
cfg.SOLVER.BASE_LR = 0.0005
cfg.SOLVER.MAX_ITER = 3500  # (train_size / batch_size) * 100
cfg.MODEL.ROI_HEADS.BATCH_SIZE_PER_IMAGE = 256  # 512
cfg.MODEL.ROI_HEADS.NUM_CLASSES = len(MetadataCatalog.get("train").thing_classes)
cfg.SOLVER.STEPS = (20500,)  # note: beyond MAX_ITER, so the LR decay step never fires

# detectron2 has no DATALOADER.AUGMENTATIONS config key; the documented way to
# use a custom mapper is to override build_train_loader on the trainer
class AugTrainer(DefaultTrainer):
    @classmethod
    def build_train_loader(cls, cfg):
        return build_detection_train_loader(cfg, mapper=custom_mapper)

os.makedirs(cfg.OUTPUT_DIR, exist_ok=True)
trainer = AugTrainer(cfg)
trainer.resume_or_load(resume=False)

import time as t
s1 = t.time()
try:
    trainer.train()
except Exception:
    pass  # swallow errors so the elapsed time below still prints
s2 = t.time()
print(s2 - s1)
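
A hedged sketch of an evaluation cell (not in the original notebook) that reuses the evaluation imports above to score the trained model on the "val" split; model_final.pth is Detectron2's default final checkpoint name:

In [ ]:
# evaluate the trained weights on the validation split with COCO metrics
cfg.MODEL.WEIGHTS = os.path.join(cfg.OUTPUT_DIR, "model_final.pth")
cfg.MODEL.ROI_HEADS.SCORE_THRESH_TEST = 0.5
predictor = DefaultPredictor(cfg)

evaluator = COCOEvaluator("val", output_dir=cfg.OUTPUT_DIR)
val_loader = build_detection_test_loader(cfg, "val")
print(inference_on_dataset(predictor.model, val_loader, evaluator))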
