Major Report 2
Bachelor of Technology
in
Electrical Engineering
Submitted by
DECLARATION BY CANDIDATES
We hereby declare that the work which is being presented in this dissertation entitled
“UNDERWATER DRONE WITH IMAGE PROCESSING AND OBJECT DETECTION”,
submitted in partial fulfillment of the requirement for the award of the degree of Bachelor of
Technology in Electrical Engineering, has been carried out at Maulana Azad National Institute
of Technology, Bhopal and is an authentic record of our own work carried out under the esteemed
guidance of Dr. Mukesh Kumar Kirar. The matter embodied in this dissertation, in part or whole
has not been presented or submitted by us for any purpose in any other institute or organization for
the award of any other degree.
We further declare that the facts mentioned above are true to the best of our knowledge. In case of
any unlikely discrepancy that may occur, we will take responsibility for it.
CERTIFICATE
This is to certify that the dissertation work entitled “Underwater Drone with Object Detection and
Image Processing” is a bonafide record of the work done by the following students and submitted
in partial fulfillment of the requirements for the award of the degree of Bachelor of Technology in
Electrical Engineering. To the best of my knowledge and belief, the dissertation embodies the work
of the following candidates. They have duly completed the work and fulfilled the requirements of the
ordinance relating to the Bachelor of Technology degree of the Department of Electrical
Engineering, Maulana Azad National Institute of Technology, Bhopal, India.
ACKNOWLEDGEMENT
We would like to acknowledge the support provided by Dr. Mukesh Kumar Kirar
in guiding and correcting us at all stages of development of this project with the utmost
attention and care. We express our thanks to Dr. Sushma Gupta, Head of the
Department of Electrical Engineering, for extending her support and providing us the
necessary facilities. We are particularly grateful to our guide, Dr. Mukesh Kumar Kirar,
for his insightful advice, invaluable guidance, help, and support in the successful
completion of this project, and for his consistent encouragement throughout our project
work. It was a great learning experience for us and we sincerely thank him for giving us
this opportunity.
CHAPTER 2
LITERATURE REVIEW
Underwater environments often suffer from reduced visibility due to suspended particles and
turbidity. These conditions hinder traditional image processing and object detection techniques. Cui et
al. (2019) discuss the challenge of underwater image degradation and its impact on object
detection [2]. Water absorbs and scatters light differently from air, leading to a loss of color and
contrast in underwater images. Compensating for these color shifts and achieving accurate color
representation is a significant challenge. Iqbal et al. (2018) review various techniques to address this
challenge [3].
Underwater scenes often contain complex backgrounds, including coral reefs, seaweed, and rocks.
Distinguishing objects of interest from the cluttered surroundings is challenging and requires
advanced object detection algorithms. This issue is highlighted in Ferrari et al.'s study (2018) on
oceanographic research [4]. Lighting conditions underwater can change rapidly with depth and time of
day. These variations affect the visibility of objects and complicate image processing. Kim et al.
(2015) discuss the importance of handling varying lighting conditions in underwater object
detection [5].
Building robust object detection models in underwater environments requires access to large and
well-annotated datasets. However, such datasets are often limited. Efforts are underway to create
benchmark datasets for underwater object detection, as seen in the work of Cui et al. (2019) [6].
Many underwater applications, such as autonomous navigation and infrastructure inspection, require
real-time object detection and decision-making. Achieving low-latency processing while maintaining
accuracy is a technical challenge. Salvadeo et al. (2017) discuss this challenge in the context of
underwater search and rescue operations [7]. Underwater drones are also constrained by limited
computational resources and power, so developing efficient algorithms that can run on
resource-constrained hardware is essential. This challenge is discussed in Mennatullah et al.'s
research (2019) on underwater object tracking [8].
Addressing these challenges is essential for improving the capabilities of underwater drones in object
detection and image processing, enabling them to excel in applications ranging from scientific
research to industry and search and rescue operations.
2.1 Various Image Processing Techniques
Image processing for underwater drones is a crucial aspect of their operation, as it helps enhance the
quality of images and videos captured in challenging underwater environments. Some existing
techniques used in image processing for underwater drones are described below.
Histogram equalization redistributes pixel values in an image to enhance its contrast and improve
visibility [9]. Contrast stretching expands the range of pixel values to enhance the contrast of
underwater images [10]. Adaptive filters adjust filter parameters based on local image characteristics,
reducing noise and enhancing image details [11]. Gamma correction adjusts the brightness and
contrast of images, often used to compensate for the nonlinear response of underwater cameras [12].
Retinex-based algorithms aim to separate illumination and reflectance components in an image,
reducing haze and improving color accuracy [13]. The Dark Channel Prior is used for underwater
dehazing by estimating and removing haze from the images [14]. Color correction algorithms, such as
the Lee-Kuan filter, correct the color distortion introduced by water absorption [15]. Wavelet
transform-based methods decompose the image into different frequency components, allowing for
localized enhancement [16]. Fusion techniques combine images from different sensors or modalities
to improve the overall image quality [17].
Forand et al. [17] designed and built a Laser Underwater Camera Image Enhancer (LUE) system to
improve the quality of laser underwater images. According to the authors, the range of the LUE
system is 3-5 times that of a traditional camera equipped with floodlights [18]. Deep learning models,
such as convolutional neural networks (CNNs), can be trained to enhance underwater images by
learning from large datasets. Perez et al. [19] proposed a deep learning-based underwater image
enhancement method, which produced a training dataset consisting of groups of degraded underwater
images and groups of restored underwater images. The mapping between the degraded and the
restored underwater images was then learned from many training pairs and used to improve the
quality of underwater images. Corrigan et al. [20] presented a mosaicking method for underwater
video that applies temporal smoothing to the motion parameters as a prior in a maximum a posteriori
homography estimation, reducing noise in frames containing little texture detail. Gruev et al. [21]
described two approaches to create focal-plane polarization imaging sensors. The first approach
combines polymer polarization filters with an active CMOS pixel sensor and computes polarization
information at the focal plane. The second approach describes preliminary work on polarization
filters using aluminum nanowires. Measurements from the first prototype polarization image sensor
are discussed in detail, and applications to material detection using polarization techniques are
described. The underwater polarization imaging technique is described in detail by Li et al. [22].
2.2 Object Detection Theories
Underwater detection relies mainly on digital cameras, and image processing is usually used to
improve quality and reduce noise; contour segmentation methods are commonly used to locate
objects. Several such methods have been proposed to implement object detection. For example, Chen
Chang et al. [23] proposed a new image denoising filter based on the standard median filter, which
detects noisy pixels and replaces the original pixel value with a new median value. Prabhakar et al.
[24] proposed a new noise reduction method to remove additive noise from underwater images, in
which homomorphic filtering is used to correct uneven illumination and an anisotropic filter is used
for smoothing. A new approach to denoising enhances underwater images by combining wavelet
decomposition with a high-pass filter (Sun et al., 2011); both the low-frequency components of
backscatter noise and the uncorrelated high-frequency noise can be effectively suppressed
simultaneously. However, the blurring of the processed image is significant with the wavelet-based
method. Kocak et al. [25] used an environmental filter to remove noise; the image quality is improved
by stretching the RGB color planes, the atmospheric light is obtained through the dark channel prior,
and this method is useful for images with low noise. For noisy images, Zhang et al. [26] use the
bilateral (two-way) filtering method; the results are good, but the processing time is very long.
Markku et al. [27] give the exact unbiased inverse of the generalized Anscombe transform; the
comparison shows that this method plays an important role in ensuring an accurate denoising result.
The Laser Underwater Camera Image Enhancer system was designed and built by Forand et al. [28]
to improve underwater laser image quality, and it was demonstrated that the operating range of the
system is 3-5 times greater than that of a conventional floodlight camera. Yang et al. [29] proposed an
underwater laser weak-target detection method based on the Gabor transform, in which the
non-stationary signal of the underwater laser return is processed into an approximately stationary
signal, and the triple correlation is then calculated from the Gabor transform coefficients; this removes
random interference and compresses the correlation of the target signal. Ouyang et al. [30]
investigated the application of light field rendering (LFR) to images taken from a distributed bistatic
non-synchronous laser line scan imager, using both field-overview and non-target line image
geometries to create a multi-perspective image of the underwater scene. The above methods are based
on wavelet decomposition, statistical methods, laser technology, or color polarization theories; the
results show that the methods are reasonable and effective, but their common weakness is that the
processing is very time-consuming, and it is difficult to achieve real-time object detection at present.
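The median and bilateral filtering approaches mentioned above are available as standard OpenCV operations; the sketch below is a generic illustration of both, not a reproduction of the cited authors' implementations, and the file name and filter parameters are placeholders.

import cv2

img = cv2.imread("frame.jpg")   # placeholder underwater frame

# Median filtering: each pixel is replaced by the median of its 5x5 neighbourhood,
# which suppresses impulse (salt-and-pepper) noise while keeping edges
median = cv2.medianBlur(img, 5)

# Bilateral filtering: smooths noise while preserving edges, at a higher
# computational cost (the long processing time noted above)
bilateral = cv2.bilateralFilter(img, d=9, sigmaColor=75, sigmaSpace=75)

cv2.imwrite("frame_median.jpg", median)
cv2.imwrite("frame_bilateral.jpg", bilateral)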
2.3 Research Gaps
While the literature reflects significant progress in the integration of image processing and CNNs with
underwater drones, there are several research gaps that warrant further investigation:
• Robustness in Challenging Conditions: Underwater environments are highly variable, and
conditions can change rapidly. Existing research often focuses on controlled or relatively
stable conditions. To make this technology applicable to a wider range of scenarios, there is a
need for research on enhancing the robustness of image processing and CNN-based object
detection in adverse conditions, such as strong currents, low visibility, and varying water
clarity.
• Real-Time Processing and Decision-Making: Real-time data analysis and decision-making
capabilities are crucial for applications like environmental monitoring and disaster response.
Researchers need to explore methods to optimize CNN models and image processing
algorithms for real-time operation on the limited computational resources of underwater
drones.
• Integration with Autonomous Navigation: The full potential of underwater drones can be
realized when they are capable of autonomous navigation. Future research should focus on
integrating image processing and CNN-based object detection with autonomous navigation
systems, enabling drones to make intelligent decisions based on detected objects.
• Multispecies Object Detection: While there has been considerable research on single-species
or single-object detection, there is limited work on the simultaneous detection and
classification of multiple underwater objects or species. A more comprehensive approach to
object detection could significantly benefit marine biologists and environmental researchers.
• User Interface and Operator Interaction: Developing user-friendly interfaces for operators
and scientists is essential to ensure that the technology is accessible and useful. Future
research should consider the design of intuitive control interfaces that allow non-experts to
operate and interpret data from these advanced systems.
CNNs (Convolutional Neural Networks) offer a distinct advantage in image classification by obviating
the necessity for manual feature extraction and filtering. The convolution operation performs these
tasks automatically, streamlining the process. As convolution deepens within the network, it
progressively generates more sophisticated semantic-level features conducive to enhanced
classification performance.
CHAPTER 3
METHODOLOGY
3.1 Block Diagram
Fig: Block diagram of the proposed system (power supply, image processing, and CNN-based object detection and classification)
The block diagram depicts an object detection system for a water body. Here's a breakdown of its
functionality:
* Image Acquisition: A camera submerged in or near the water body captures an image.
* Image Transmission: The captured image is transmitted to a laptop for further processing through
a LAN cable.
* Image Preprocessing: Image processing algorithms enhance the image by magnifying it or
sharpening details to improve clarity.
* Object Detection: A Convolutional Neural Network (CNN) on the laptop analyzes the image to
detect objects of interest. CNNs are proficient at image recognition tasks.
* Power Supply: Both the laptop and the Raspberry Pi receive power through a USB cable.
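The report does not specify the exact streaming mechanism between the camera and the laptop; a common arrangement, sketched below under that assumption, is to expose the camera as a network video stream over the LAN and read it on the laptop with OpenCV. The stream URL is a placeholder.

import cv2

# Placeholder URL: the Raspberry Pi would serve frames at this address over the LAN
STREAM_URL = "http://192.168.1.10:8000/stream.mjpg"

cap = cv2.VideoCapture(STREAM_URL)
while True:
    ok, frame = cap.read()           # grab the next frame sent by the Pi
    if not ok:
        break                        # stream ended or connection lost
    # frame is now a BGR image ready for preprocessing and CNN-based detection
    cv2.imshow("underwater feed", frame)
    if cv2.waitKey(1) & 0xFF == ord('q'):
        break
cap.release()
cv2.destroyAllWindows()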
Camera Setup
3.2 Hardware Components
• Raspberry Pi 4 Model B
The Raspberry Pi 4 Model B is a credit-card sized computer that can be used for a variety of
purposes, including media streaming, gaming, and programming. It was released in 2019 and
features a quad-core ARM Cortex-A72 CPU (1.5 GHz at launch, 1.8 GHz on later revisions), up to
8 GB of RAM, and a microSD slot for storage. It has micro-HDMI, USB 3.0, and Ethernet ports, and
it can run the latest version of Raspberry Pi OS, which is based on Debian Linux. It is also compatible
with a number of expansion boards and peripherals, making it a versatile platform.
Fig: Raspberry Pi 4 Model B
Specification:
Quad-core Cortex-A72 64-bit SoC @ 1.8 GHz
8GB LPDDR4-3200 SDRAM
2.4 GHz and 5.0 GHz wireless, Bluetooth 5.0
Gigabit Ethernet
2 USB 3.0 ports; 2 USB 2.0 ports
Raspberry Pi standard 40-pin GPIO header
2 × micro-HDMI ports (up to 4Kp60 supported)
2-lane MIPI DSI display port
2-lane MIPI CSI camera port
4-pole stereo audio and composite video port
H.265 (4Kp60 decode); H.264 (1080p60 decode, 1080p30 encode)
Micro-SD card slot for loading the operating system and for data storage
5V DC via USB-C connector
5V DC via GPIO header
Power over Ethernet (PoE) enabled (requires separate PoE HAT)
Operating temperature: 0–50 °C ambient
• Raspberry Pi Camera Module v2
The Raspberry Pi Camera Module v2 is a high-quality camera that can be used with the
Raspberry Pi computer. It features an 8 MP Sony IMX219 sensor, which captures high-resolution
still images and video (up to 1080p at 30 frames per second). The module connects to the
Raspberry Pi through the 2-lane MIPI CSI-2 camera port. Overall, the Raspberry Pi Camera
Module v2 is a reliable and efficient tool for digital imaging and video projects.
Fig: Raspberry Pi Camera Module v2
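As an illustration of capturing a still image from the camera module on the Raspberry Pi, the following is a minimal sketch using the Picamera2 library; the output file name is a placeholder, and the capture settings actually used in the project are not specified here.

from picamera2 import Picamera2

picam2 = Picamera2()
# Configure the camera for still capture at its default resolution
config = picam2.create_still_configuration()
picam2.configure(config)
picam2.start()
# Save one frame to disk; "capture.jpg" is a placeholder file name
picam2.capture_file("capture.jpg")
picam2.stop()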
• SanDisk micro SD card:
A SanDisk microSD card is a type of removable flash storage card that can be used in a
variety of devices, including smartphones, cameras, and mobile drones. The 32GB capacity
provides ample storage for photos, videos, and other files.
• HDMI Cable:
An HDMI (High-Definition Multimedia Interface) cable is used to connect devices such as
computers, smartphones, and televisions. It is designed to transmit high-quality audio and
video signals, including 4K Ultra HD, and can also carry control signals, such as those used
for remote controls.
Fig: HDMI cable
• Power supply Cable
• Jumper Wire
3.3 Software Applications
Python
Python is a simple-to-learn, multipurpose, high-level programming language. Python's simple yet
elegant syntax makes it one of the ideal programming languages for rapid application development.
Python is widely considered a preferred language for developing machine learning and deep
learning algorithms. It has a rich ecosystem of libraries:
Tensorflow: It is an open source end-to-end platform providing flexible libraries, tools, and
resources that help developers to easily build and deploy machine learning applications.
Cython: It is a static compiler which makes writing C extensions for Python easy. It allows
developers to write Python code that can communicate with other programs written in C and
C++.
Pillow: It is a Python library which provides powerful image processing capabilities to the
Python interpreter and also supports various file formats.
lxml: It is a simple Python library used to process HTML and XML in Python.
Jupyter: It is an open-source web application allowing developers and researchers to share
documents containing code, data visualizations, text, and equations. It is generally used for
data exploration, data visualization, data cleaning, data transformation, machine learning,
etc.
Matplotlib: It is a Python plotting library that is capable of generating plots, histograms, bar
charts, scatter plots, heat maps, etc.
Pandas: It is an open source library for Python providing easy-to-use data analysis tools for
Python.
OpenCV: It is an open-source computer vision and machine learning library for Python. It
comes with many optimized algorithms that can be used to detect and identify objects, track
moving objects, and draw bounding boxes, as illustrated in the sketch below.
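As a small example of the OpenCV usage described above, the sketch below draws a labelled bounding box on an image; the coordinates, class label, and file names are placeholder values rather than outputs of the project's detector.

import cv2

img = cv2.imread("frame.jpg")            # placeholder input image
x1, y1, x2, y2 = 120, 80, 260, 200       # placeholder box corners from a detector
label = "fish"                           # placeholder class name

# Draw the rectangle and put the class label just above it
cv2.rectangle(img, (x1, y1), (x2, y2), (0, 255, 0), 2)
cv2.putText(img, label, (x1, y1 - 5), cv2.FONT_HERSHEY_SIMPLEX, 0.6, (0, 255, 0), 2)
cv2.imwrite("frame_annotated.jpg", img)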
Flowchart
3.4 Image Processing
Image processing is the use of computer algorithms to analyze and manipulate digital images. It is a
type of signal processing that can improve the quality of an image or extract useful information from
it. The output of image processing is either an image or a set of its characteristics or features. Image
processing is a preprocessing step for many applications, including face recognition, object detection,
and image compression. The input is an image, and the output can be a better image or some
important details extracted from the image.
Fig: Steps in digital image processing
The above figure represents the step-by-step process of image processing, which is explained here:
• Image Acquisition: Image acquisition is the process of capturing or obtaining images from
various sources, such as cameras, scanners, medical imaging devices, or satellites. It is the
initial step in the image processing pipeline and is crucial for subsequent analysis and
manipulation.
• Image Preprocessing: A crucial step in image processing that involves applying various
techniques to the acquired images before performing further analysis or manipulation.
Preprocessing aims to enhance the quality of images, remove noise, correct distortions, and
prepare them for subsequent tasks such as feature extraction, segmentation, or object
detection. Preprocessing includes various techniques to correct, enhance, or prepare images.
• Image Data Compression: The process of reducing the size of digital images to minimize
storage space or transmission bandwidth while preserving as much visual quality as possible.
Compression techniques aim to remove redundant or irrelevant information from the image
data without significantly affecting the perceived visual quality (a minimal compression
sketch is given after the figure below).
Fig :
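To make the compression step concrete, the following is a minimal sketch of lossy JPEG re-encoding with OpenCV; the quality value and file names are placeholders, and the compression settings actually used in the project are not stated in this report.

import cv2
import os

img = cv2.imread("frame.jpg")   # placeholder input frame

# Re-encode as JPEG at a reduced quality (0-100); lower values shrink the file
# further at the cost of visible compression artifacts
cv2.imwrite("frame_compressed.jpg", img, [cv2.IMWRITE_JPEG_QUALITY, 60])

print(os.path.getsize("frame.jpg"), os.path.getsize("frame_compressed.jpg"))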
3.5 Data Collection and Preprocessing
Data collection in underwater drones with image processing and object detection involves
capturing images or videos underwater, then processing and analyzing the collected data to
detect and identify objects of interest. This is a common practice in various underwater
applications such as marine research, environmental monitoring, and infrastructure
inspection.
Description of the Dataset:
The dataset used for object detection is a YOLOv5 PyTorch-format underwater life dataset, obtained
from [33].
Info: The dataset contains 7 classes of underwater creatures, with bounding-box locations provided for
every animal. The dataset is already split into train, validation, and test sets.
Data: It includes 638 images. Creatures are annotated in YOLOv5 PyTorch format.
Pre-processing: The following pre-processing was applied to each image: auto-orientation of pixel
data (with EXIF-orientation stripping) and resizing to 1024x1024 (fit within).
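For reference, each label file in the YOLOv5 PyTorch format holds one line per object with a class index followed by the normalized box centre and size; the sketch below parses such a file and converts the boxes to pixel coordinates. The file path is a placeholder.

# Each YOLO-format label file holds one line per object:
#   <class_id> <x_center> <y_center> <width> <height>   (all but class_id normalized to [0, 1])
def read_yolo_labels(label_path, img_w, img_h):
    boxes = []
    with open(label_path) as f:
        for line in f:
            class_id, x_c, y_c, w, h = line.split()
            x_c, y_c, w, h = map(float, (x_c, y_c, w, h))
            # convert normalized centre/size to absolute pixel corner coordinates
            x1 = (x_c - w / 2) * img_w
            y1 = (y_c - h / 2) * img_h
            x2 = (x_c + w / 2) * img_w
            y2 = (y_c + h / 2) * img_h
            boxes.append((int(class_id), x1, y1, x2, y2))
    return boxes

print(read_yolo_labels("labels/example.txt", 1024, 1024))   # placeholder path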
Fig :
Data Preprocessing
Data preprocessing for underwater drones involves preparing the collected data, including sensor
readings, images, or other information, to make it suitable for analysis, storage, and further
processing.
• Data Synchronization: Data from different sensors and sources are synchronized, so they
correspond to the same time and location. This is crucial for multi-sensor fusion and analysis
(a timestamp-alignment sketch is given after the figure below).
• Data Compression: Compress data to reduce storage and transmission requirements while
preserving essential information. This is important for remote or resource-constrained
environments.
Fig:
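As a generic illustration of the synchronization step, the sketch below aligns two sensor streams by timestamp with pandas; the timestamps, column names, and values are placeholder data, not project measurements.

import pandas as pd

# Placeholder streams: camera frame timestamps and depth-sensor readings
frames = pd.DataFrame({"t": pd.to_datetime(["2024-01-01 10:00:00.0",
                                            "2024-01-01 10:00:00.5",
                                            "2024-01-01 10:00:01.0"]),
                       "frame_id": [0, 1, 2]})
depth = pd.DataFrame({"t": pd.to_datetime(["2024-01-01 10:00:00.1",
                                           "2024-01-01 10:00:00.9"]),
                      "depth_m": [4.8, 5.1]})

# Attach to each frame the most recent depth reading at or before its timestamp
synced = pd.merge_asof(frames.sort_values("t"), depth.sort_values("t"), on="t")
print(synced)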
A Convolutional Neural Network (CNN) is a deep learning architecture primarily designed for image
and video analysis. CNNs have revolutionized the field of computer vision and are widely used in
tasks such as image classification, object detection, facial recognition, and more. CNNs are also
applicable to various other domains, including natural language processing and audio analysis. The
CNN architecture employed for object detection was tailored to the underwater environment. The
network was designed to detect specific underwater objects such as marine life, shipwrecks, and
debris, while considering factors like low light conditions, varying water turbidity, and complex
backgrounds.
Layers in a CNN architecture (a minimal code sketch follows this list):
Convolutional layers: Convolutional layers are the building blocks of a CNN. This is the first
layer used to extract the various features from the input images; in this layer, a filter (kernel)
is convolved with the input image to extract features.
Pooling layer: The primary aim of this layer is to decrease the size of the convolved feature
map to reduce computational costs. This is performed by decreasing the connections between
layers and independently operating on each feature map. Depending upon the method used,
there are several types of pooling operations, such as max pooling and average pooling.
Fully-connected layer: The Fully Connected (FC) layer consists of the weights and biases
along with the neurons and is used to connect the neurons between two different layers. These
layers are usually placed before the output layer and form the last few layers of a CNN
architecture.
Dropout layer: Another typical characteristic of CNNs is a dropout layer. The dropout layer
is a mask that nullifies the contribution of some neurons towards the next layer and leaves
the others unmodified.
Activation function: An activation function decides whether a neuron should be activated or
not, i.e., whether the neuron's input to the network is important or not in the process of
prediction. There are several commonly used activation functions, such as the ReLU,
Softmax, tanh, and Sigmoid functions. Each of these functions has a specific usage.
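The layer types above can be assembled into a small image classifier; the following PyTorch sketch is purely illustrative (the layer sizes are arbitrary and the 7-class output merely mirrors the dataset's 7 creature classes), and it is not the YOLOv5 or Detectron2 architecture actually used in this work.

import torch
import torch.nn as nn

class SmallCNN(nn.Module):
    def __init__(self, num_classes=7):
        super().__init__()
        self.features = nn.Sequential(
            nn.Conv2d(3, 16, kernel_size=3, padding=1),   # convolutional layer: feature extraction
            nn.ReLU(),                                    # activation function
            nn.MaxPool2d(2),                              # pooling layer: downsample feature maps
            nn.Conv2d(16, 32, kernel_size=3, padding=1),
            nn.ReLU(),
            nn.MaxPool2d(2),
        )
        self.classifier = nn.Sequential(
            nn.Flatten(),
            nn.Dropout(0.5),                              # dropout layer: regularization
            nn.Linear(32 * 16 * 16, num_classes),         # fully-connected output layer
        )

    def forward(self, x):
        return self.classifier(self.features(x))

model = SmallCNN()
dummy = torch.randn(1, 3, 64, 64)    # one 64x64 RGB image
print(model(dummy).shape)            # torch.Size([1, 7])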
Hyperparameters:
Number of layers: The number of convolutional, pooling, and fully-connected layers can
significantly impact the model's complexity and performance.
Filter size: The size of the filters in convolutional layers determines the level of detail they
can capture.
Number of filters: The number of filters in each convolutional layer determines the number
of features extracted.
Pooling type: There are various pooling techniques, such as max pooling and average
pooling, each affecting how features are downsampled.
Fig : YOLOv5 Layers
Fig :
CHAPTER 4
RESULTS AND DISCUSSION
CHAPTER 5
CONCLUSION AND FUTURE SCOPE
Conclusion
Developing and deploying an object detection system for underwater images or videos using a
Raspberry Pi involves a series of carefully orchestrated steps. By acquiring underwater footage with
the Raspberry Pi Camera Module v2, preprocessing the data, preparing a dataset of 638 images,
selecting and training a YOLOv5 object detection model, and optimizing it for deployment on the
Raspberry Pi, a robust system is created for real-time or offline detection of objects in underwater
environments.
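For illustration, a trained YOLOv5 model can be loaded and run on a single frame as sketched below; the weights file 'best.pt' and the image name are placeholders, and the call assumes the ultralytics/yolov5 repository is available through torch.hub.

import torch

# Load custom-trained YOLOv5 weights (placeholder file name) through torch.hub
model = torch.hub.load('ultralytics/yolov5', 'custom', path='best.pt')

results = model('frame.jpg')             # run detection on a placeholder image
results.print()                          # summary of detected classes and confidences
detections = results.pandas().xyxy[0]    # bounding boxes as a pandas DataFrame
print(detections[['name', 'confidence', 'xmin', 'ymin', 'xmax', 'ymax']])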
Throughout this process, considerations such as image quality, computational efficiency, and model
performance are paramount. Techniques like color correction, noise reduction, and model
optimization ensure that the system can effectively detect objects despite the challenges posed by
underwater conditions and the Raspberry Pi's limited resources.
Once deployed, the object detection system can be evaluated on real underwater footage, with
metrics such as detection accuracy, speed, and resource usage providing insights into its
performance. Iterative refinement based on testing results and user feedback ensures that the system
remains effective and reliable in real-world underwater environments.
Ultimately, the development and deployment of an object detection system for underwater images or
videos using a Raspberry Pi open up numerous possibilities for applications in marine research,
underwater exploration, and environmental monitoring. With proper calibration and ongoing
maintenance, such systems can contribute significantly to our understanding of underwater
ecosystems and facilitate various scientific and practical endeavors.
Future scope
The project combining an underwater drone with image processing and object detection has a
promising future with applications in various fields. Here are some areas where this technology can
be further developed:
Enhanced autonomy: Machine learning algorithms can be refined to enable the drone to
autonomously navigate underwater environments, identify objects of interest, and make real-time
decisions.
Improved object recognition: By incorporating advanced machine learning techniques like deep
learning, the drone can be trained to recognize a wider variety of objects with higher accuracy, even
in poor lighting conditions.
3D mapping and modeling: By equipping the drone with additional sensors like LiDAR, it can
generate 3D maps and models of the underwater environment, useful for search and rescue
operations or infrastructure inspection.
Environmental monitoring: The drone can be deployed to monitor water quality, track marine life
populations, and detect pollution sources.
Underwater archaeology: The technology can aid archaeologists in exploring underwater ruins and
artifacts in a more efficient and non-invasive manner.
REFERENCES
Code:
The following is the training code only, using Detectron2 (run as notebook cells).
In [ ]:
!python -m pip install 'git+https://github.com/facebookresearch/detectron2.git'
In [ ]:
# detectron2 imports
import detectron2
from detectron2.utils.logger import setup_logger
setup_logger()
from detectron2 import model_zoo
import cv2
import os
from detectron2.engine import DefaultTrainer
from detectron2.engine import DefaultPredictor
from detectron2.config import get_cfg
from detectron2.utils.visualizer import Visualizer
from detectron2.structures import BoxMode
from detectron2.data import DatasetCatalog, MetadataCatalog
from detectron2.evaluation import COCOEvaluator, inference_on_dataset, LVISEvaluator
from detectron2.data import build_detection_test_loader
from detectron2.utils.visualizer import ColorMode
# other libraries used by the cells below (the original Colab file shortened this list)
import copy
import torch
from pathlib import Path
from detectron2.data import transforms as T
from detectron2.data import detection_utils as utils
from detectron2.data import build_detection_train_loader
def get_image_label_pairs(img_dir, label_dir):
    # Pair each image with its YOLO-format label file (same base name, .txt extension);
    # assumes .jpg images - adjust the glob pattern for other extensions
    pairs = []
    img_paths = sorted(Path(img_dir).glob('*.jpg'))
    for img_path in img_paths:
        file_name_tmp = str(img_path).split('/')[-1].split('.')
        file_name_tmp.pop(-1)
        file_name = '.'.join(file_name_tmp)
        label_path = Path(label_dir) / (file_name + '.txt')
        if label_path.is_file():
            pairs.append([str(img_path), str(label_path)])
    return pairs
# dataset root (assumed layout: train/, valid/, test/ folders, each with images/ and labels/ subfolders)
input_path = '/kaggle/input/aquarium-data-cots/aquarium_pretrain'
detectron_img_path = '/kaggle/input/aquarium-data-cots/aquarium_pretrain'
detectron_annot_path = '/kaggle/input/aquarium-data-cots/aquarium_pretrain'
def create_coco_format(path_pairs):
    # Convert (image path, YOLO label path) pairs into Detectron2's dataset-dict format
    data_list = []
    for i, path in enumerate(path_pairs):
        filename = path[0]
        img_h, img_w = cv2.imread(filename).shape[:2]
        img_item = {}
        img_item['file_name'] = filename
        img_item['image_id'] = i
        img_item['height'] = img_h
        img_item['width'] = img_w
        #print(str(i), filename)
        annotations = []
        with open(path[1]) as annot_file:
            lines = annot_file.readlines()
        for line in lines:
            if line[-1] == "\n":
                box = line[:-1].split(' ')
            else:
                box = line.split(' ')
            class_id = box[0]
            x_c = float(box[1])
            y_c = float(box[2])
            width = float(box[3])
            height = float(box[4])
            # YOLO stores normalized centre/size; convert to absolute corner coordinates
            x1 = (x_c - width / 2) * img_w
            y1 = (y_c - height / 2) * img_h
            x2 = (x_c + width / 2) * img_w
            y2 = (y_c + height / 2) * img_h
            annotation = {
                "bbox": list(map(float, [x1, y1, x2, y2])),
                "bbox_mode": BoxMode.XYXY_ABS,
                "category_id": int(class_id),
                "iscrowd": 0
            }
            annotations.append(annotation)
        img_item["annotations"] = annotations
        data_list.append(img_item)
    return data_list
train = get_image_label_pairs(input_path + '/train/images', input_path + '/train/labels')
val = get_image_label_pairs(input_path + '/valid/images', input_path + '/valid/labels')
train_list = create_coco_format(train)
val_list = create_coco_format(val)
In [ ]:
for catalog_name, file_annots in [("train", train_list), ("val", val_list)]:
    DatasetCatalog.register(catalog_name, lambda file_annots=file_annots: file_annots)
    MetadataCatalog.get(catalog_name).set(thing_classes=['fish', 'jellyfish', 'penguin',
                                                         'puffin', 'shark', 'starfish', 'stingray'])
metadata = MetadataCatalog.get("train")
In [ ]:
# Custom augmentation function
def custom_mapper(dataset_dict):
    dataset_dict = copy.deepcopy(dataset_dict)
    image = utils.read_image(dataset_dict["file_name"], format="BGR")
    transform_list = [T.RandomBrightness(0.5, 1.2),
                      T.RandomFlip(prob=0.5, horizontal=False, vertical=True),
                      T.RandomFlip(prob=0.5, horizontal=True, vertical=False)]
    image, transforms = T.apply_transform_gens(transform_list, image)
    dataset_dict["image"] = torch.as_tensor(image.transpose(2, 0, 1).astype("float32"))
    annos = [
        utils.transform_instance_annotations(obj, transforms, image.shape[:2])
        for obj in dataset_dict.pop("annotations")
        if obj.get("iscrowd", 0) == 0
    ]
    instances = utils.annotations_to_instances(annos, image.shape[:2])
    dataset_dict["instances"] = utils.filter_empty_instances(instances)
    return dataset_dict
In [ ]:
#training
cfg = get_cfg()
cfg.merge_from_file(model_zoo.get_config_file("COCO-Detection/faster_rcnn_R_50_FPN_3x.yaml"))
cfg.DATASETS.TRAIN = ("train",)
cfg.DATALOADER.NUM_WORKERS = 2
cfg.MODEL.DEVICE = 'cuda'  # cuda
cfg.MODEL.WEIGHTS = "detectron2://COCO-Detection/faster_rcnn_R_50_FPN_3x/137849458/model_final_280758.pkl"
cfg.SOLVER.IMS_PER_BATCH = 2
cfg.SOLVER.CHECKPOINT_PERIOD = 750
cfg.SOLVER.WARMUP_ITERS = 500
cfg.SOLVER.GAMMA = 0.05
cfg.SOLVER.BASE_LR = 0.0005
cfg.SOLVER.MAX_ITER = 3500  # (train_size / batch_size) * 100
cfg.MODEL.ROI_HEADS.BATCH_SIZE_PER_IMAGE = 256  # 512
cfg.MODEL.ROI_HEADS.NUM_CLASSES = len(MetadataCatalog.get("train").thing_classes)
cfg.SOLVER.STEPS = (20500,)  # beyond MAX_ITER, so the learning rate is never stepped down
os.makedirs(cfg.OUTPUT_DIR, exist_ok=True)

# The Detectron2 config has no DATALOADER.AUGMENTATIONS key, so the custom mapper
# is wired in through a trainer subclass that builds the training data loader with it
class AugmentedTrainer(DefaultTrainer):
    @classmethod
    def build_train_loader(cls, cfg):
        return build_detection_train_loader(cfg, mapper=custom_mapper)

trainer = AugmentedTrainer(cfg)
trainer.resume_or_load(resume=False)
import time as t

s1 = t.time()
try:
    trainer.train()
except Exception:
    pass  # swallow interruptions so the elapsed time below still prints
s2 = t.time()
print(s2 - s1)
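The imports above include DefaultPredictor and COCOEvaluator, but the corresponding cell is not shown in this report; the following is a minimal sketch of how the trained weights could be evaluated and visualized, assuming the default checkpoint name ('model_final.pth') written by DefaultTrainer.

In [ ]:
# Evaluation / inference sketch (this cell is not in the original notebook)
cfg.MODEL.WEIGHTS = os.path.join(cfg.OUTPUT_DIR, "model_final.pth")  # default DefaultTrainer checkpoint name
cfg.MODEL.ROI_HEADS.SCORE_THRESH_TEST = 0.5
predictor = DefaultPredictor(cfg)

# COCO-style mAP on the validation split
evaluator = COCOEvaluator("val", output_dir=cfg.OUTPUT_DIR)
val_loader = build_detection_test_loader(cfg, "val")
print(inference_on_dataset(predictor.model, val_loader, evaluator))

# Single-image prediction with boxes drawn using the registered metadata
im = cv2.imread(val[0][0])   # path of the first validation image
outputs = predictor(im)
v = Visualizer(im[:, :, ::-1], metadata=metadata, instance_mode=ColorMode.IMAGE)
out = v.draw_instance_predictions(outputs["instances"].to("cpu"))
cv2.imwrite("prediction.jpg", out.get_image()[:, :, ::-1])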