Computer Vision Technology
Computer Vision is a field within artificial intelligence (AI) focused on enabling machines to
interpret visual information from the world. This interpretation can involve recognizing
objects, analyzing movements, or extracting patterns to make decisions based on visual data.
By mimicking human vision capabilities, computer vision aims to bridge the gap between
raw data in images or videos and actionable insights.
Image Processing, on the other hand, deals with manipulating images to enhance their quality
or extract meaningful information. It involves techniques such as filtering, sharpening, noise
reduction, and segmentation. While image processing often serves as a preprocessing step for
computer vision tasks, it has its standalone applications, such as in medical imaging and
remote sensing.
Together, these fields form the basis of numerous advanced technologies, leveraging
computational models to analyze and interpret visual data effectively.
Early Foundations (1950s-1960s)
Artificial Intelligence (AI) and Neural Networks: The concepts of AI and neural networks emerged in the 1950s, laying the foundation for computer vision.
First Image Processing Systems: In the 1950s and 1960s, the first image processing systems were developed, primarily for military and space applications.
Robots and Machine Vision: In the 1960s, robots and machine vision systems began to emerge, with applications in manufacturing and inspection.
Rule-Based Approaches (1970s-1980s)
Rule-Based Systems: In the 1970s and 1980s, rule-based systems were developed for computer vision, using hand-crafted rules to analyze images.
Expert Systems: Expert systems were also developed during this period, using knowledge-based approaches to solve computer vision problems.
Image Processing Techniques: Various image processing techniques, such as edge detection and thresholding, were developed during this period.
Statistical Approaches (1990s)
• Statistical Methods: In the 1990s, statistical methods, such as Bayesian networks and probabilistic graphical models, began to be applied to computer vision.
• Feature Extraction: Feature extraction techniques, such as SIFT and SURF, were developed during this period.
Image Classification
The problem of image classification goes like this: Given a set of images that are all labeled
with a single category, we’re asked to predict these categories for a novel set of test images
and measure the accuracy of the predictions. There are a variety of challenges associated with
this task, including viewpoint variation, scale variation, intra-class variation, image
deformation, image occlusion, illumination conditions, and background clutter.
Fig 1.1
The most popular architecture used for image classification is Convolutional Neural
Networks (CNNs). A typical use case for CNNs is where you feed the network images and
the network classifies the data. CNNs tend to start with an input "scanner" that isn't intended to parse all the training data at once. For example, to input an image of 100 x 100 pixels, you wouldn't want a fully connected layer with a node for every one of the 10,000 pixels; instead, a small scanning window slides across the image and shares its weights at every position.
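To make this concrete, here is a minimal sketch of such a network in PyTorch; the layer sizes and class count are illustrative, not a prescribed architecture:

```python
import torch
import torch.nn as nn

class SmallCNN(nn.Module):
    """A tiny CNN that classifies 100x100 RGB images into `num_classes` categories."""
    def __init__(self, num_classes: int = 10):
        super().__init__()
        self.features = nn.Sequential(
            # Convolutional "scanner": a 3x3 window slides over the image,
            # so weights are shared instead of one node per pixel.
            nn.Conv2d(3, 16, kernel_size=3, padding=1), nn.ReLU(),
            nn.MaxPool2d(2),                               # 100x100 -> 50x50
            nn.Conv2d(16, 32, kernel_size=3, padding=1), nn.ReLU(),
            nn.MaxPool2d(2),                               # 50x50 -> 25x25
        )
        self.classifier = nn.Linear(32 * 25 * 25, num_classes)

    def forward(self, x):
        x = self.features(x)
        return self.classifier(torch.flatten(x, 1))

model = SmallCNN(num_classes=10)
dummy = torch.randn(1, 3, 100, 100)   # one fake 100x100 RGB image
print(model(dummy).shape)             # torch.Size([1, 10]) -> class scores
```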
Object Detection
The task of detecting objects within images usually involves outputting bounding boxes and labels for individual objects. This differs from the classification/localization task by applying classification and localization to many objects instead of just a single dominant object. For each candidate region there are effectively two classes: object bounding boxes and non-object bounding boxes. For example, in car detection, you have to detect all cars in a given image along with their bounding boxes.
Fig 1.2
Object Tracking
Object Tracking refers to the process of following a specific object of interest, or multiple
objects, in a given scene. It traditionally has applications in video and real-world interactions
where observations are made following an initial object detection. Now, it’s crucial to
autonomous driving systems such as self-driving vehicles from companies like Uber and
Tesla. Object tracking methods can be divided into two categories according to the observation model: generative methods and discriminative methods. A generative method uses a generative model to describe the object's appearance and searches for the object by minimizing reconstruction error, as in PCA-based approaches. A discriminative method distinguishes the object from the background; its performance is more robust, and it has gradually become the dominant approach in tracking. The discriminative method is also referred to as tracking-by-detection, and deep learning belongs to this category. To achieve tracking-by-detection, we detect candidate objects in all frames and use deep learning to recognize the wanted object among the candidates. Two kinds of basic network models are commonly used: stacked autoencoders (SAE) and convolutional neural networks (CNN).
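A core step in tracking-by-detection is associating each new frame's detections with the existing track. A minimal sketch, assuming detections are already available as (x1, y1, x2, y2) boxes (the helper names and threshold are illustrative):

```python
def iou(a, b):
    """Intersection-over-union of two boxes given as (x1, y1, x2, y2)."""
    x1, y1 = max(a[0], b[0]), max(a[1], b[1])
    x2, y2 = min(a[2], b[2]), min(a[3], b[3])
    inter = max(0, x2 - x1) * max(0, y2 - y1)
    area_a = (a[2] - a[0]) * (a[3] - a[1])
    area_b = (b[2] - b[0]) * (b[3] - b[1])
    return inter / (area_a + area_b - inter)

def match_track(track_box, detections, threshold=0.3):
    """Pick the detection in the new frame that best overlaps the tracked box."""
    best = max(detections, key=lambda d: iou(track_box, d), default=None)
    return best if best is not None and iou(track_box, best) >= threshold else None

# Track from frame t to frame t+1:
prev = (100, 100, 200, 200)
dets = [(110, 105, 205, 198), (400, 300, 450, 380)]
print(match_track(prev, dets))  # -> (110, 105, 205, 198)
```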
Semantic Segmentation
Fig 1.4
Central to computer vision is the process of segmentation, which divides whole images into pixel groupings that can then be labelled and classified. In particular, semantic segmentation tries to semantically understand the role of each pixel in the image. For example, in the picture above, apart from recognizing the person, the road, the cars, the trees, etc., we also have to delineate the boundaries of each object. Therefore, unlike classification, we need dense pixel-wise predictions from our models. As with other computer vision tasks, CNNs have had enormous success on segmentation problems. One of the popular initial approaches was patch classification through a sliding window, where each pixel was separately classified using a patch of image around it. This, however, is computationally very inefficient because we don't reuse the shared features between overlapping patches.
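The standard remedy is a fully convolutional network, which shares computation across the whole image and emits a class score for every pixel in a single pass. A minimal PyTorch sketch (layer sizes and the class count of 5 are illustrative):

```python
import torch
import torch.nn as nn

# Fully convolutional network: every layer keeps spatial structure, so one
# forward pass yields a class score for every pixel (no per-patch rescanning).
fcn = nn.Sequential(
    nn.Conv2d(3, 16, 3, padding=1), nn.ReLU(),
    nn.Conv2d(16, 16, 3, padding=1), nn.ReLU(),
    nn.Conv2d(16, 5, 1),             # 1x1 conv -> 5 class scores per pixel
)

image = torch.randn(1, 3, 128, 128)
scores = fcn(image)                  # shape: (1, 5, 128, 128)
mask = scores.argmax(dim=1)          # per-pixel predicted class, (1, 128, 128)
print(scores.shape, mask.shape)
```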
Instance Segmentation
Instance segmentation goes a step further than semantic segmentation: rather than only assigning every pixel a class label, it separates individual object instances, so two cars in the same image receive distinct masks.
Fig 1.5
2. FUNDAMENTALS OF COMPUTER VISION
Feature detection is the process of identifying and locating interest points or features within
an image. These features can be corners, edges, blobs, or other patterns that are useful for
tasks like object recognition, tracking, and matching.
❖ Corner detector
A corner detector locates points where image intensity changes sharply in more than one direction. Examples include:
• FAST
• ORB
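As a small illustration, OpenCV's ORB detector (which builds on FAST keypoints) can be run on a synthetic image; the image and parameter values below are only for demonstration:

```python
import cv2
import numpy as np

# Synthetic test image: a white square on black background has strong corners.
img = np.zeros((200, 200), dtype=np.uint8)
cv2.rectangle(img, (60, 60), (140, 140), 255, thickness=-1)

orb = cv2.ORB_create(nfeatures=100)      # FAST keypoints + BRIEF descriptors
keypoints, descriptors = orb.detectAndCompute(img, None)
print(f"detected {len(keypoints)} keypoints")
for kp in keypoints[:4]:
    print("corner near", kp.pt)
```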
❖ Edge detector
Edge detection is an image processing technique for finding the boundaries of objects within
images. It works by detecting discontinuities in brightness. Edge detection is used for image
segmentation and data extraction in areas such as image processing, computer vision, and
machine vision.
Examples include:
• Canny edge detector
• Sobel operator
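A minimal Canny example with OpenCV; the thresholds are illustrative and would normally be tuned per image:

```python
import cv2
import numpy as np

img = np.zeros((200, 200), dtype=np.uint8)
cv2.circle(img, (100, 100), 60, 255, thickness=-1)   # filled white circle

# Canny: Gaussian smoothing, gradient magnitude, non-maximum suppression,
# then hysteresis using the two thresholds below.
edges = cv2.Canny(img, threshold1=50, threshold2=150)
print("edge pixels:", int((edges > 0).sum()))        # boundary of the circle
```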
❖ Blob detector
Blob detection refers to methods aimed at detecting points and/or regions in the image that differ in properties such as brightness or color compared to their surroundings. It is used to obtain regions of interest for further processing. These regions could signal the presence of objects or parts of objects in the image domain, with application to object recognition and/or object tracking. Blob detection is usually done after color detection and noise reduction to finally find the required object in the image.
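A short sketch using OpenCV's SimpleBlobDetector; the parameter values are illustrative, and by default the detector looks for dark blobs on a lighter background:

```python
import cv2
import numpy as np

# Two dark blobs on a light background.
img = np.full((200, 200), 255, dtype=np.uint8)
cv2.circle(img, (60, 60), 15, 0, thickness=-1)
cv2.circle(img, (140, 120), 25, 0, thickness=-1)

params = cv2.SimpleBlobDetector_Params()
params.filterByArea = True
params.minArea = 100                  # ignore tiny specks (noise)
detector = cv2.SimpleBlobDetector_create(params)

for kp in detector.detect(img):
    print(f"blob at {kp.pt}, diameter {kp.size:.1f}")
```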
Object recognition is a key technology behind driverless cars, enabling them to recognize
a stop sign or to distinguish a pedestrian from a lamppost. It is also useful in a variety of
applications such as disease identification in bioimaging, industrial inspection, and
robotic vision.
You can use a variety of approaches for object recognition. Recently, techniques in
machine learning and deep learning have become popular approaches to object
recognition problems. Both techniques learn to identify objects in images, but they differ
in their execution.
Fig 2.4 Machine learning and deep learning techniques for object recognition.
The following section explains the differences between machine learning and deep
learning for object recognition, and it shows how to implement both techniques.
Determining the best approach for object recognition depends on your application and the problem you're trying to solve. In many cases, machine learning can be an effective technique, especially if you know which features or characteristics of the image are the best ones to use to differentiate classes of objects.
The main consideration to keep in mind when choosing between machine learning and
deep learning is whether you have a powerful GPU and lots of labeled training images. If
the answer to either of these questions is No, a machine learning approach might be the
best choice. Deep learning techniques tend to work better with more images, and a GPU
helps to decrease the time needed to train the model.
Fig 2.5 Key factors for choosing between deep learning and machine learning.
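To make the machine learning route concrete, a common classical pipeline is hand-crafted HOG features fed to a linear SVM. A toy sketch with scikit-image and scikit-learn, where the synthetic image data is purely illustrative:

```python
import numpy as np
from skimage.feature import hog
from sklearn.svm import LinearSVC

rng = np.random.default_rng(0)

def fake_image(label):
    """Stand-in data: class 1 images contain a bright square, class 0 are noise."""
    img = rng.random((64, 64))
    if label == 1:
        img[20:44, 20:44] += 1.0
    return img

labels = rng.integers(0, 2, size=40)
# Hand-crafted HOG features, then a linear SVM -- the classic ML pipeline.
features = [hog(fake_image(y), pixels_per_cell=(8, 8)) for y in labels]
clf = LinearSVC().fit(features, labels)
print("training accuracy:", clf.score(features, labels))
```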
2.6 Image Classification
1. Binary Classification
Binary classification involves classifying images into one of two categories. For
example, determining whether an image contains a cat or not. This is the simplest form of
image classification.
2. Multiclass Classification
Multiclass classification involves categorizing images into more than two classes. For
instance, classifying images of different types of animals (cats, dogs, birds, etc.). Each
image is assigned to one, and only one, category.
3. Multilabel Classification
Multilabel classification allows a single image to carry several labels at once; for example, one photograph may be tagged with both "beach" and "sunset". Each class is predicted independently rather than competitively.
4. Hierarchical Classification
Hierarchical classification organizes categories into a tree of coarse-to-fine labels, such as animal, then dog, then beagle, and assigns images at the appropriate levels of the hierarchy.
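The practical difference between multiclass (item 2) and multilabel (item 3) shows up in the output layer: a softmax picks exactly one winning class, while independent sigmoids allow any subset of labels. A small NumPy illustration with made-up scores:

```python
import numpy as np

logits = np.array([2.0, -1.0, 0.5])          # raw scores for 3 classes

# Multiclass: softmax forces the scores to compete; exactly one label wins.
softmax = np.exp(logits) / np.exp(logits).sum()
print("multiclass probs:", softmax.round(3), "-> class", softmax.argmax())

# Multilabel: an independent sigmoid per class; any subset can be "on".
sigmoid = 1.0 / (1.0 + np.exp(-logits))
print("multilabel probs:", sigmoid.round(3), "-> labels", np.where(sigmoid > 0.5)[0])
```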
3. COMPUTER VISION LIBRARIES
3.1 Introduction
Computer Vision (CV) is changing how machines perceive and interpret visual information. Computer vision enables machines to "see" and understand the world, much like the human visual system. At the heart of this transformative field are specialized software tools known as computer vision libraries. Computer vision libraries are essential for developers and researchers who need to work with visual data in various applications. These libraries provide a collection of pre-built algorithms, functions, and tools, simplifying the complex image and video analysis process. Their significance lies in their ability to address a wide range of tasks, from facial recognition to object detection, making computer vision accessible to diverse industries. From surveillance systems enhancing security to autonomous vehicles navigating roads, computer vision plays a major role in shaping the future of technology. Let's explore some popular computer vision libraries, each contributing uniquely to the advancement of visual intelligence. We will examine their features, use cases, and the impact they have on industries ranging from healthcare to robotics.
3.2 OpenCV
OpenCV (Open Source Computer Vision Library) is a popular open-source library for computer vision and image processing tasks. It offers tools for face detection, image recognition, and object tracking.
Developers use OpenCV for applications like augmented reality, robotics, and medical
imaging. It supports multiple programming languages like Python, C++, and Java, and
works on various platforms, including Windows, macOS, and Linux.
Features:
▪ Cross-platform compatibility: runs on Windows, macOS, and Linux.
▪ Bindings for multiple languages, including Python, C++, and Java.
▪ Tools for face detection, image recognition, and object tracking.
Applications:
▪ Augmented reality, robotics, and medical imaging.
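As a brief illustration of OpenCV's face detection tools, the Haar cascade bundled with the library can be applied to an image; the file names below are hypothetical:

```python
import cv2

# Haar-cascade face detector shipped with OpenCV (a classical,
# pre-deep-learning method).
cascade = cv2.CascadeClassifier(
    cv2.data.haarcascades + "haarcascade_frontalface_default.xml"
)

img = cv2.imread("photo.jpg")                # hypothetical input file
gray = cv2.cvtColor(img, cv2.COLOR_BGR2GRAY)
faces = cascade.detectMultiScale(gray, scaleFactor=1.1, minNeighbors=5)

for (x, y, w, h) in faces:
    cv2.rectangle(img, (x, y), (x + w, y + h), (0, 255, 0), 2)
cv2.imwrite("faces.jpg", img)
```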
3.3 TensorFlow
TensorFlow is widely used in computer vision projects due to its advanced deep
learning capabilities and scalability. It works on different platforms, including
mobile devices and edge computing.
Advantages:
• Advanced deep learning capabilities and scalability.
• Runs on many platforms, including mobile devices and edge computing.
Applications:
Used for training CNNs, object detection models like SSD and YOLO, and
advanced architectures like Vision Transformers (ViTs).
3.4 PyTorch
PyTorch is an open-source deep learning framework known for its dynamic computation graphs and Pythonic API, which make it popular for research and rapid prototyping.
Applications:
• Training CNNs for classification, detection, and segmentation, often through the companion torchvision library.
3.5 MATLAB
MATLAB provides computer vision capabilities through its Image Processing Toolbox and Computer Vision Toolbox.
Use Cases:
• Rapid prototyping, algorithm research, and education.
Advantages:
• Integrated environment with visualization tools and extensive documentation.
Limitations:
• Proprietary licensing, and less common in large-scale production deployments.
3.6 Scikit-Image
Scikit-Image is a Python library for image processing that is built on top of NumPy and SciPy and is part of the broader scientific Python ecosystem. It provides a range of algorithms for image analysis and manipulation.
Features:
• Edge Detection and Feature Extraction: Offers built-in algorithms for tasks
such as edge detection, corner detection, and texture analysis.
Use Cases:
• Scientific and research imaging tasks such as segmentation, filtering, and measurement.
Advantages:
• Excellent for tasks that don’t require deep learning but require robust
traditional image processing.
Limitations:
• Focuses on classical algorithms; it does not provide deep learning models.
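A short scikit-image sketch using its bundled test image; the parameter values are illustrative:

```python
from skimage import data, feature, filters

img = data.camera()                          # built-in grayscale test image

edges_sobel = filters.sobel(img)             # gradient-magnitude edge map
edges_canny = feature.canny(img, sigma=2.0)  # boolean edge mask
print(edges_sobel.shape, int(edges_canny.sum()), "edge pixels (Canny)")
```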
4. APPLICATIONS OF COMPUTER VISION
Medical imaging combined with computer vision helps detect and analyze conditions such as:
❖ Tumors
❖ Cancer
❖ Stroke
❖ Pneumonia
❖ COVID
Fig 4.1 CT Images of pulmonary nodules (lungs) with (A) AI lung cancer prediction score of
10 and (B) AI lung cancer prediction score of 2. Image source: RSNA.org (Radiological
Society of North America)
Applications
1. Diagnosis: Medical imaging helps detect and diagnose diseases and abnormalities.
2. Treatment Planning: Medical imaging helps plan treatments, such as surgery and
radiation therapy.
3. Monitoring: Medical imaging helps monitor the progression of diseases and the
effectiveness of treatments.
4. Screening: Medical imaging helps screen for diseases, such as breast cancer and
lung cancer.
➢ Camera-based Vision Systems: Detect vehicles, pedestrians, and road signs using models like YOLO or SSD.
➢ LiDAR and Radar Integration: Combines data for accurate depth perception
and collision avoidance.
➢ Lane Detection
Vision algorithms identify lane boundaries and road markings. Hough Transform
and deep learning models are often used.
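A hedged sketch of Hough-based lane detection with OpenCV; the file names and parameter values are illustrative and would be tuned for real footage:

```python
import cv2
import numpy as np

frame = cv2.imread("road.jpg")               # hypothetical dashcam frame
gray = cv2.cvtColor(frame, cv2.COLOR_BGR2GRAY)
edges = cv2.Canny(gray, 50, 150)

# Probabilistic Hough transform: finds line segments in the edge map.
lines = cv2.HoughLinesP(edges, rho=1, theta=np.pi / 180, threshold=50,
                        minLineLength=40, maxLineGap=10)
if lines is not None:
    for x1, y1, x2, y2 in lines[:, 0]:
        cv2.line(frame, (x1, y1), (x2, y2), (0, 0, 255), 2)
cv2.imwrite("lanes.jpg", frame)
```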
1. Biometric Authentication
Used in smartphones and security systems to authenticate users. Deep models extract
unique facial features to create embeddings for matching.
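A minimal sketch of embedding-based matching; the embeddings here are random stand-ins for the output of a real face-recognition network, and the threshold is illustrative:

```python
import numpy as np

def cosine_similarity(a, b):
    return float(np.dot(a, b) / (np.linalg.norm(a) * np.linalg.norm(b)))

# Hypothetical 128-d embeddings produced by a face-recognition network.
enrolled = np.random.default_rng(0).normal(size=128)   # stored template
probe = enrolled + np.random.default_rng(1).normal(scale=0.1, size=128)

THRESHOLD = 0.8        # tuned on validation data in a real system
score = cosine_similarity(enrolled, probe)
print(f"similarity {score:.3f} ->", "match" if score > THRESHOLD else "no match")
```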
2. Surveillance
Facial recognition systems deployed in public spaces monitor for individuals of interest.
Ethical considerations include privacy and potential misuse.
AR focuses on augmenting the physical world with digital artifacts, images, videos, or experiences overlaid with computer-generated imagery (CGI) and 3D models. VR aims to create a virtual experience with headsets and tracking that place the user in a different world.
With the increasing usage of AR and VR in gaming, marketing, education, and health care,
it’s important to understand each technology’s uses, applications, advantages, and
disadvantages.
AR Applications
AR systems rely on computer vision for object tracking and scene understanding. Examples
include AR filters on social media platforms and navigation apps.
VR Environments
In VR, depth estimation and 3D reconstruction create immersive experiences for gaming,
training, and virtual tours.
5. CHALLENGES IN COMPUTER VISION
Computer vision is changing how machines understand images, but it faces several
challenges, including ensuring data quality, processing data quickly, the effort
needed for labeling data, scaling, and addressing privacy and ethical issues.
Addressing these challenges effectively will ensure computer vision’s advancement
aligns with both tech progress and human values.
5.1 Data Quality
This challenge concerns the clarity and condition of input images or videos, which is crucial for system accuracy. Specific difficulties include poor lighting, obscured details, object variations, and cluttered backgrounds. Enhancing input quality is vital for the accuracy and reliability of computer vision systems.
5.3 Scalability
Scaling computer vision systems to handle growing volumes of visual data and diverse deployment environments remains a significant engineering challenge.
5.4 Privacy and Ethics
These issues highlight the need for careful handling of surveillance and facial recognition to safeguard privacy. Solving these challenges requires clear rules for data use, openness about technology applications, and legal support.
6. FUTURE TRENDS
Generative AI is reshaping many fields, and computer vision will be among those exploring its potential. Over the next 12 months, we can expect to see generative AI further enable synthetic data creation.
The output data from generative models can be used to train computer
vision models, such as those for object detection or facial recognition.
Not only will this minimize the risk of privacy violations, but it will also make the model training process significantly less expensive and time-consuming, because synthetic data can be generated and labeled faster and more efficiently than human annotators can label real data.
Fig 6.2
Self-driving cars are high on the list of key use cases for computer vision
technology. Currently, the technology used to navigate and operate
autonomous vehicles relies on processing input from various sources,
ranging from cameras and GPS to RADAR and LiDAR.
But as they become more prevalent, it’s only a matter of time before the
computers in these vehicles can drive almost entirely by sight, the same
way a human driver does. To this end, we can expect the incorporation of
increasingly sophisticated computer vision technology into the design
and production line process, as self-driving vehicles edge ever closer to
becoming an everyday reality on our roads.
Fig 6.4
5. Advances in 3D Computer Vision
Techniques such as depth estimation and 3D reconstruction continue to mature, powering applications from immersive AR and VR experiences to robotic navigation.
Fig 6.5
7. CONCLUSION