Module 5 and 6
o The ability of a robot to "see" and interpret its environment using cameras and
computer algorithms.
• Why is it Important?
o Line Descriptors:
o Area Descriptors:
• Definition: The process of partitioning an image into multiple segments (sets of pixels),
typically to locate objects or boundaries.
1. Thresholding:
2. Region-Based Segmentation:
▪ Groups pixels into regions based on similarity (e.g., intensity, color, texture).
▪ Application for Obstacle Detection: For different floor surfaces (e.g., carpet
vs. tile), region growing can group pixels belonging to a specific floor texture,
identifying anything that doesn't match as a potential obstacle. It's more
robust to varied backgrounds than simple thresholding.
• Steps:
1. Thresholding:
▪ All pixels with intensity > 150 are set to white (representing potential
obstacles).
▪ All pixels with intensity <= 150 are set to black (representing background).
▪ Region labeling assigns a unique label (ID) to each distinct connected white
region.
▪ This allows the robot to identify each individual obstacle as a separate entity,
even if they are close together.
▪ The robot can then analyze the size, shape, and position of each labeled
region to understand and navigate around multiple obstacles.
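• Code sketch: a minimal Python/OpenCV version of the thresholding and region-labeling steps above; the file name, the threshold of 150, and the use of 8-connectivity are assumptions.
```python
import cv2

# Illustrative sketch: threshold at 150, then label each connected white
# region so every obstacle gets its own ID (file name is hypothetical).
gray = cv2.imread("floor_view.png", cv2.IMREAD_GRAYSCALE)

# Pixels > 150 -> white (potential obstacle); pixels <= 150 -> black (background).
_, binary = cv2.threshold(gray, 150, 255, cv2.THRESH_BINARY)

# Region labeling: every connected white blob receives a unique integer label.
num_labels, labels, stats, centroids = cv2.connectedComponentsWithStats(
    binary, connectivity=8)

# Label 0 is the background; analyse size and position of each obstacle.
for label in range(1, num_labels):
    x, y, w, h, area = stats[label]
    cx, cy = centroids[label]
    print(f"Obstacle {label}: area={area} px, bbox=({x},{y},{w},{h}), "
          f"centroid=({cx:.1f},{cy:.1f})")
```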
o Repeated application of a smoothing filter to gradually reduce noise and blur edges.
o Algorithms that refine a threshold value over multiple iterations until an optimal
separation between foreground and background is achieved.
o Reduces random noise (e.g., sensor noise, "salt-and-pepper" noise) that can obscure
features.
o Smoothes out minor imperfections, making edges and features more discernible.
• Example:
o Applying iterative Gaussian smoothing (e.g., 3x3 Gaussian kernel applied 3-5 times)
will progressively average out the noisy pixels.
o This will result in a much cleaner, more continuous lane line, making it easier for the
robot's lane detection algorithm to accurately identify and follow the lane. Without
smoothing, the robot might misinterpret the noisy data, leading to incorrect
navigation.
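• Code sketch: repeated 3x3 Gaussian smoothing as described above; the file name and the number of passes are assumptions.
```python
import cv2

# Illustrative sketch: apply a small Gaussian kernel several times to
# suppress sensor noise before lane detection (file name is hypothetical).
frame = cv2.imread("lane_camera.png", cv2.IMREAD_GRAYSCALE)

smoothed = frame
for _ in range(4):  # 3-5 passes, per the example above
    smoothed = cv2.GaussianBlur(smoothed, (3, 3), 0)  # sigma derived from kernel size
```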
• Definition: Inverting pixel intensity means turning light pixels dark and dark pixels light.
• Pixel Function:
o Let P_original(x,y) be the intensity of a pixel in the original image, and P_inverted(x,y) the intensity of the corresponding pixel in the inverted image.
o For an 8-bit grayscale image, where pixel intensities range from 0 (black) to 255
(white): P_inverted(x,y) = 255 − P_original(x,y)
• Example:
o If an original pixel has an intensity of 0 (black), the inverted pixel will be 255−0=255
(white).
o If an original pixel has an intensity of 200 (light gray), the inverted pixel will be
255−200=55 (dark gray).
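• Code sketch: the inversion rule applied to a NumPy array (the 2x2 sample values are only for illustration).
```python
import numpy as np

# 8-bit grayscale sample; inversion is simply 255 minus each pixel value.
original = np.array([[0, 200],
                     [128, 255]], dtype=np.uint8)

inverted = 255 - original  # P_inverted(x, y) = 255 - P_original(x, y)
print(inverted)            # [[255  55]
                           #  [127   0]]
```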
• Working Principle:
o The "Shrink" operator reduces the size of foreground objects (usually white pixels)
and enlarges holes or breaks thin connections.
o It works by moving a structuring element (a small kernel, e.g., a 3x3 square) across
the binary image.
o A pixel in the output image is set to foreground (white) only if all the pixels under the
structuring element in the input image are foreground pixels. Otherwise, it's set to
background (black).
• Example:
o Original:
o 00000
o 01110
o 01110
o 01110
o 00000
▪ Only the center pixel of the original square (the '1' at row 2, col 2) will have
all its 3x3 neighbors as '1's.
o 00000
o 00000
o 00100
o 00000
o 00000
o The square has shrunk to a single pixel. Thin lines or noisy "speckles" would be
completely removed. Useful for removing small noise and separating loosely
connected objects.
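• Code sketch: the shrink operator applied to the 5x5 example above, implemented with OpenCV's erosion (erosion is the standard name for this operator).
```python
import cv2
import numpy as np

# The 3x3 white square from the example above.
binary = np.array([[0, 0, 0, 0, 0],
                   [0, 1, 1, 1, 0],
                   [0, 1, 1, 1, 0],
                   [0, 1, 1, 1, 0],
                   [0, 0, 0, 0, 0]], dtype=np.uint8)

kernel = np.ones((3, 3), dtype=np.uint8)  # 3x3 structuring element
shrunk = cv2.erode(binary, kernel)

print(shrunk)  # only the centre pixel (row 2, col 2) remains set
```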
• Binary Images: Images composed of only two pixel values, typically 0 (black/background)
and 1 (white/foreground).
• 4-Connectivity:
o Two foreground pixels are 4-connected if they are adjacent horizontally or vertically.
o Example (X marks the 4-connected neighbours of the centre pixel '1'):
o  X
o X1X
o  X
• 8-Connectivity:
o Two foreground pixels are 8-connected if they are adjacent horizontally, vertically, or
diagonally.
o Example:
o XXX
o X1X
o XXX
o Path Planning: When a robot maps a room, it often converts the environment into a
binary occupancy grid (obstacles as '1', free space as '0').
o Obstacle Avoidance: Connectivity helps the robot understand which parts of the
map are contiguous free space and which are connected obstacles.
o Path Generation: If a robot needs to find a path from point A to point B, it needs to
know if the pixels representing the path are connected.
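• Code sketch: how the connectivity choice changes region counts; two diagonally touching pixels form one region under 8-connectivity but two regions under 4-connectivity.
```python
import cv2
import numpy as np

grid = np.array([[1, 0, 0],
                 [0, 1, 0],
                 [0, 0, 0]], dtype=np.uint8)

n4, _ = cv2.connectedComponents(grid, connectivity=4)
n8, _ = cv2.connectedComponents(grid, connectivity=8)

# Label 0 is the background, so subtract it from the count.
print(n4 - 1, n8 - 1)  # 2 foreground regions vs. 1
```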
• Front Lighting:
o Description: Light source is placed on the same side as the camera, illuminating the
front surface of the object.
o Purpose: To reveal surface features, textures, colors, and defects on the visible
surface.
o Example:
• Back Lighting:
o Description: Light source is placed behind the object, with the camera facing the
front. The object appears as a silhouette.
o Example:
• Explanation:
o When illuminated from the front with a diffuse light, the crack will often create a
shadow or a distinct change in reflection compared to the smooth surrounding
surface.
o Structured light (e.g., projecting a pattern of lines or grids) can be even more
effective. Cracks will deform the projected pattern, making them highly visible to the
vision system.
o Backlighting would only show the overall silhouette and wouldn't reveal the subtle
surface irregularities of a crack unless it was a through-hole.
o Power Efficiency: Transmitting less data consumes less power, extending the robot's
battery life, crucial for autonomous operation.
▪ Best for: Photographic images with continuous tones and complex textures.
Achieves high compression ratios.
▪ Working: Uses Discrete Cosine Transform (DCT) to convert image data into
frequency components, then quantizes and encodes.
▪ Best for: Images with large areas of uniform color or repeating patterns,
such as binary images, simple graphics, or scanned documents.
▪ Working: Replaces sequences of identical data values with a single value and
a count of how many times it repeats (e.g., "AAAAA" becomes "5A").
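• Code sketch: a pure-Python run-length encoder for one row of a binary image (function name is illustrative).
```python
def rle_encode(row):
    """Return (value, run_length) pairs, e.g. [0, 0, 0, 1, 1] -> [(0, 3), (1, 2)]."""
    runs = []
    for value in row:
        if runs and runs[-1][0] == value:
            runs[-1] = (value, runs[-1][1] + 1)  # extend the current run
        else:
            runs.append((value, 1))              # start a new run
    return runs

print(rle_encode([0, 0, 0, 0, 1, 1, 1, 0, 0]))   # [(0, 4), (1, 3), (0, 2)]
```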
12 & 13. Vision System for Sorting Metal and Plastic Components
o Setup:
o Techniques:
▪ Color-Based Segmentation:
▪ Principle: Metal and plastic often have distinct color properties.
▪ Implementation:
▪ Edge Detection:
▪ Implementation:
o Decision Logic:
▪ The robot then uses this classification to actuate a sorting mechanism (e.g., a
robotic arm for pick-and-place into designated bins).
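• Code sketch: one possible color-based check, assuming the plastic parts are strongly colored and the metal parts appear as low-saturation gray; the file name, thresholds, and decision ratio are assumptions that would need tuning on real images.
```python
import cv2

# Hypothetical image of a single part on the conveyor.
bgr = cv2.imread("conveyor_part.png")
hsv = cv2.cvtColor(bgr, cv2.COLOR_BGR2HSV)

saturation = hsv[:, :, 1]
plastic_ratio = (saturation >= 80).mean()  # fraction of strongly colored pixels

# Crude decision: colored surface -> plastic, gray/desaturated surface -> metal.
print("plastic" if plastic_ratio > 0.2 else "metal")
```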
o Feedback Loop: Using visual data to refine robot movements and actions.
• Components:
o Camera(s):
▪ Types: Monocular (2D), Stereo (3D depth), RGB-D (depth sensors like
Kinect), Thermal, Hyperspectral.
o Illumination System:
o Optics (Lenses):
▪ Purpose: Focus light onto the camera sensor, control field of view, minimize
distortion.
▪ Purpose: Hardware interface that digitizes analog video signals (if applicable)
or transfers digital image data from the camera to the computer's memory.
o Processor / Computer:
▪ Types: Industrial PC, embedded system (e.g., NVIDIA Jetson, Raspberry Pi),
dedicated vision controller.
▪ Purpose: Runs the image processing algorithms and controls the robot.
• Navigation Tasks:
o Obstacle Avoidance:
• Manipulation Tasks:
o Pick-and-Place:
o Assembly:
o Quality Inspection:
o Types:
▪ Grayscale: Each pixel has a single intensity value (0-255 for 8-bit).
▪ Color (RGB): Each pixel has three intensity values (Red, Green, Blue
channels).
• 2. Feature-Based Representation:
o Examples:
• 3. Geometric Models:
o Example: A CAD model of a robot part, or a set of lines representing the boundaries
of a detected object.
• 4. Statistical/Textural Models:
o Description: Each pixel is assigned a class label (e.g., "road," "car," "pedestrian")
rather than just color/intensity.
• Definition: Image analysis involves extracting meaningful information, patterns, and features
from images.
• Significance:
7. Data Collection and Insights: Provides rich visual data that can be analyzed over
time to optimize processes, identify trends, and improve system performance.
8. In essence, image analysis is the "brain" of robotic vision, transforming raw visual
input into the intelligence needed for a robot to operate effectively in the real
world.
• Digitization: The process of converting an analog image (continuous light signal) into a digital
image (discrete numerical representation).
• Steps:
1. Image Acquisition (Optical Stage):
▪ Lens: Focuses light from the scene onto the image sensor.
▪ Sensor (e.g., CCD or CMOS array): Converts the light photons into analog
electrical charges proportional to the light intensity at each sensor element
(photosensor).
▪ Bit Depth: Determines the number of intensity levels. An 8-bit image has 2⁸ = 256 intensity levels (0-255). A 10-bit image has 2¹⁰ = 1024 levels. Higher bit depth allows for finer representation of intensity variations, reducing quantization errors.
4. Data Transfer:
▪ The digitized pixel values (often as a stream of bytes) are transferred from
the camera's internal buffer to the robot's main processing unit via
interfaces like USB, Ethernet, GigE Vision, CameraLink, etc.
• Result: A digital image, typically represented as a 2D array of numbers, ready for computer
processing.
o Characteristics:
▪ Speed: Potentially very fast for simple, fixed operations (e.g., basic filtering).
▪ Flexibility: Limited. Operations are hardwired.
o Robotics Relevance: Less common in modern robotics. May be used for very high-
speed, simple pre-processing in some specialized legacy systems or for direct sensor
conditioning, but digital processing dominates.
o Example: Old CCTV systems using analog filters for noise reduction.
o Characteristics:
▪ Complexity: Can handle highly complex and sophisticated tasks (e.g., deep
learning, 3D reconstruction).
o Robotics Relevance: Dominant method in modern robotics. Essential for virtually all
advanced robotic vision applications.
• Comparison in Robotics:
o Modern robotics overwhelmingly favors digital image processing due to its flexibility,
accuracy, programmability, and ability to handle complex tasks crucial for
autonomous and intelligent robotic behavior. Analog processing is largely historical
or for niche applications where extreme speed on a very simple task is paramount
before digitization.
• Context: These are the two fundamental processes involved in converting a continuous
analog image into a discrete digital image.
• 1. Sampling (Spatial Discretization):
o Analogy: Imagine drawing a grid over a continuous painting and picking out the color
at the center of each square.
o Process: It involves dividing the continuous image (in space) into a discrete set of
points (pixels).
o Mechanism: The camera's image sensor (e.g., CCD or CMOS array) consists of a grid
of individual light-sensitive elements (photosites). Each photosite collects light from
a tiny, specific area of the scene. This inherently samples the continuous light signal
into discrete spatial locations.
o Result: Determines the resolution of the digital image (e.g., 640x480, 1920x1080). A
higher sampling rate (more pixels) captures finer spatial detail, but generates larger
files.
o Importance: Affects the sharpness and level of detail captured. Undersampling can
lead to aliasing (jagged edges, moiré patterns).
o Analogy: Imagine measuring the exact brightness at each square you drew (from
sampling) and then assigning it to the closest step on a predefined scale (e.g., 0-255
for 8-bit).
o Process: It involves converting the continuous range of intensity (or color) values,
measured by each photosite, into a discrete set of integer values.
o Bit Depth: The number of bits used to represent each pixel's intensity value
determines the number of discrete intensity levels.
o Result: Affects the tonal range and color fidelity. Too few quantization levels can lead
to posterization or false contouring (banding effects).
o Importance: Determines the dynamic range and color accuracy of the image.
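• Code sketch: sampling and quantization on synthetic data, where sampling keeps every 4th pixel and quantization rounds intensities down to 4 levels (both factors are only illustrative).
```python
import numpy as np

# Stand-in for a "continuous" intensity field.
scene = np.random.default_rng(0).uniform(0, 255, size=(480, 640))

sampled = scene[::4, ::4]                            # spatial discretization: 480x640 -> 120x160
quantized = ((sampled // 64) * 64).astype(np.uint8)  # 256 levels -> 4 levels

print(sampled.shape, np.unique(quantized))  # (120, 160) [  0  64 128 192]
```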
• Coding (Image Compression): The process of encoding images into a format that uses fewer
bits than their uncompressed representation, for more efficient storage and transmission.
• Influence on Storage:
o Reduced File Size: The primary benefit. Smaller files require less physical storage
space on the robot's onboard memory, server hard drives, or cloud storage.
o Increased Capacity: With compression, a robot can store many more images or
video frames for analysis, logging, or later retrieval, extending mission duration or
data collection.
o Faster Access: Smaller files can be read from storage and loaded into memory more
quickly.
• Influence on Transmission:
o Reduced Bandwidth Usage: Compressed images require less data to be sent across
communication channels (Wi-Fi, cellular, Ethernet). This frees up bandwidth for
other critical robot data (telemetry, control signals).
o Faster Transmission: Less data means faster transfer times. This is crucial for real-
time control, where low latency in image delivery to a control center is essential for
prompt human intervention or cloud-based processing.
o Lower Power Consumption: Transmitting less data consumes less power from the
robot's battery, extending operational time.
• Trade-offs:
o Lossy vs. Lossless: Lossy compression (e.g., JPEG) achieves higher compression ratios
but sacrifices some image quality, which might be acceptable for general surveillance
but not for precise measurements. Lossless compression (e.g., PNG, RLE) preserves
all data but has lower compression ratios.
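• Code sketch: comparing lossless (PNG) and lossy (JPEG) storage of the same frame; the file names and quality settings are assumptions.
```python
import os
import cv2

frame = cv2.imread("camera_frame.png")

cv2.imwrite("frame_lossless.png", frame)                             # lossless
cv2.imwrite("frame_q90.jpg", frame, [cv2.IMWRITE_JPEG_QUALITY, 90])  # mild lossy
cv2.imwrite("frame_q40.jpg", frame, [cv2.IMWRITE_JPEG_QUALITY, 40])  # aggressive lossy

for name in ("frame_lossless.png", "frame_q90.jpg", "frame_q40.jpg"):
    print(name, os.path.getsize(name), "bytes")
```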
22. Comparison: Black & White, Grayscale, and Digital Color Images
• File Size (Relative): smallest for Black & White, medium for Grayscale, largest for Color.
• Processing Complexity: lowest for Black & White, medium for Grayscale, highest for Color.
• Use Cases in Robotics: Black & White for obstacle maps (occupancy grids) and simple object presence detection; Grayscale for edge detection, shape analysis, and depth perception (stereo matching); Color for object recognition (by color) and human-robot interaction (face recognition).
• Advantages: Black & White offers simple, fast processing and minimal storage; Grayscale is good for structural analysis with less storage than color; Color provides rich information and is highly discriminative.
• Advantages:
1. Reduced Data Size: Significantly smaller than color images, leading to lower storage
requirements and faster transmission/processing.
• Limitations:
1. Loss of Color Information: Cannot distinguish objects solely based on their color.
Two objects of different colors but similar lightness will appear identical.
2. Ambiguity: Can lead to ambiguities in scenes where color is the primary
distinguishing feature between objects or backgrounds (e.g., sorting red and blue
widgets of the same shape).
4. Lighting Sensitivity: While less sensitive to color temperature, grayscale images are
still heavily affected by overall illumination intensity and shadows, which can alter
perceived object boundaries or textures.
• Working Principle:
o A small image patch, called the template, is slid over the larger source image pixel by
pixel (or in a sliding window fashion).
o The location(s) with the highest similarity score (or lowest dissimilarity) are
considered matches.
o Known Objects: It's highly effective when the robot needs to recognize objects
whose appearance (shape, texture, intensity pattern) is known beforehand and
relatively constant.
o Localization: It not only identifies the object but also provides its precise location
(coordinates) within the image.
o Simple and Fast: For specific, repetitive tasks where the object's appearance doesn't
vary much (e.g., rotation, scale, illumination), template matching can be very fast
and reliable.
▪ The robot has a template (a stored image) for each type of electronic
component.
▪ The robot also has templates for the solder pads on the PCB where each
component needs to be placed.
▪ After picking up a component, the robot moves over the PCB. Its camera
captures an image of the target placement area.
3. Precise Placement:
▪ By combining the orientation of the component in the gripper and the exact
location of the pads on the PCB (both derived from template matching), the
robot can precisely align and place the component, ensuring proper
electrical connection.
• Why Essential: Electronic components are often extremely small, and placement accuracy is
measured in micrometers. Template matching provides the high precision and speed needed
for high-volume automated assembly lines. Any misalignment detected by template
matching can trigger adjustments or flag a defect.
• Evaluation:
o The index is calculated at every possible position where the template could overlap
with the source image.
o The position(s) yielding the highest (for similarity measures like correlation) or
lowest (for dissimilarity measures like sum of squared differences) value of the
performance index are declared as the best match(es).
o Where:
▪ The summations are over all pixels within the template's area.
• Explanation:
o It measures the statistical similarity (correlation) between the template and a sub-
image, normalized to be invariant to linear changes in brightness (e.g., overall
illumination changes) and contrast.
o It essentially measures how well the intensity patterns of the template and the
image patch align, regardless of their absolute brightness or scaling of brightness.
o Formula: NCC(u,v) = Σ[(I(x+u, y+v) − Ī(u,v)) · (T(x,y) − T̄)] / √( Σ(I(x+u, y+v) − Ī(u,v))² · Σ(T(x,y) − T̄)² )
o Where:
▪ Ī(u,v): Mean intensity of the image region under the template at position (u,v).
▪ T̄: Mean intensity of the template; the summations run over all template pixels (x,y).
o Value Range: The NCC score typically ranges from -1 (perfect negative correlation) to
+1 (perfect positive correlation). A value of +1 indicates a perfect match.
o Clear Peaks: It tends to produce sharp, distinct peaks at matching locations, making
it easier to identify the best match.
o Widely Used: It's a standard algorithm in computer vision libraries for object
localization.
• Robotic Example: A robot sorting screws based on their head type (e.g., Phillips vs.
Flathead).
o Principle: Compares pixel intensity patterns directly. It slides the template over the image and calculates a similarity score (like NCC) at each position.
▪ Process: The robot's vision system will take the Phillips head template and slide it across the image of a screw on the conveyor belt.
▪ Outcome: The highest NCC score indicates the location of the Phillips head
screw. If the score is below a certain threshold, it's not a Phillips head (might
be a Flathead or something else).
o Advantages:
o Limitations:
▪ Features:
▪ Search Image: It does the same for the image of the screw on the
conveyor.
o Advantages:
▪ Faster for complex scenes: Can be faster than pixel-wise correlation for large
images or if multiple objects are present.
o Limitations:
• Given:
▪ Template: Take Template A and place its top-left corner at position (u,v) in
Image (I).
▪ Calculate NCC: Compute the NCC(u,v) score between Template A and the
overlapping region of Image (I).
▪ Repeat: Move Template A one pixel to the right (increment u) and repeat the
process. When reaching the end of a row, move to the next row (increment
v) and reset u. Continue until Template A has been compared at every
possible position in Image (I).
▪ Find Max: After completing all positions, find the maximum value in
Score_Map_A. This maximum value (e.g., 0.95) indicates the best match for
Object A. The coordinates (u,v) corresponding to this maximum are the
detected location of Object A.
▪ Repeat Step 2: Perform the exact same sliding window process, but this time
using Template B and storing the NCC scores in Score_Map_B.
▪ Find Max: Find the maximum value in Score_Map_B (e.g., 0.92) and its
corresponding coordinates, indicating the best match for Object B.
o Decision:
▪ The robot compares the maximum NCC scores from Score_Map_A and
Score_Map_B.
▪ If the max score for Template A (0.95) is higher than a predefined threshold
and also higher than the max score for Template B (0.92), the robot
concludes that Object A is present at its detected location.
• Visual Representation:
o Resulting Score Map A: A heat map showing high values (red/yellow) where
Template A matches well.
o Resulting Score Map B: A heat map showing high values where Template B matches
well.
o A cursor/bounding box highlighting the location of the highest match on the original
image.
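• Code sketch: the two-template procedure above using OpenCV's normalized cross-correlation (TM_CCOEFF_NORMED); the file names and the 0.8 acceptance threshold are assumptions.
```python
import cv2

image = cv2.imread("scene.png", cv2.IMREAD_GRAYSCALE)
template_a = cv2.imread("template_a.png", cv2.IMREAD_GRAYSCALE)
template_b = cv2.imread("template_b.png", cv2.IMREAD_GRAYSCALE)

# Score maps: one NCC value per possible template position (u, v).
score_map_a = cv2.matchTemplate(image, template_a, cv2.TM_CCOEFF_NORMED)
score_map_b = cv2.matchTemplate(image, template_b, cv2.TM_CCOEFF_NORMED)

_, max_a, _, loc_a = cv2.minMaxLoc(score_map_a)  # best score and its (u, v)
_, max_b, _, loc_b = cv2.minMaxLoc(score_map_b)

THRESHOLD = 0.8  # illustrative acceptance threshold
if max_a > THRESHOLD and max_a >= max_b:
    print(f"Object A detected at {loc_a}, score {max_a:.2f}")
elif max_b > THRESHOLD:
    print(f"Object B detected at {loc_b}, score {max_b:.2f}")
else:
    print("No confident match")
```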
• Polyhedral Objects: Objects with flat faces and straight edges (e.g., cubes, pyramids,
machined parts, furniture).
• Context: For polyhedral objects, edges are often sharp and well-defined, forming straight
lines or corners. This makes them relatively easier to detect compared to curved or textured
objects.
• Types of Edges:
1. Object Boundaries: Edges formed where the object meets the background.
2. Internal Edges: Edges formed by the intersection of two faces on the object itself
(e.g., the ridge of a cube).
3. Shadow Edges: Edges formed by shadows cast by the object, which can sometimes
be confused with actual object edges.
• Importance:
1. Shape and Pose Estimation: Edges are crucial for understanding the 2D shape and,
with multiple views or depth information, the 3D pose (position and orientation) of
the polyhedral object.
2. Feature Extraction: Edges can be broken down into line segments, which are
excellent features for recognition and localization.
• Corner Point (CP) Detection: Corners are points where two or more edges intersect. For
polyhedral objects, these are typically sharp vertices.
1. Compute Image Gradients: Calculate the intensity gradients (Ix and Iy) in both
horizontal and vertical directions for every pixel in the image. This indicates the rate
of change of intensity.
2. Construct Structure Tensor (or Harris Matrix/Covariance Matrix): For each pixel (x,y), a 2×2 matrix is formed by summing the products of gradients over a local window W around that pixel: M = Σ_{(u,v)∈W} [ Ix²  Ix·Iy ; Ix·Iy  Iy² ]
3. Calculate Corner Response Function (CRF): The Harris detector uses a response function R based on the eigenvalues (λ1, λ2) of the matrix M: R = det(M) − k·(trace(M))²
4. Interpretation of R:
5. Thresholding: Apply a threshold to the R values. Pixels with R above this threshold
are considered potential corners.
• Example (Polyhedral Context): When a robot identifies a box, the Harris detector would
pinpoint the eight vertices of the box. These corner points, being stable and distinctive, are
excellent features for recognizing the box and determining its 3D pose in the environment.
• Threshold Selection: After computing gradient magnitudes (or applying edge detection
operators like Canny), a threshold value is applied to decide which pixels are strong enough
to be considered edges and which are not. Pixels with gradient magnitudes above the
threshold are edges; others are suppressed.
1. Information Loss:
▪ Too Low Threshold: Detects too much noise and insignificant texture
changes as edges. Results in cluttered edge maps, false edges, and difficulty
in identifying meaningful structures.
2. Algorithm Performance: Directly impacts the performance of subsequent image
processing steps (e.g., line fitting, object recognition, feature matching). Poor edges
lead to poor results down the line.
1. Manual/Empirical Thresholding:
2. Otsu's Method:
▪ Process:
1. Apply high threshold: Only pixels above this are confirmed strong
edges.
4. Adaptive Thresholding:
▪ Description: Instead of a single global threshold, the image is divided into
smaller regions, and a separate threshold is calculated for each region based
on its local pixel characteristics (e.g., local mean, local Gaussian-weighted
sum).
▪ Pros: Highly effective for images with uneven illumination or varying contrast
across the image.
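• Code sketch: the threshold-selection options above applied to one grayscale image; the file name, the manual threshold, the Canny limits, and the adaptive block size are assumptions.
```python
import cv2

gray = cv2.imread("box_on_belt.png", cv2.IMREAD_GRAYSCALE)

# 1. Manual / empirical global threshold.
_, manual = cv2.threshold(gray, 120, 255, cv2.THRESH_BINARY)

# 2. Otsu's method: the threshold is computed automatically from the histogram.
otsu_value, otsu = cv2.threshold(gray, 0, 255, cv2.THRESH_BINARY + cv2.THRESH_OTSU)

# 3. Hysteresis thresholding as used inside Canny (low and high thresholds).
edges = cv2.Canny(gray, 50, 150)

# 4. Adaptive thresholding: a separate threshold per local 31x31 neighbourhood.
adaptive = cv2.adaptiveThreshold(gray, 255, cv2.ADAPTIVE_THRESH_GAUSSIAN_C,
                                 cv2.THRESH_BINARY, 31, 5)
```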
• Scenario: A robot needs to detect and pick up specific rectangular boxes on a conveyor belt.
• Illustrative Example:
▪ Result: Robot accurately identifies the box, its dimensions, and its precise
location and orientation, leading to successful gripping.
▪ Effect: Many edges are missed, especially weaker ones or those in subtly
shaded areas. The detected edges become fragmented, with significant gaps.
▪ Partial Detection: Only parts of the box might be detected (e.g., one
side), making it impossible to recognize it as a complete rectangular
object.
▪ Effect: Too many edges are detected, including noise, texture variations, and
subtle illumination changes. The edge map becomes very cluttered.
▪ Merged Objects: If multiple boxes are close, the low threshold might
connect them with spurious edges, making the robot perceive them
as one large, amorphous object instead of individual boxes.
• Definition: Corner points (CPs) are highly distinctive points in an image where there is a
significant change in image intensity in at least two independent directions. They are often
characterized by high curvature or the intersection of two or more edges.
• Principle:
o Local Intensity Variation: The core idea is to identify pixels where moving a small
window (neighborhood) in any direction results in a substantial change in pixel
intensity.
o Quantifying "Change":
1. Flat Region: If you move a small window across a flat, uniform intensity
region, the pixel intensities within the window won't change much.
2. Edge: If you move the window along an edge, the intensities will remain
largely the same. However, if you move the window perpendicular to the
edge, there will be a sharp change in intensity.
3. Corner: If you move the window in any direction away from a corner, you
will observe a significant change in pixel intensities.
o Robustness: Corners are generally more stable and robust features than simple edge
points. They are less affected by minor noise, slight rotation, or scaling, making them
excellent landmarks for robotic tasks like localization, mapping, and object
recognition.
▪ Ix²(x,y) = Ix(x,y) × Ix(x,y)
▪ Iy²(x,y) = Iy(x,y) × Iy(x,y)
▪ Ixy(x,y) = Ix(x,y) × Iy(x,y)
▪ For each pixel (x,y), define a small local window (e.g., 3x3 or 5x5).
▪ Sx² = Σ_window Ix²
▪ Sy² = Σ_window Iy²
▪ Sxy = Σ_window Ixy
▪ For each pixel, calculate the corner response R using the smoothed gradient products: R = (Sx²·Sy² − Sxy²) − k·(Sx² + Sy²)²
6. Thresholding:
7. Non-Maximum Suppression:
▪ For every pixel that is a potential corner, compare its R value with all its
neighbors in a small window (e.g., 3x3).
▪ If it's not the local maximum within that window, suppress it (set its value to
0).
▪ This ensures that only the strongest, most distinct corner point in a local
area is detected.
• Explanation: The algorithm identifies corners by looking for regions where the image
intensity changes significantly in all directions within a small neighborhood. The Harris
response function mathematically captures this multi-directional change. Thresholding
removes weak responses, and non-maximum suppression refines the detected points to
ensure only one corner is reported per actual corner feature.
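• Code sketch: the Harris pipeline above using OpenCV's built-in detector (cv2.cornerHarris computes the gradients, the windowed sums, and R internally); the file name, the threshold fraction, and k = 0.04 are assumptions.
```python
import cv2
import numpy as np

gray = cv2.imread("box.png", cv2.IMREAD_GRAYSCALE).astype(np.float32)

# Corner response R for every pixel (3x3 window, 3x3 Sobel, k = 0.04).
R = cv2.cornerHarris(gray, blockSize=3, ksize=3, k=0.04)

# Thresholding: keep only strong responses.
candidates = R > 0.01 * R.max()

# Non-maximum suppression: a pixel survives only if it is the maximum of its 3x3 neighbourhood.
local_max = cv2.dilate(R, np.ones((3, 3), np.uint8))
corners = np.argwhere(candidates & (R == local_max))

print(f"{len(corners)} corners, first few (row, col): {corners[:5]}")
```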
• Characteristics:
o It accounts for the perspective effect, where objects appear smaller the farther they
are from the camera.
▪ Use Case: For mobile robots navigating on a flat floor, it's often useful to
transform the camera's perspective view into a top-down, "bird's-eye" view
of the ground plane.
▪ Process: By identifying at least four non-collinear points on the ground plane
in the image and knowing their corresponding real-world coordinates (or
their ideal positions in a rectified top-down view), a perspective
transformation matrix can be computed. This matrix can then be applied to
the entire image.
▪ Process: If a robot knows the 3D model of an object and can identify at least
4 corresponding 2D points in the image, perspective transformation
(specifically Perspective-n-Point, PnP algorithms) can be used to calculate
the object's 6D pose.
o Augmented Reality/Projection:
• Principle: While a perspective transformation maps 3D to 2D, IPT attempts to recover the 3D
information from a 2D image, specifically for a known planar surface.
4. Resulting Bird's-Eye View: The "warped" and distorted lines in the original
camera image are transformed into straight, parallel lines in a top-down
"bird's-eye view" of the ground.
• Definition: Camera calibration is the process of determining the intrinsic and extrinsic
parameters of a camera. It essentially models the camera's optical properties and its
position/orientation in the 3D world.
• Parameters Determined:
1. Intrinsic Parameters:
▪ Focal Lengths (fx, fy): Represent the camera's effective focal length in pixels
along x and y axes.
▪ Principal Point (cx, cy): The pixel coordinates of the image sensor's optical
center.
▪ Lens Distortion Coefficients (k1, k2, p1, p2, k3...): Describe how the lens
distorts the image (radial and tangential distortions).
2. Extrinsic Parameters:
▪ Rotation Matrix (R): Describes the camera's orientation (pitch, yaw, roll)
relative to a world coordinate system (e.g., robot base).
▪ Translation Vector (T): Describes the camera's position (x, y, z) relative to the
world coordinate system.
5. Multi-Camera Systems: Necessary to relate the views from multiple cameras (e.g.,
stereo vision for depth) into a common coordinate system.
1. Preparation:
▪ Robot System: Robot arm with the camera mounted (either on the end-
effector "eye-in-hand" or fixed near the robot "eye-on-base").
▪ Steps:
▪ The software then uses these detected points and the known 3D
layout of the pattern to solve a complex optimization problem,
calculating the intrinsic parameters that best explain how the 3D
points project to the 2D image points, minimizing reprojection error.
▪ Steps:
4. Validation:
▪ After calibration, move the robot to a new pose and capture an image of the
pattern.
▪ Project known 3D points of the pattern onto the image using the derived
calibration parameters.
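• Code sketch: intrinsic calibration from checkerboard images with OpenCV; the image folder, the 9x6 inner-corner pattern, and the 25 mm square size are assumptions.
```python
import glob
import cv2
import numpy as np

pattern = (9, 6)      # inner corners per row, per column
square_mm = 25.0

# 3D coordinates of the corners in the checkerboard's own frame (Z = 0 plane).
objp = np.zeros((pattern[0] * pattern[1], 3), np.float32)
objp[:, :2] = np.mgrid[0:pattern[0], 0:pattern[1]].T.reshape(-1, 2) * square_mm

obj_points, img_points = [], []
for path in glob.glob("calib_images/*.png"):
    gray = cv2.imread(path, cv2.IMREAD_GRAYSCALE)
    found, corners = cv2.findChessboardCorners(gray, pattern)
    if found:
        obj_points.append(objp)
        img_points.append(corners)

rms, K, dist, rvecs, tvecs = cv2.calibrateCamera(
    obj_points, img_points, gray.shape[::-1], None, None)

print("Mean reprojection error (px):", rms)
print("Intrinsics K (fx, fy, cx, cy):\n", K)
print("Distortion coefficients (k1, k2, p1, p2, k3):", dist.ravel())
```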
• Impact: Illumination is one of the most critical factors influencing the quality and usability of
images for robotic vision.
2. Color Accuracy:
3. Shadows:
4. Reflections/Glares:
5. Noise:
▪ Problem: Insufficient light can necessitate higher camera gain settings, which
amplify noise in the image.
o Description: Light is introduced through the camera lens (or very close to it) and
spread evenly over the object using a diffuser. The light travels parallel to the
camera's optical axis.
▪ Minimizes Shadows: Because the light source is aligned with the camera,
shadows are cast directly behind features from the camera's perspective,
making them less visible and reducing their interference with object
detection.
▪ Even Illumination: Provides very uniform lighting across the entire field of
view, preventing hot spots or dark regions that can affect contrast.
▪ Best For: Shiny, reflective, or uneven surfaces (e.g., curved metal parts,
circuit boards, reflective plastics) where typical direct lighting would cause
glares or harsh shadows. Excellent for presence/absence detection and
feature extraction.
o Example: Inspecting solder joints on a PCB, where glare from direct light would
obscure the joint quality.
o Example: A robot arm needs to pick an unorganized part from a bin (bin picking).
Structured light helps create a 3D map of the bin and the parts, allowing the robot to
determine which part to pick and how to grip it, avoiding collisions.
• Setup:
1. Robot Arm: A 6-axis industrial robot arm for precise movement and positioning.
▪ Ring Light with Diffuser: Coaxial LED ring light with a diffuser integrated
around the camera lens to provide uniform, shadow-free illumination inside
the deep bolt holes.
3. Software: Custom vision software with image processing algorithms (e.g., edge
detection, texture analysis, template matching) and a decision-making logic.
2. Image Acquisition: For each hole, the camera captures a high-resolution image with
the uniform illumination from the ring light.
3. Image Analysis:
▪ Pass/Fail: If all criteria are met, the hole passes. If any defect is detected, the
hole (and potentially the entire engine block) is marked as a fail.
• Outcomes:
3. Reduced Costs: Lower labor costs and reduced material waste from catching defects
earlier.
4. Enhanced Product Quality: Ensures only high-quality engine blocks proceed, leading
to more reliable final products and reduced warranty claims.
• Core Role: Vision systems act as the "eyes" of automated quality control, performing rapid,
objective, and consistent inspections that are often impossible or impractical for humans.
• Key Contributions:
1. Defect Detection:
2. Dimensional Verification:
▪ Precisely measuring dimensions, tolerances, and geometric features (e.g.,
hole diameter, bolt length, product shape) to ensure they meet
specifications.
▪ Crucial for parts with tight tolerances (e.g., aerospace components, medical
devices).
3. Presence/Absence Checks:
▪ Reading and verifying text, batch codes, serial numbers, expiry dates, or
barcodes on products or packaging.
▪ Data collected can be used for statistical process control and predictive
maintenance.
5. Cost Reduction: Reduces labor costs, material waste, and warranty claims.