INGEGNERIA
DELL’INFORMAZIONE
Supervisor:
Prof. Giovanni Boschetti
Candidate:
Anna Furciniti
2057452
Abstract
The integration of computer vision into Industry 4.0 marks a major revolution. Many industrial companies have adopted machine vision systems to identify defective products, verify the assembly of parts, guide robots and perform OCR reading; demanding tasks that were previously laborious, error-prone and time-consuming when carried out by humans. These advanced machine vision systems, equipped with cameras and artificial intelligence, now capture and analyze images of objects with remarkable precision, handling several functions and significantly improving accuracy. Vision systems also play a crucial role in robotics, allowing machines to operate in unstructured environments.
This thesis investigates the TwinCAT Vision library for the detection of geometric objects. TwinCAT Vision, a function available in TwinCAT 3, facilitates industrial image processing tasks such as detecting, identifying or measuring objects directly inside the PLC in real time. It supports the use of images already saved on the PC as well as a GigE camera for real-time operation. Using the Mako G-192B camera, TwinCAT Vision identifies various moving shapes and accurately locates the coordinates of their centers. These capabilities improve the interaction of robots with production lines, which are often controlled by PLCs. This study presents several vision algorithms for detecting the centers of shapes and demonstrates their effectiveness through rigorous experimental tests. These algorithms allow precise localization of object centers, which is crucial for tasks requiring high accuracy. The coordinates determined by the system are used by a SCARA robot for precise manipulation, thereby demonstrating the effectiveness of the strategy in real applications. By integrating image processing into the TwinCAT platform, the system achieves highly synchronized control applications and extremely short response times, showing significant advances in both the efficiency and the reliability of automated industrial processes.
Index
CONCLUSION
IMAGES
SITOGRAPHY
Chapter 1: Introduction
In this chapter are highlighted the objectives of this thesis with a short explanation of general
concepts useful to understand the topics present in the subsequent chapters.
1.2 Objective
This project starts from the basic concepts of artificial intelligence and ends with a deep exploration of computer vision techniques.
Different algorithms and methods, such as blurring filters, edge and contour recognition, and machine learning, are used to define the best solution for geometric object detection.
After the recognition of the object, its coordinates are converted into world coordinates with respect to the camera system for the last stage, where the robot performs a pick and place synchronized with the conveyor belt. Nowadays this process is one of the most widely used in industrial automation and is part of what we define as the Industry 4.0 revolution.
As advancements continued, Industry 3.0 was introduced, characterized by the automation of production processes through the use of electronic devices, programmable logic controllers (PLCs) and robotics. This shift to automation led to more efficient production processes compared to the manual methods previously employed.
Today, we refer to the ongoing transformation as the Fourth Industrial Revolution, or Industry
4.0, which introduced the concept of the ‘Smart Factory’. In these factories, autonomous
systems interact seamlessly through the Internet of Things (IoT) and cloud technologies,
allowing for real-time communication and collaboration between machines and humans.
Modern technologies are radically revolutionizing manufacturing by integrating robotics,
big data, artificial intelligence, computer vision and various sensors. This synergy has
significantly improved human work in terms of efficiency, cost-effectiveness and time
management.
A key component of Industry 4.0 is computer vision, which plays a crucial role across many fields. Many industrial companies now use machine vision systems to identify defective products, an arduous task previously done by humans that was error-prone and time-consuming. These machine vision systems, using a camera and artificial intelligence, can now capture and analyze images of objects, identifying defects with remarkable precision.
Moreover, machine vision systems facilitate automated inspection through sophisticated image processing techniques, providing real-time data during production that enables the early detection of potential issues.
In Industry 4.0, machines are interconnected within an ecosystem named the IoT (Internet of Things), and computer vision increases the power and usability of the sensors within this IoT network by providing real-time data capture and analysis.
This capability is crucial for:
• Automated quality inspection: checking product quality by detecting defects such as
irregularities in a biscuit’s shape or color inconsistencies.
• Robotic applications: enabling precise ‘pick and place’ operations that require visual
identification of object positions and orientations.
• Traceability: using QR codes for tracking products.
• Safety enhancements: improving workplace safety by monitoring environments for
unauthorized access.
As we advance further into the age of smart manufacturing, the role of computer vision
continues to expand, driving innovations that transform how industries operate. The
combination of AI and computer vision with Industry 4.0 is setting new benchmarks in how
quality and safety are managed in production environments.
1.4 Artificial Intelligence
Artificial intelligence (AI) is the ability of a digital computer or computer-controlled robot to perform tasks commonly associated with intelligent beings [3]. The goal is therefore to build an intelligent agent that selects the most advantageous action in a given scenario.
But what does "intelligent" mean? Intelligence includes an agent's ability to adapt its decisions to new circumstances. Research in AI focuses on different components of intelligence, including learning, reasoning, problem solving, perception, and the use of language.
The earliest work in AI was conducted by Alan Turing, involving an experiment with three participants: a computer, a human interrogator, and a human foil. The goal for the interrogator is to determine, through questioning, which participant is the computer. This experiment laid the foundational concept for what is now known as the Turing Test, a standard for evaluating a machine's ability to exhibit intelligent behavior equivalent to, or indistinguishable from, that of a human.
Figure 1: Convolutional Neural Network
Each layer is interconnected, allowing information to flow from the input to the output. For instance (Figure 1), in computer vision, an image fed into the network undergoes multiple transformations through the hidden layers before the network outputs a label identifying the image. The NN used for vision applications is called a Convolutional Neural Network (CNN).
Deep learning has revolutionized the field of computer vision, enabling machines to identify
and classify objects within images with high accuracy. It has practical applications in everyday life, such as automated quality control in manufacturing, facial recognition for security, and
navigation for autonomous vehicles.
• Acquiring input data, which are the images;
• Pre-processing the images;
• Interpreting the images to classify them.
Training is a crucial phase in computer vision, requiring substantial datasets. During this
phase, neural networks are trained with extensive image sets. After processing through
various hidden layers, the neural networks become adept at predicting accurate image
classifications or continuous values with regression algorithms.
1.5.2 CV algorithms
The process of image interpretation can be performed using different filters to enhance image
quality:
• Noise reduction: filters such as smoothing, mean or Gaussian filters reduce the noise present in an image.
These filters use a kernel, a small matrix, whose size depends on the type of noise or on the specific task. The pixels covered by the kernel are used for calculations such as the mean, with the kernel sliding across the image pixel by pixel. The result is a central pixel value that replaces the area covered by the kernel in the new output image, making the image clearer (a small illustrative sketch follows this list).
• Image segmentation: divides an image into regions (ROI, Region Of Interest). Consequently, the focus is only on the relevant segments, which optimizes the analysis and improves the resulting accuracy.
• Object detection: allows the localization and classification of objects by using edge detection, line detection or Hough transforms.
• Pattern detection: identifies recurring patterns within images.
• Feature detection: searches for relevant features useful for classification.
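To make the kernel-sliding idea above concrete, the following is a small illustrative mean filter written in Python with NumPy; it is a didactic sketch, not code taken from the thesis:

import numpy as np

def mean_filter(image, k=3):
    # Slide a k x k kernel over the image and replace each pixel with the mean
    # of its neighborhood, as described for the noise-reduction filters above.
    pad = k // 2
    padded = np.pad(image, pad, mode="edge")        # extend the borders
    out = np.empty(image.shape, dtype=np.float64)
    for y in range(image.shape[0]):
        for x in range(image.shape[1]):
            out[y, x] = padded[y:y + k, x:x + k].mean()
    return out.astype(image.dtype)

noisy = np.random.randint(0, 256, (6, 8), dtype=np.uint8)   # toy 6x8 "image"
print(mean_filter(noisy, k=3))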
Figure 3: Creation of digitized image
An image is represented using Cartesian coordinates, where pixel values are organized in a grid.
The Pinhole camera model plays a crucial role in understanding the basics of image formation
and in connecting photography with new technologies in computer vision.
It describes the mathematical relationship between the coordinates of a point in three-
dimensional space and its projection onto the image plane of an ideal pinhole camera[4]. The
key elements of this model (Figure 4) are:
• Optical center: the geometric point through which every ray from the object to the
sensor passes.
• Image plane: the plane where the image rays converge to form the visual image.
• Optical axis: the straight line that passes through both the center of the image plane
and the optical center.
• Principal point: the intersection between the image plane and the optical axis.
• Focal length: the distance between the optical center and the image plane.
Figure 4: Pinhole model
The advantage of this model lies in how a simple aperture can form an image.
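For illustration (these are the standard pinhole relations, added here and not part of the original text): with the optical center at the origin and the optical axis along Z, a point (X, Y, Z) in front of the camera is projected onto the image plane at

u = f · X / Z,   v = f · Y / Z

where f is the focal length; in pixel coordinates, the principal point offsets (c_x, c_y) are then added to (u, v).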
1.6.1 Camera
In practical applications with a real camera system, several factors must be considered to
improve effectiveness:
• Field of view (FoV): the angle perceived by the camera determines how much of the object the camera is able to see.
A larger FoV means the camera can capture more of the scene. When the FoV decreases, the focal length increases.
The general formula for the FoV calculation is (Sensor Dimension / Focal Length) × 57.3, where the Sensor Dimension is the size of the image sensor in millimeters and 57.3 is the conversion factor from radians to degrees (a short worked example follows this list).
• Focal length: is the distance in millimeters from the optical center of the lens to the
camera’s sensor, which determines the composition of the captured image.
• Focus: adjusting the lens to clearly define objects.
• Aperture: controls the amount of light entering through the lens; a wider aperture allows more light in and decreases the depth of field.
• Exposure time: the duration for which the camera's sensor is exposed to light, measured in fractions of a second. A slow exposure is used for low-light environments, while a fast exposure is ideal for freezing motion.
• Depth of field: the range within which objects appear acceptably sharp in an image.
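As a purely illustrative calculation (the numbers are hypothetical, not the values of the camera used in this thesis): a sensor 5.4 mm wide combined with a 12 mm lens gives a horizontal FoV of about (5.4 / 12) × 57.3 ≈ 25.8 degrees.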
Some lenses can cause distortion, which is a deviation from the ideal image reproduction:
• Radial distortion: depends on the distance of the distorted point from the image center. It includes pincushion and barrel distortion.
• Tangential distortion: caused by the misalignment between lens and sensor.
The best way to remove distortion is through calibration, a process that involves estimating
intrinsic and extrinsic parameters. To do this, a calibration pattern with a known shape, such
as the square corners of a checkerboard, is necessary.
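The thesis performs calibration through the TwinCAT calibration assistant (Chapter 3); purely as a conceptual illustration, the same estimation of intrinsic and extrinsic parameters can be sketched with OpenCV's checkerboard calibration in Python (pattern dimensions and file names below are placeholders):

import cv2
import numpy as np

# Checkerboard with 9 x 6 inner corners and 25 mm squares (example values)
pattern_size = (9, 6)
square_mm = 25.0

# 3D coordinates of the corners in the pattern plane (Z = 0)
obj_pts = np.zeros((pattern_size[0] * pattern_size[1], 3), np.float32)
obj_pts[:, :2] = np.mgrid[0:pattern_size[0], 0:pattern_size[1]].T.reshape(-1, 2) * square_mm

object_points, image_points = [], []
for path in ["calib_01.png", "calib_02.png"]:      # placeholder file names
    gray = cv2.imread(path, cv2.IMREAD_GRAYSCALE)
    found, corners = cv2.findChessboardCorners(gray, pattern_size)
    if found:
        object_points.append(obj_pts)
        image_points.append(corners)

# Intrinsics (camera matrix, distortion coefficients) and extrinsics (rvecs, tvecs)
ret, camera_matrix, dist_coeffs, rvecs, tvecs = cv2.calibrateCamera(
    object_points, image_points, gray.shape[::-1], None, None)
print(camera_matrix, dist_coeffs)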
After the image capture, the next step involves processing, where noise is removed and the image is improved by applying filters.
• 2D camera: the image is in two dimensions, X and Y, with no height Z defined; it is practical for quality checks, character reading and OCR. However, light changes can negatively impact the results, for example by reducing the precision of the object contours.
• 3D camera: the image is three-dimensional, with X, Y and Z axes; it is advantageous for specific tasks such as the measurement of shapes, heights, angles and volumes. For example, in bin picking applications, the robot needs a precise representation of the object in three-dimensional space.
The camera used for this project is a 2D camera.
These fundamental concepts support every visual application and are crucial for the
discussion in this thesis, providing the foundational understanding necessary to explore the
advanced visual technologies driving Industry 4.0.
1.7.1 PLC
A Programmable Logic Controller (PLC) is a type of specialized computer that is used for
controlling industrial processes and machinery.
PLCs are widely employed in automation, for example for controlling machinery on factory assembly lines or in food processing operations. They gather input from sensors or user
interactions, process these inputs based on a pre-defined sequence, and generate outputs to
actuators or other devices to manage machinery or processes. The programming logic used in
a PLC can be implemented through various languages, including ladder logic, structured text,
or block diagrams.
Robots are usually controlled through their own proprietary controllers. These controllers are
then, in turn, connected to the PLC, which generally acts as the main controller that oversees
different robotic components and their interactions within the overall process. For instance, in
a manufacturing environment, the PLC may manage the motion of a robotic arm, the
functioning of conveyor belts, the camera and the operation of sensors and switches. This
arrangement ensures that operations are coordinated and efficient, where timing and sequence
are essential.
Chapter 2: TwinCAT 3
The software used for this thesis is TwinCAT 3 (The Windows Control and Automation Technology), developed by Beckhoff. It is a PC-based control system for industrial automation that transforms any PC into a real-time system.
The first version of TwinCAT was introduced in 1996; TwinCAT 3 leverages the robust capabilities of Microsoft Visual Studio (2010, 2012 and 2013).
The integration of TwinCAT 3 with Microsoft Visual Studio offers several benefits, including
increased productivity and minimized setup time through advanced tools for profiling and
debugging. Additionally, the environment can be extended with plugins and supports multiple programming languages.
Another benefit is its modularity, which enables functional changes and additions at any time.
The general structure of the environment is divided into two parts, as shown in Figure 7:
• Environment in Visual Studio: based on XAE (Extended Automation Engineering)
and includes the programming section used for the coding and for debugging.
C and C++ can be used for the real-time programming.
XAE allows hardware to be programmed and configured within a single engineering
tool.
The vision resources are managed by XAE, considering camera configurations,
calibration, simulation, and file source control. The choice of the camera and the
loading of the file source are both considered in this section of the environment.
Once the code is completed, it is compiled and sent to the runtime part.
• Runtime based on XAR (Extended Automation Runtime): In this section, modules can
be loaded, run and managed.
They can be programmed independently by different developers.
Its output is the fieldbus, which by default is EtherCAT. The fieldbus enables communication with sensors and the activation of actuators.
Its main tasks are as follows:
o Executing modules in real time.
o Supporting the multi-core CPU.
o Supporting a 64-bit operating system.
The two parts communicate using a protocol called ADS (Automation Device Specification),
which can be used to upload new software or, if necessary, to read or write new variables.
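As an illustration of what ADS communication makes possible (not part of the thesis setup), PLC variables can be read and written from a PC, for example with the third-party Python library pyads; the AMS NetId and the variable names below are placeholders:

import pyads

# Connect to the TwinCAT 3 PLC runtime over ADS (NetId and names are examples)
plc = pyads.Connection("192.168.0.10.1.1", pyads.PORT_TC3PLC1)
plc.open()

value = plc.read_by_name("MAIN.nCounter", pyads.PLCTYPE_INT)    # read a PLC variable
plc.write_by_name("MAIN.bStart", True, pyads.PLCTYPE_BOOL)      # write a PLC variable

plc.close()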
2.1.1 Modules
TwinCAT is partitioned into modules, each with a distinct purpose, enabling functionality
related to different aspects of industrial automation.
The TwinCAT project is then divided into the following modules, as shown in Figure 8:
• System: handles the basic setup of the entire automation system, including core
configuration and router memory.
In the core configuration, TwinCAT enables a distinction between Windows cores and
isolated cores. The operating system and TwinCAT share the processor time, with the
real-time portion allocated between 10% and 90%. In contrast, isolated cores are fully
dedicated to the TwinCAT application.
For vision applications, the use of isolated cores is recommended, as shown in Figure 9.
testing. When modifications or troubleshooting are required, this modularity is highly advantageous.
Creating multiple PLCs involves dividing the program into several modules and running them
on a target device. The benefits produced by this procedure can be summarized as follows:
• Uniform structure of the program.
• Understandable names of variables and instances.
• Intuitive readability of the code, even for individuals who are not involved in its
development.
• Simplified maintenance.
• High code quality.
In TwinCAT 3, the project in which the PLC is programmed contains several elements crucial
for the operation of the controller program:
• Interfaces
• Functions: these encapsulate the image processing algorithms. A function is structured
as follows:
hr := F_VN_ProcessImage(<….>, hr);
where hr is the status variable applicable to each function, F_VN is the prefix
identifying the functions, and <….> represents the parameters useful for the function,
such as input or output.
• Function Blocks: useful for complex processes, such as the communication with GigE
Vision Camera.
Figure 11: Create new project in Visual Studio with integrated TwinCAT 3
The PLC will then be automatically created with the MAIN program. It is also possible to define the libraries used in the project by modifying the References. For this thesis, the Tc3_Vision library is added.
As in any code, variables are initialized in the top section between VAR and END_VAR, and the main control program is written in the bottom section under PROGRAM. After defining the program to be executed, proceed to the task configuration in the System module of the TwinCAT project.
Finally, the PLC module should be compiled using the Build command. After selecting the target system, follow these steps to run the program:
• Click on Start; the program is now running. You can monitor the variables of the
various program blocks in real time; it is possible to see the actual values of a variable
directly in the block editor.
The Tc3_Vision library, Figure 13, is the one necessary to run this project.
TwinCAT Vision is directly integrated into the TwinCAT engineering environment for image processing solutions. It enhances image processing capabilities with a universal control platform that incorporates PLC, motion control, robotics, IoT and HMI.
One of the advantages is the synchronization of all control functions related to image
processing in real time, effectively removing latency. This addition significantly increases the
importance of image processing in applications such as Industry 4.0.
TwinCAT Vision can be applied in different fields:
• Measurement: distances, diameters and roundness.
• Detection: pattern recognition, position detection and color recognition.
• Identification: Data Matrix code, Bar code and QR code.
• Monitoring: machine oversight, simplified service and simplified maintenance
In this library, there are two different ways to use the images in the TwinCAT program
(Figure 14):
Figure 14: Vision device
• A camera can be connected and configured under a Vision node, calibrated and started
for capturing and recording images. The creation of a GigE Vision Camera object
begins with a new node and a connection to the camera via its IP address. After this, the camera starts the acquisition process.
TwinCAT supports both line-scan cameras, which record a single line of pixels at a time, and area-scan cameras, which are based on rectangular sensors, as long as they provide a GigE Vision interface.
• An individual image, saved on the PC, can be loaded and modified by implementing
image processing procedures. The creation of a File Source object starts by adding a new node in the vision module and selecting the device type 'File Source', as in Figure 14.
In the File Source Control tab, Figure 15, you have the option to 'Read From Target' and
load an image by clicking ‘Add Files’.
Depending on the program’s needs, you can change the image format from ‘Original
Format from File’ to ‘24-bit RGB’ or ‘8-bit Monochrome’.
The respective object ID will be connected to the corresponding PLC task.
You can easily switch from the live camera view to recorded images by simply clicking in the
specific area of the vision module.
The sequence of image processing occurs directly within the PLC program, with the analysis
chain executed in the TwinCAT runtime system, enabling communication with other processes running on the PLC.
2.3.2 Functions
In TwinCAT Vision, functions are categorized based on their specific tasks. The following is
an example of this separation:
• Images: groups all operations that can be performed on images. It allows access to pixel
information, width, and height, as well as the creation and display of images.
Additionally, this category includes functionality for image analysis, segmentation and
filtering.
• Container: useful for storing contours, enabling the addition and removal of elements.
It also allows access to a single contour at a specific position using indices.
• Contours: collections of 2D points crucial for object detection. This group includes
checks on shape, point inclusion and geometrical features such as the center, area and
perimeter.
• Code reading: facilitates the reading and detection of OCR.
• Drawing: includes functions for visualizing detected features such as contours, points,
lines and shapes.
• Advanced Functions: extend the basic operations.
• Miscellaneous
All TwinCAT Vision functions return an HRESULT. After the execution of a function, the resulting code can therefore be examined to determine whether the execution was successful.
Chapter 3: TwinCAT Code Implementation
In this chapter, we provide a detailed explanation of how to use the camera and how all the
modules are interconnected to identify the centers of circles and rectangles.
• GigE Vision Camera: choose this option as indicated in the previous section (Figure
14).
• Connect the camera: enter the camera’s IP address to establish the connection.
Figure 17: Computer linked to adapter, adapter linked to camera
The computer links to an adapter via an EtherCAT connection, which in turn connects to the
camera (Figure 17).
EtherCAT differs from standard Ethernet by offering real-time communication with very low latency, making it preferable for industrial applications.
The setup starts with the creation of a Vision node for the camera.
To access the network connection, first open the control panel of the computer. This action
displays the state of the Ethernet connection for the camera.
By selecting ‘Property’, in Figure 18, it is possible to set the IP address and the subnet mask
to match the camera's address, as in Figure 19.
After configuring these settings, return to the TwinCAT system and open the camera
initialization assistant to verify the device's correct connection; as displayed in Figure 20, the IP of the Ethernet adapter and the camera's IP address are identical. In this case, the model identified is Mako G-192B.
In some instances, the connection with the camera may be lost. If this occurs, a new 'Discover Device' is necessary.
Automatically, a device corresponding to the Ethernet adapter used will be added to the I/O module (Figure 21). In this module, it is possible to check TcIoIpSettings by selecting the Parameter tab, where .IpAddress and .SubnetMask are defined. Then, Manual Setting can be enabled by changing its value to TRUE.
In the I/O module, by opening 'Dispositivo 1' one can view the information of Ethernet 3, including the IP and MAC addresses.
In this thesis, only one camera is considered for the detection, but in some cases it is useful to consider more cameras by adjusting IpMaxReceivers and UdpMaxReceivers in the Parameter tab.
located on the surface of a semiconductor chip. This technology helps to achieve high image
quality.
The resolution of the generated images is 800 x 600 pixels. Table 1 below summarizes the main characteristics of the camera used.
Specification   Details
Chroma          Monochrome
Gain Control    0 dB - 24 dB
Dimensions      60.5 mm × 29.2 mm × 29.2 mm (including connectors)
Protection      IP30
The camera is equipped with two adjustment wheels that facilitate two types of settings:
• Aperture: regulates the amount of light entering the camera; a larger aperture allows more light to enter, resulting in a brighter image.
• Focus: makes the image clearer and more detailed by sharpening the focus of the image elements.
After making changes, click on the green checkmark to save everything, as shown in the
Figure 23.
The absence of orange indicates that all the changes have been successfully saved.
Initially, the ‘Trigger Selector’ was set to ‘Continuous’, meaning the camera was in
continuous acquisition mode. However, this setup is not feasible because the huge amount of data sent from the camera slowed down the entire process.
Therefore, 'Trigger Selector' was changed to 'AcquisitionStart', see Figure 24. In practice, when an object passes through the photocell, the signal sends an activation to the camera. The camera proceeds through its states until it reaches the TCVN_CS_START_ACQUISITION state and captures a single image, as indicated by 'AcquisitionMode' being set to
‘SingleFrame’. Alternatively, depending on the task, it is possible to select ‘MultiFrame’ and
specify the number of images in ‘AcquisitionFrameCount’.
The image used for calibration is provided directly in the TwinCAT Vision guide [22], along with the instructions for the subsequent settings, as depicted in Figure 25. It is loaded by clicking on 'Load Images'.
Based on the image loaded, the following parameters are defined in the right hand section:
• Width: number of circles in a row.
• Height: number of circles in a column.
• X: distance between centers of two circles in horizontal direction.
• Y: distance between centers of two circles in vertical direction.
• Color inverted: by default, the system assumes dark circles on a light background. In
this case, the colors are inverted.
Figure 25: Camera Calibration, load image and calibrate intrinsic and extrinsic
In this specific context, the 'World Coordinates' coincide with the camera system coordinates (Figure 116). The camera system, as viewed from the desktop position, is oriented such that the front part of the camera is rotated 90 degrees with respect to the robot system.
Given the camera’s position, its X and Y axes align with those of the photocell, which
facilitates future translations.
Once all the parameters are configured, the ‘Calibrate Intrinsic’ and ‘Calibrate Extrinsic’
buttons can be clicked to calculate:
• Intrinsic parameters of the camera, which depend on how it captures the images. These generally include the Focal Length, Aperture, Field of View (FoV) and resolution. They are important because they allow the system to correctly interpret what the camera sees in terms of real dimensions. This results in the camera matrix and distortion coefficients being calculated.
• Extrinsic parameters of the camera, which depend on its location and orientation and are calculated using image information alone. These parameters allow the system to localize and interact with objects in three-dimensional space. This results in the rotation matrix and translation vector being calculated.
The function 'Write Results' can be used to save calibration data such as the Camera Matrix, Distortion Coefficients, Rotation Matrix and Translation Vector. These values are useful for transforming coordinates from pixels to world coordinates in the subsequent sections.
3.1.5 PLC connection
The camera and PLC are connected through 'Symbol initialization' in the PLC_Vision instance, see Figure 28, by selecting the specific camera used for the specific task via MAIN.fbCamera.oidITcVnImageProvider, as detailed in Figure 27.
It works in the same way for File Source objects.
• TCVN_CS_START_ACQUISITION
• TCVN_CS_STOP_ACQUISITION
• TCVN_CS_TRIGGERING
The developed code, Figures 30/31, adheres to the specified diagram for image acquisition. Here is a structured overview of the process.
The process starts by checking the camera's state using fbCamerControl.GetState(). The returned value represents the camera's current state, as shown in the operational diagram, while the program is running.
The calibration data previously calculated are retrieved through SimpleCameraControl using the Get methods:
• GetCameraMatrix
• GetDistortionCoefficients
• GetRotationMatrix
• GetTranslationVector
This data retrieval is carried out for the future transformation (F_VN_TransformCoordinatesImageToWorld_Point) of the center coordinates from pixel points into 'World Coordinates', using the intrinsic and extrinsic parameters as input. The calibration and the 'Write Results' are performed only once, unless the camera is moved to another position; however, the data are retrieved each time the code runs, because without this retrieval it would not be possible to use them in the code.
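For reference (these are the standard pinhole-camera relations, added for illustration; the notation is not taken from the TwinCAT documentation), the link between world and pixel coordinates uses the camera matrix K and the extrinsic parameters R and t:

s · [u, v, 1]^T = K · ( R · [X, Y, Z]^T + t )

where s is a scale factor. For points lying on a known plane such as the conveyor surface (Z = 0 in the world frame), this relation can be inverted to recover (X, Y) from the undistorted pixel coordinates (u, v), which is conceptually what the image-to-world transformation performs.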
Figure 31: ACQUIRING, STOP_ACQUISITION
3.2 Circle and Rectangle Center Recognition
In computer vision, object recognition can be carried out using multiple filters. This chapter
will explain all the filters applied to images to achieve the final objective: determining the
coordinates of the centers of circles and rectangles, which are then passed to the robot for pick
and place operations.
Because the objects considered are white or some other light color, we are primarily interested in the brightest pixels; therefore, the maximum value for the threshold will be 255 (white). By applying the threshold using F_VN_Threshold(ipImageIn, ipImageWork, 170, 255, eThresholdType, hr), only the brightest objects will be considered, as they fall within the range between 170 and 255.
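As a conceptual parallel in Python with OpenCV (not the TwinCAT API used in the thesis), the same binary thresholding between 170 and 255 looks as follows:

import cv2

# Load the grayscale camera frame (the file name is a placeholder)
gray = cv2.imread("frame.png", cv2.IMREAD_GRAYSCALE)

# Pixels above 170 become 255 (white), all others become 0,
# keeping only the bright objects, as in F_VN_Threshold(..., 170, 255, ...)
_, binary = cv2.threshold(gray, 170, 255, cv2.THRESH_BINARY)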
The resulting image is then filtered in different ways, as demonstrated in the code displayed in the figure, and only the best filters are considered in the final version of the code. Generally, these functions share common parameters: ipSrcImage, the image captured from the camera, and ipDestImage, the filtered image produced as output.
Below is a summary of all TwinCAT functions employed:
• F_VN_GaussianFilter: applies a Gaussian filter to smooth the image. The kernel size, which can take odd values such as 1, 3, 5 or 7, is specified by nFilterWidth and nFilterHeight. This filter removes smaller details while preserving the larger ones.
Figure 34: GaussianFilter
• F_VN_MedianFilter: applies a median filter to the image. nFilterSize specifies the size of the matrix considered. By taking into account the values present in the matrix, this filter calculates the median value. For example, for the sequence 7 8 9 10 10 11 13 13 155, the median value is 10.
After various trials, the operator most suitable for this thesis is OPENING, as it effectively removes small objects and irregularities.
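For comparison only, the Gaussian, median and opening operations have direct OpenCV counterparts; the sketch below is illustrative Python with a placeholder file name, not the thesis code:

import cv2
import numpy as np

gray = cv2.imread("frame.png", cv2.IMREAD_GRAYSCALE)   # placeholder file name

# Gaussian smoothing with a 5x5 kernel (odd sizes only), analogous to F_VN_GaussianFilter
smoothed = cv2.GaussianBlur(gray, (5, 5), 0)

# Median filter with a 5x5 window, analogous to F_VN_MedianFilter
denoised = cv2.medianBlur(gray, 5)

# Morphological opening: erosion followed by dilation, which removes
# small bright objects and irregularities
kernel = np.ones((5, 5), np.uint8)
opened = cv2.morphologyEx(denoised, cv2.MORPH_OPEN, kernel)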
Histogram equalization employs a transformation that flattens the histogram of the input image. The flatter the histogram, the more enhanced the contrast in the image.
• F_VN_LaplacianFilterExp: is also used for edge detection. It uses a kernel that can vary in size. The Exp suffix indicates an extended version of the LaplacianFilter with additional parameters. Unlike other edge-related filters, this one includes border extrapolation via eBorderType and a scaling factor fScale.
Figure 40: BilateralFilter
Figure 41: CopyIntoDisplayableImage
Figure 42: TransformIntoDisplayableImage
Below is the implemented code along with the respective functions. The main concept of this
implementation is the ‘blob’, which is a group of connected pixels that share some common
property.
Figure 43: ObjectDetection() first part, Blob detection, check contours and circularity, approximate to
polygon and draw contours
The parameters for blob detection are defined in a variable of type TcVnParamsBlobDetection; see Figure 43 at lines 64, 65, 66 and 69:
• stBlobParams.bFilterByArea is used to filter blobs within a specific area range; in
this case stBlobParams.fMinArea := 100 and stBlobParams.fMaxArea :=
2000000.
• stBlobParams.fMinCircularity := 0.80 ensures that only circles are identified.
The function F_VN_DetectBlobs(), Figure 44, is used for the detection. It applies a threshold and a contour search (via F_VN_FindContours) along with options for filtering the found contours. ipBlobContours is a container where all the contours are stored for further filtering.
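To illustrate the same idea outside TwinCAT (an OpenCV sketch in Python, not the functions used in the thesis), blobs can be obtained as contours of the thresholded image and filtered by area and circularity, where circularity is 4·π·area/perimeter²:

import cv2
import math

gray = cv2.imread("frame.png", cv2.IMREAD_GRAYSCALE)   # placeholder file name
_, binary = cv2.threshold(gray, 170, 255, cv2.THRESH_BINARY)

contours, _ = cv2.findContours(binary, cv2.RETR_EXTERNAL, cv2.CHAIN_APPROX_SIMPLE)
for c in contours:
    area = cv2.contourArea(c)
    perimeter = cv2.arcLength(c, True)
    # Same area limits as stBlobParams.fMinArea / fMaxArea in the thesis
    if area < 100 or area > 2_000_000 or perimeter == 0:
        continue
    # Circularity is 1.0 for a perfect circle and decreases for elongated shapes
    circularity = 4 * math.pi * area / perimeter ** 2
    label = "circle" if circularity >= 0.80 else "rectangle"
    print(label, area, circularity)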
Figure 44: DetectBlobs in the image
3.2.3 Rectangles
If the circularity is less than 0.80, the object is a rectangle.
To draw the contour in a detailed manner with F_VN_DrawContours, an approximation to a polygon is applied with F_VN_ApproximatePolygon. bClosed is set to TRUE because the contours are closed.
Once the contour is highlighted, it is possible to find the extreme points (Figure 50) with
F_VN_ContourExtremePoint: TOP_RIGHT, TOP_LEFT, BOTTOM_RIGHT,
BOTTOM_LEFT of the contours.
When the 4 points are retrieved, the next step is to draw lines from TOP_LEFT to
BOTTOM_RIGHT and from TOP_RIGHT to BOTTOM_LEFT.
To verify the precision of the retrieved points, they are visualized as follows:
• Figure 52: DrawPointExp
• Figure 53: DrawLine
The contours, lines and points are ultimately visualized in green. Since the image captured from the camera is in grayscale, a conversion to the RGB color space is necessary to visualize the colors. This is achieved with F_VN_ConvertColorSpace(). The transformation used in this
case is specified by eTransform as TCVN_CST_GRAY_TO_RGB.
Once the lines are drawn, it is possible to find their center by calculating the midpoint of each line; this point will serve as the center: centerX and centerY in the code in Figures 55/56.
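Conceptually, the computation is a simple midpoint; the following sketch uses hypothetical pixel coordinates for two opposite extreme points:

# Extreme points of the contour in pixel coordinates (values are placeholders)
top_left = (412.0, 138.0)
bottom_right = (586.0, 259.0)

# The rectangle center is the midpoint of the TOP_LEFT -> BOTTOM_RIGHT diagonal
center_x = (top_left[0] + bottom_right[0]) / 2.0
center_y = (top_left[1] + bottom_right[1]) / 2.0
print(center_x, center_y)   # 499.0 198.5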
The coordinates of the center should then be transformed from pixel format into world points with F_VN_TransformCoordinatesImageToWorld_Point, where the reference system is based on the camera's position.
In the end, when the image is displayed, the correct label with the center position will appear, based on the object recognized, with F_VN_PutText().
3.2.4 Circle
If the contour has circularity > 0.80, it is identified as a circle. After highlighting the contours, the center of mass is calculated using F_VN_ContourCenterOfMass().
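An equivalent center-of-mass computation in OpenCV (illustrative Python, not the TwinCAT function) uses the image moments of the contour:

import cv2

gray = cv2.imread("frame.png", cv2.IMREAD_GRAYSCALE)   # placeholder file name
_, binary = cv2.threshold(gray, 170, 255, cv2.THRESH_BINARY)
contours, _ = cv2.findContours(binary, cv2.RETR_EXTERNAL, cv2.CHAIN_APPROX_SIMPLE)

# Centroid of the first detected contour from its image moments,
# assuming at least one contour with non-zero area was found
m = cv2.moments(contours[0])
cx = m["m10"] / m["m00"]
cy = m["m01"] / m["m00"]
print(cx, cy)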
Figure 60: ObjectDetection, fifth part
After the transformation, display the image and add the correct label for the detected object.
To open the ADS, click ‘Visualize’ on the toolbar.
Figure 65: Rectangle
The object's center is identified even when the object is not completely visible, although this may not be precise enough for the robot.
3.2.6 Template Matching
Template matching is a vision technique useful for finding parts of an image that perfectly
match a predefined template.
Generally, the algorithm operates by sliding a template image over a source image to
determine the position where the template best matches the source image.
In the trials carried out (see the code in the figure), this technique is used for recognizing rectangles and circles by means of predefined templates representing these shapes.
A predefined template, in this context, is an image of an object, either a rectangle or a circle, that matches what we want to recognize in our project. The template image is added to the File Source Control and displayed on the ADS control using GetCurrentImage().
The function F_VN_MatchTemplateAndEvaluate(), Figure 71, compares the source image from the camera with the template image. If a match is found, it returns the match location, ipMatches, as the result.
In ipMatches, the coordinates of the object are stored, allowing the retrieved object to be drawn on the resulting image.
For rectangles, due to their varying orientations, this algorithm may not always be the best choice; sometimes it successfully recognizes the shape, but in other cases the results differ from expectations.
For circles, this algorithm performs better because their shape does not change with orientation.
Finally, this algorithm does not perform as well as the object detection method using DetectBlobs() described in the previous section.
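For reference, the same sliding-template idea can be sketched in Python with OpenCV (file names and the acceptance threshold are placeholders, not values from the thesis):

import cv2

# Source frame and shape template (file names are placeholders)
frame = cv2.imread("frame.png", cv2.IMREAD_GRAYSCALE)
template = cv2.imread("circle_template.png", cv2.IMREAD_GRAYSCALE)
h, w = template.shape

# Slide the template over the frame and score every position
scores = cv2.matchTemplate(frame, template, cv2.TM_CCOEFF_NORMED)
_, max_val, _, max_loc = cv2.minMaxLoc(scores)

if max_val > 0.8:                      # empirical acceptance threshold
    top_left = max_loc
    center = (top_left[0] + w // 2, top_left[1] + h // 2)
    print("match at", center, "score", max_val)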
Figure 72: Template algorithm implementation
3.2.7 CannyEdgeDetection
The Canny Edge Detector is used for edge detection, meaning the detection of the boundaries of objects within an image. An edge is defined by a change in pixel intensity. The algorithm reduces noise while preserving important features of the original image. The process is divided into several steps:
• Grayscale conversion: the algorithm should be applied to a single channel image to
preserve the brightness levels. Thus, the conversion is from RGB(Red, Green, Blue) to
Gray.
• Noise reduction: the best filter for noise reduction is the Gaussian filter, which smooths the image using a Gaussian kernel. This kernel slides across the entire image, taking a weighted average of the neighboring pixel intensities.
• Gradient calculation: measures the intensity changes for each pixel in a specific
direction (x or y of the image). The gradient can be calculated by using the Sobel
operator.
• Non-maximum suppression
• Double threshold
• Edge tracking by hysteresis
The function used is F_VN_CannyEdgeDetectionExp. The thresholds are set through the fThresholdLow and fThresholdHigh range. bL2Gradient selects the norm used for the gradient calculation; there are two standards available for this purpose:
• L1 (bL2Gradient is FALSE): the gradient is the sum of the absolute values of the gradients. This method is faster.
• L2 (bL2Gradient is TRUE): the gradient is the square root of the sum of the squares of the gradients, providing greater accuracy.
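An equivalent call in OpenCV (illustrative Python, with placeholder thresholds rather than the values used in the thesis) would be:

import cv2

gray = cv2.imread("frame.png", cv2.IMREAD_GRAYSCALE)   # placeholder file name
blurred = cv2.GaussianBlur(gray, (5, 5), 0)            # noise-reduction step

# Hysteresis thresholds roughly corresponding to fThresholdLow / fThresholdHigh;
# L2gradient=True selects the more accurate L2 norm for the gradient magnitude
edges = cv2.Canny(blurred, 100, 200, L2gradient=True)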
Once the Canny algorithm, see Figure 73, has been applied to the image, the next step is to find the contours and then the centers of the figures using the strategy presented in Object Detection.
In conclusion, the algorithm that performs best is ObjectDetection() with DetectBlobs(), discussed earlier. This is because manually applying different filters provides the opportunity to improve the images and the final results step by step.
Figure 74: Algorithm with Canny Edge Detector
Chapter 4: Machine Learning
Instead of using the previously developed filters in TwinCAT, it is possible to create a neural network and compare it with them for the recognition of the centers of rectangles and circles.
For this task, Google Colab is used with a GPU (Graphics Processing Unit) configuration, designed to handle complex computations efficiently. However, Google Colab provides a CPU runtime by default.
The python notebook, saved as a .ipynb file, can be accessed at the following link:
https://colab.research.google.com/drive/1D4dwXXG8OBxs_igEdWb5X3a98iKJEbaY .
In Google Colab, the code is divided into sections, which helps to create an organized notebook where each part serves a specific task. The main limitation of this notebook is memory: due to the small amount of RAM, training a neural network is slow, even with small datasets. Consequently, the number of images used is not ideal, but it is the maximum that can be supported given these memory constraints.
This section highlights the process of creating a neural network capable of recognizing the
centers of circles and rectangles. The process involves the following steps:
• Creation and preparation of an appropriate dataset
• Division of the data into training, testing and validation set
• Definition of the architecture of a CNN, considering different types of layers
• Definition of the loss function, optimization method and accuracy metrics
• Training of the model
• Testing using images captured by the camera
4.1 Introduction
Firstly, the necessary libraries for this notebook are imported (Figure 76). If a library is missing, '!pip install' is the command for its installation.
The main library used is TensorFlow, which can be installed with the command !pip install tensorflow.
TensorFlow is an open-source and free software library designed for high-performance computations such as those needed in machine learning and artificial intelligence.
As illustrated in the figure, execution starts from the play symbol located on the left of each cell. If all the cells need to be run, there is a specific 'Run all' button, shown in Figure 77.
To proceed, there is a useful check to prevent slowing down the training in the subsequent sections: by using the function in Figure 78, you can ensure that the code will run on the GPU rather than the CPU.
Figure 78: test if the cell is executed with GPU
Figure 80: Generate circles function 2
The outputs of this function are images and their corresponding labels, which are saved in a .csv file containing the image path with its extension, the label and the center coordinates.
The process begins with generating a random radius for each circle to ensure diversity in their size.
The minimum radius is calculated as 5% of the smaller dimension of the image, while the maximum radius is 15% of that same dimension. The radius for each circle is then selected using random.randint between the minimum and maximum values.
To ensure that the circles do not extend beyond the image boundaries, valid x and y coordinates are calculated using the formula image_width/image_height - radius. This calculation helps place the circle properly within the image frame.
• Generate_rectangles(count, image_width, image_height) is the function used to generate rectangles. The inputs for this function are the same as those for the circle generation.
Figure 82: Generate rectangles function 2
In the case of rectangles, the generation process starts by defining their width and height. These dimensions are set proportionally to the dimensions of the image, with the width ranging from 50% to 70% of the image width and the height from 30% to 50% of the image height. To draw the rectangle, the coordinates x0, y0, x1, y1 need to be calculated; they represent the top-left corner (x0, y0) and the bottom-right corner (x1, y1).
Additionally, data augmentation (explained in detail in Section 4.2) includes rotating the rectangle by a random angle between 30 and 360 degrees to enhance the model's robustness. A sketch of this generation logic is shown below.
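The following is a minimal sketch of this kind of synthetic-image generation (illustrative Python using Pillow; the function and file names are hypothetical and the notebook's actual code may differ):

import random
from PIL import Image, ImageDraw

def generate_circle(image_width=800, image_height=600):
    # Return one synthetic grayscale image with a random circle and its center label
    img = Image.new("L", (image_width, image_height), color=0)   # black background
    draw = ImageDraw.Draw(img)

    min_dim = min(image_width, image_height)
    radius = random.randint(int(0.05 * min_dim), int(0.15 * min_dim))

    # Keep the whole circle inside the frame: center in [radius, size - radius]
    cx = random.randint(radius, image_width - radius)
    cy = random.randint(radius, image_height - radius)

    draw.ellipse([cx - radius, cy - radius, cx + radius, cy + radius], fill=255)
    return img, ("circle", cx, cy)

image, label = generate_circle()
image.save("circle_000.png")   # path and naming are placeholders
print(label)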
Figure 83: Generate images and labels for circles and rectangles
Initially, trials were conducted with a larger number of images, such as 5000 circles and 5000 rectangles. However, training was interrupted without yielding results due to the memory constraints, leading to the conclusion that more powerful hardware is needed to handle such a large dataset.
The images and labels generated by the generate_circles() and generate_rectangles() functions are saved in circle_images, circle_labels and rectangles_images, rectangles_labels respectively (Figure 83). The width and height used in these functions match the output of the camera: 800x600 pixels.
After generating the images and labels, all the data are combined using concatenation. A test is conducted to ensure that each image retains the correct label after concatenation, as in Figure 84.
To further improve the accuracy of the model, a shuffle of the image indices is performed (Figure 85). This randomizes the order of circles and rectangles, which enhances training performance. Additionally, in the testing phase, it is important to verify that the indices still correspond correctly to the images, as mismatches can occur.
Subsequently, the output is verified to be correct and appears as shown in the figures below.
The coordinates, in square brackets, are displayed in the pixel coordinate system.
Figure 86: Circle with larger random radius
Figure 88: Rectangle in the image's center
4.2 Data augmentation
Data augmentation is a technique used to artificially increase the size of a training dataset by
applying various transformations to the existing data[14]. There are several benefits:
• It adds new data to the existing dataset, increasing efficiency and performance.
• It increases variations in the data, which helps prevent overfitting.
• More data leads to better generalization.
The data augmentation process described in this code, Figures 90/91, is contingent on a binary flag, random.choice([True, False]), which randomly decides whether to apply a transformation. For instance, rotations are applied to rectangles in 50% of the cases. This approach allows for the generation of images featuring normal rectangles, rotated rectangles and rectangles of varying sizes.
These steps are followed for rotating a rectangle (a sketch follows this list):
• Calculation of the corners in the original rectangle.
• Rotation of the vertices based on a random angle.
• Definition of the bounding box containing the new rectangle.
• Verification that this bounding box fits within the image dimensions. If it does, the
new coordinates of the center are used; otherwise, the original rectangle is retained.
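A minimal sketch of the corner-rotation step (a hypothetical helper, not the notebook code):

import math
import random

def rotate_rectangle(x0, y0, x1, y1, image_width=800, image_height=600):
    # Rotate the rectangle's corners around its center by a random angle
    angle = math.radians(random.randint(30, 360))
    cx, cy = (x0 + x1) / 2, (y0 + y1) / 2
    corners = [(x0, y0), (x1, y0), (x1, y1), (x0, y1)]

    rotated = []
    for x, y in corners:
        dx, dy = x - cx, y - cy
        rotated.append((cx + dx * math.cos(angle) - dy * math.sin(angle),
                        cy + dx * math.sin(angle) + dy * math.cos(angle)))

    # Axis-aligned bounding box of the rotated rectangle must fit in the image
    xs, ys = zip(*rotated)
    if min(xs) >= 0 and min(ys) >= 0 and max(xs) <= image_width and max(ys) <= image_height:
        return rotated, (cx, cy)       # rotation accepted
    return corners, (cx, cy)           # otherwise keep the original rectangle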
Rotation is not applied to circles because the change would not be noticeable.
The generated images maintain a balance between circles and rectangles.
4.3 Models
Keras is a high-level API in TensorFlow that is useful for machine learning tasks. Models are
typically created using tf.keras.models.Sequential, which takes various layers as input.
The choice of layers can significantly impact the model’s accuracy, loss and its ability to
correctly predict labels.
For this reason, different configurations of the model will be presented, each with unique
layer arrangements tested for effectiveness.
The usual procedure begins with loading the dataset, Figure 94, in this case from two CSV
files: circle_annotations.csv and rectangle_annotations.csv. These files contain annotations for
training the model to recognize and differentiate between circles and rectangles.
For training, it is essential to divide the images into training, test and validation sets; in this case, the train_test_split function is used with a test size of 20% of the total images.
Following the dataset split, the next step involves creating a convolutional neural
network(CNN).
Typically, a CNN is structured in layers, as in Figure 97, each one serving a specific purpose:
4.1.1 First model
The initial model architecture, shown in Figures 99/100, consists of the following layers configured using TensorFlow's Keras API:
• Convolutional layer (tf.keras.layers.Conv2D()): takes an input image of shape 800x600 pixels and applies a convolution operation using a kernel of size 3x3. Initially a small number of filters is used to focus on detecting simple features. As more convolutional layers are added, the number of filters increases to enhance the model's ability to recognize more complex objects.
• Max pooling layer (tf.keras.layers.MaxPool2D()): follows the convolutional layer and uses a kernel of size 2x2 to perform max pooling, as shown in the figure.
• Flatten layer (tf.keras.layers.Flatten()): flattens the multidimensional output of the previous layer into a one-dimensional (1D) vector to prepare it for input into the dense layers.
• Dense layer (tf.keras.layers.Dense()): a fully connected layer where every neuron is connected to all neurons in the previous layer. The layer performs a weighted sum of the inputs, and the results are passed through an activation function. In this model, a dense layer is used both as the penultimate layer and as the output layer, where it outputs the two coordinates x and y.
• Dropout layer (tf.keras.layers.Dropout()): randomly sets input units to zero during training at a specified rate, which helps to prevent overfitting. The rate varies from 0 to 1. A minimal sketch of such an architecture is shown below.
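A minimal sketch of such an architecture (filter counts, dropout rate and activation choices below are illustrative, not the exact thesis values):

import tensorflow as tf

# Regression CNN in the spirit of the layers described above; the input shape is
# (height, width, channels) = (600, 800, 1) for the 800x600 grayscale images.
model = tf.keras.models.Sequential([
    tf.keras.layers.Conv2D(16, (3, 3), activation="relu", input_shape=(600, 800, 1)),
    tf.keras.layers.MaxPool2D((2, 2)),
    tf.keras.layers.Conv2D(32, (3, 3), activation="relu"),
    tf.keras.layers.MaxPool2D((2, 2)),
    tf.keras.layers.Conv2D(64, (3, 3), activation="relu"),
    tf.keras.layers.MaxPool2D((2, 2)),
    tf.keras.layers.Flatten(),
    tf.keras.layers.Dense(128, activation="relu"),
    tf.keras.layers.Dropout(0.3),
    tf.keras.layers.Dense(2),        # output: the (x, y) center coordinates
])
model.summary()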
Figure 100: First model 2
Figure 100: Second model 1
With the addition of Dropout layers, the model appears to perform slightly better, because they reduce overfitting and lead to better generalization. However, its performance has not yet met expectations.
• Adding two tf.keras.layers.Conv2D() layers, each with 256 filters, to the final stage of the network, replacing the previous layer with 128 filters, enhances the model's capacity to extract complex features from the input.
• Adding tf.keras.layers.BatchNormalization() after each convolutional layer improves the training speed and stability. Batch normalization normalizes the inputs, adjusting them to have a mean of 0 and a variance of 1, which helps ensure that each layer receives data on a similar scale.
4.4 Training
At this point, the preparation of the CNN is concluded by setting additional parameters useful for the subsequent training with the model.compile method (Figure 105):
• Optimizer: helps to minimize the loss function. In this case, the Adam optimizer is used, which effectively minimizes the loss function during the training of neural networks. The learning rate is an important hyperparameter: a lower learning rate provides stable convergence but results in slower training, whereas a higher one, as in this code, speeds up training.
• Loss: helps to understand whether the model is performing better or worse over time. Generally, for scenarios involving two or more label classes, cross-entropy is used. The loss used here is 'mean_squared_error', common in regression tasks like this one. It measures the average of the squares of the errors, that is, the average squared difference between the estimated values and the actual values.
• Accuracy: a metric that measures the percentage of correctly predicted instances compared to the total number of instances in the dataset.
The method model.fit() trains the neural network using the training data generated in Figure 96 (80% for X_train, 20% for X_test). The training occurs over a specific number of epochs, in this case 20.
An epoch is one pass of the entire dataset through the network. Generally, a greater number of epochs improves the results at the end of the training.
The validation set is a collection of data completely new to the model; the images it contains should have the same dimensions and color model as those used in the training set. The idea is to compare the predicted labels with the true labels. With the validation set, it is then possible to validate the results obtained during the training.
There is a procedure designed to improve performance during training called reduce_lr, which
reduces the learning rate when a metric has stopped improving.
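A minimal sketch of this compile/fit step (it assumes the model from the previous sketch and the X_train/X_test arrays produced by train_test_split; the learning rate and batch size are illustrative):

import tensorflow as tf

# Adam optimizer and mean squared error, as described above
model.compile(optimizer=tf.keras.optimizers.Adam(learning_rate=1e-3),
              loss="mean_squared_error",
              metrics=["accuracy"])

# reduce_lr: lower the learning rate when the validation loss stops improving
reduce_lr = tf.keras.callbacks.ReduceLROnPlateau(monitor="val_loss",
                                                 factor=0.5, patience=3)

# Train for 20 epochs on the 80/20 split prepared in the notebook
history = model.fit(X_train, y_train,
                    validation_data=(X_test, y_test),
                    epochs=20, batch_size=16,
                    callbacks=[reduce_lr])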
To determine whether the network performs well, two images saved from the camera used in the laboratory are uploaded, as in Figure 106. These images are uploaded using files.upload() and resized to match the dimensions of the images passed to the network for training. This ensures consistency in input size, which is crucial for the model to process the images effectively.
By using the plt module from matplotlib, it is possible to display the test images with their predicted x and y coordinates, shown in green in the following figures.
Figure 107: Display the center predicted
Finally, the trained model is saved (Figure 113) and converted to the ONNX format (Figure 114) for loading into TwinCAT.
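One common way to perform such a conversion is the tf2onnx package; the call below is an illustration based on the trained Keras model from the previous sketches, and the exact command used in the notebook may differ:

import tf2onnx

model.save("center_model.h5")                                       # placeholder file name
onnx_model, _ = tf2onnx.convert.from_keras(model, output_path="center_model.onnx")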
Figure 112: Save the model
Chapter 5: Experiments with the robot
The experiments were conducted by integrating the camera vision system with a robotic
project, introducing two new modules in the TwinCAT environment:
• Drive manager module: responsible for motor configuration. After scanning, all
detected motors are automatically added to the I/O module. In this project, there are five
motors: four for the robot and one external motor.
• Motion module: provides solutions for various tasks such as kinematic
transformations, planar motion and pick-place operations. Following the drive
manager's configuration, the axes are automatically defined in the environment. The Cartesian axes, however, are calculated manually. This section highlights the coordinates of the conveyor belt with respect to the robot system.
The MAIN function acts as a state machine to ensure ordered code execution and to follow the flow of the robot's operations step by step.
The states are the following:
• State 0: is a transition state to State 10, which can be considered as a check state.
• State 10: involves the activation of the axes. The MAIN sends a request for the
activation of Axis 1. This process is repeated for all the considered axes.
• State 20: the robot waits for a command. After each state, the machine returns to this
state and waits for the next operation.
• State 30: the robot should reach its home position.
• State 40: the robot moves to an intermediate position where it awaits the coordinates
of the object.
• State 50: the conveyor is activated and managed as another machine state:
o State 20: is an initialization of the conveyor.
o State 30: the velocity setting is defined here; it starts the movement.
o State 40: is a waiting state. The conveyor waits until an object passes through
the photocell. If no object is detected, the system will remain in this state.
o State 50: when an object is detected, synchronization between the robot and
conveyor occurs in order to pick the object.
o State 60/70: the robot picks up the object.
o State 80/90: the robot places the object in a specific position.
o State 100: there is a return to waiting state.
• State 80: is an emergency state, in case of problems, all the axes will be stopped.
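As a schematic illustration only (Python pseudostructure, not the Structured Text CASE block actually used in the project, and with the conveyor sub-states omitted), the outer state machine can be pictured as follows:

STATE_CHECK, STATE_ENABLE_AXES, STATE_WAIT = 0, 10, 20
STATE_HOME, STATE_INTERMEDIATE, STATE_CYCLE, STATE_EMERGENCY = 30, 40, 50, 80

state = STATE_CHECK

def step(command=None, emergency=False):
    # Advance the machine by one cycle and return the new state
    global state
    if emergency:
        state = STATE_EMERGENCY                  # stop all the axes
    elif state == STATE_CHECK:
        state = STATE_ENABLE_AXES                # transition/check state
    elif state == STATE_ENABLE_AXES:
        state = STATE_WAIT                       # axes activated, wait for a command
    elif state == STATE_WAIT and command in (STATE_HOME, STATE_INTERMEDIATE, STATE_CYCLE):
        state = command                          # execute the requested operation
    elif state in (STATE_HOME, STATE_INTERMEDIATE, STATE_CYCLE):
        state = STATE_WAIT                       # operation finished, back to waiting
    return state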
This state machine configuration ensures that the robot operates smoothly and effectively,
managing tasks sequentially while maintaining synchronization between the conveyor and
robotic arms.
5.1.3 Camera system
The camera reference system axes and the photocell reference system axes are aligned, but the center of the camera reference system is translated from the photocell reference system by 45 mm in both the x and y directions.
The exposure time (or shutter speed) is the time span for which the film of a traditional camera, or the sensor of a modern digital camera, is actually exposed to the light so as to record a picture [17].
If the shutter speed is fast (e.g. 1/500 s), less light enters the camera but moving objects appear sharper; conversely, if the shutter speed is slow (e.g. 1/2 s), more light enters and a moving object may appear blurred (Figure 119).
In this case, the relationship between the conveyor and the shutter speed is that a slow conveyor allows a slow shutter speed, while a fast conveyor requires a fast shutter speed.
Given these considerations, adjustments to the exposure time of the camera can be made in
the Camera Assistant with the following settings:
• ExposureAuto: should be turned Off, to prevent the camera from automatically adjusting the exposure time during each run.
• ExposureMode: should be set to Timed; by selecting this mode, you can manually set the exposure time to a specific duration.
• ExposureTimeAbs: the absolute exposure time.
In this project, with the conveyor velocity set at 40 m/s, the exposure time, Figure 120, is 50354 microseconds (a relatively long exposure). The Mako G-192B camera can handle larger blur distances, and thus a longer exposure time.
Figure 119: Configuration Assistant, ExposureTime
Figure 121: Track the object from InitialObjectPos1
In conclusion, using a gripper, the object is picked up from the detected position and placed in the specified location, in this case the table near the conveyor. This is achieved using an 'open'/'close' mechanism: the gripper opens to pick up the object, then closes to secure it, and upon reaching the specified location it opens again to release the object.
Conclusion
This thesis has effectively demonstrated the fundamental role of computer vision in
modernizing industrial processes through the use of TwinCAT's integrated environment. The
real-time communication enabled by TwinCAT has significantly enhanced operational
efficiencies across various manufacturing tasks. The seamless interaction between hardware
and software facilitated by this cohesive framework has proven essential for optimizing
automation processes, reducing error rates, and increasing overall productivity.
TwinCAT Vision, in particular, has been pivotal in realizing the potential of real-time capable
image processing within the TwinCAT 3 runtime environment. The synchronous execution of
image processing algorithms with the control system, alongside the ability to run these
algorithms in parallel on multiple cores, has minimized processing delays and maximized
computational efficiency.
A significant aspect of this thesis focused on the application of object detection in an
industrial setting. The precision and reliability with which TwinCAT Vision handled real-time
object recognition on conveyor belts highlighted the transformative potential of integrating
advanced vision technologies with robotic systems. This integration not only improved the
throughput of the production line but also enhanced the quality control mechanisms, ensuring
that only items meeting the required standards progressed further in the production process.
In conclusion, the adoption of TwinCAT and TwinCAT Vision within industrial systems
represents a robust approach to leveraging advanced computer vision technologies. This
integration promises to streamline production lines and elevate the standards of industrial
automation, making it a crucial step towards the future of manufacturing efficiency.
Images
Sitography
[1] https://www.sas.com/en_us/insights/analytics/computer-vision.html#:~:text=Computer%20vision%20is%20a%20field,to%20what%20they%20%E2%80%9Csee.%E2%80%9D
[2] https://www.automation.com/en-us/articles/march-2023/understanding-role-machine-vision-industry-4
[3] https://www.britannica.com/technology/artificial-intelligence
[4] https://en.wikipedia.org/wiki/Pinhole_camera_model#:~:text=The%20pinhole%20camera%20model%20describes,are%20used%20to%20focus%20light.
[5] https://www.analyticsvidhya.com/blog/2021/05/convolutional-neural-networks-cnn/
[6] https://www.ibm.com/topics/computer-vision#:~:text=Computer%20vision%20is%20a%20field,they%20see%20defects%20or%20issues.
[7] https://www.unitronicsplc.com/what-is-plc-programmable-logic-controller/
[8] https://www.paessler.com/it-explained/plc
[9] https://itsmaker.it/wp-content/uploads/2018/04/1-TwinCAT-3-Overview-IT-1.pdf
[10] https://www.beckhoff.com/it-it/
[11] https://www.iqsdirectory.com/articles/automation-equipment/industrial-robots.html#:~:text=Common%20applications%20of%20industrial%20robots%20are%20product%20assembly%2C%20machine%20loading,painting%2C%20coating%2C%20and%20inspection.
[12] https://www.educative.io/answers/what-is-canny-edge-detection
[13] https://www.futurelearn.com/info/courses/introduction-to-image-analysis-for-plant-phenotyping/0/steps/297750
[14] https://medium.com/@abhishekjainindore24/data-augmentation-00c72f5f4c54
[15] https://www.nature.com/articles/s41598-024-51258-6#:~:text=Max%20pooling%20is%20a%20commonly,each%20small%20window%20or%20region.
[16] https://computersciencewiki.org/index.php/File:MaxpoolSample2.png
[17] https://www.smartray.com/glossary/exposure-time/#:~:text=The%20exposure%20time%2C%20respectively%20period,time%20is%20given%20in%20seconds.
[18] https://shotkit.com/field-of-view/
[19] https://www.alliedvision.com/en/camera-selector/detail/mako/g-192/
[20] https://www.geeksforgeeks.org/calibratecamera-opencv-in-python/
[21] https://www.javatpoint.com/regression-vs-classification-in-machine-learning
[22] https://www.beckhoff.com/it-it/products/automation/twincat/tfxxxx-twincat-3-functions/tf7xxx-vision/tf7100.html?