UNIT-3
KNIME
KNIME stands for Konstanz Information Miner, pronounced "naim"
Developed by KNIME AG, located in Zurich, and the group of Michael
Berthold at the University of Konstanz, Chair for Bioinformatics and
Information Mining
WHAT IS KNIME?
A tool for data analysis, manipulation, visualization and reporting
Based on a graphical user interface
Popular for its flexibility and ability to integrate with various data sources
and tools, including:
Databases
R
Python
Key Features and Benefits of KNIME
Open source, modular, rich node integration, scalability, automation,
workflow repository
KNIME Server vs KNIME
Analytics Platform
KNIME Installation
and Set up
Visit the KNIME Website: https://www.knime.com
Choose the Edition
KNIME Analytics Platform – Free Desktop Version for individual users
and smaller teams
KNIME Server – Commercial offering for larger organisations
Download KNIME Analytics Platform – Latest Version
Select your operating system - Windows, macOS, or Linux
Complete Download
Install KNIME
Launch KNIME
Install KNIME
Windows
Run the downloaded installer or self-extracting archive.
If the zip archive was downloaded, unpack it to a desired location.
Run knime.exe to start the KNIME Analytics Platform.
Linux
Extract the downloaded tarball to a location of your choice.
Run the knime executable to start the KNIME Analytics Platform.
Mac
Double-click the downloaded dmg file and wait for the verification to finish.
Then move the KNIME icon to Applications.
Double-click the KNIME icon in the list of applications to launch the
KNIME Analytics Platform.
KNIME Workbench
Components
Welcome page
KNIME Explorer
Workflow Editor & Nodes
Workflow Coach
Node Repository
KNIME Hub Search
Description
Outline
Console
KNIME Workbench
Workflow Editor (Workspace) – Central space where the workflow is
designed
Node Repository – Panel where nodes are available
Console – Debugging tool; gives feedback on the workflow status and any
error messages
Outline – Overview of the workflow structure
Node Description – Gives a summary of the node selected in the
"Workflow Editor" or "Node Repository"
Explorer – Panel showing the list of workflows available in the selected
workspace (LOCAL), on the EXAMPLES server, or on other connected
KNIME Servers
Basic Walkthrough of the KNIME Workbench
KNIME Components and
Terminology
Node
Building blocks of a KNIME workflow.
Represents a specific operation or analysis step.
Port
Nodes have input and output ports.
An input port receives data; an output port sends data to other nodes.
Data flows between nodes through these ports.
Workflow
A sequence of nodes connected to each other.
Represents the entire data analysis process, from data input to output.
KNIME Components and
Terminology
Data Table
Data in KNIME is represented as a tabular data structure.
Each row is a data point.
Each column is a feature or attribute.
KNIME Components and
Terminology
Connectors
Lines that link the output port of one node to the input port of another,
defining the flow of data within a workflow.
Meta Node
A container that allows you to group nodes and create reusable
sub-workflows; simplifies the visualization of complex workflows.
KNIME Components and
Terminology
Variable
Used to store and manage data or values within a workflow.
Can be created, modified, and used in various nodes.
Workflow Variable:
Variables specific to a workflow; can be used to pass data or values
between nodes within the same workflow.
Dataflow in KNIME
Dataflow defines how data is processed and transformed as it moves
through the workflow. Key points about dataflow in KNIME include:
Input and Output Ports
Connectors
Data Table
Data Transformation
Workspace
The folder where all current workflows
and preferences are saved for the next
KNIME Session
By default, the workspace folder is "…\knime-workspace".
It can be changed by changing the path in the "Workspace Launcher"
window before starting the KNIME working session.
Exercise-1 Create
Workspace
Launch KNIME
In the Workspace Launcher
Window, Click “Browse”
Select the path for the new
workspace
Create “Test Workspace”
KNIME Workflow
KNIME Analytics Platform does not work with scripts but with graphical
workflows.
Each step of the data analysis is implemented and executed through a
little box called a "Node".
A sequence of nodes makes a workflow.
An analysis flow in graphics, having the following process:
Step 1: Read data
Step 2: Clean data
Step 3: Filter data
Step 4: Train a model
Workflows in KNIME are graphs connecting nodes, or formally, directed
acyclic graphs (DAGs)
File Extensions: .knwf
and .knar files
KNIME workflows can be packaged and exported as ".knwf" or ".knar" files
A ".knwf" file contains only one workflow
A ".knar" file contains a group of workflows
A double click opens the workflow inside the KNIME Analytics Platform
Workflow Configuration And
Execution
1. Node Configuration
2. Variable Assignment
3. Execution Control
4. Monitoring Execution
5. Workflow Results
Building a Basic Workflow
Launching KNIME
Creating New Workflow
Go to the “File” menu
Select “New Workflow”
– Creates a new canvas
to design workflow
Building a Basic Workflow
Adding Nodes – Workflows in KNIME are built by adding nodes: drag and
drop them onto the canvas from the Node Repository.
Connecting Nodes – Nodes are connected using connectors. The output of
one node is connected to the input of the next node.
Configuring a Node – Double-click or right-click a node to open its
configuration dialog.
Running the Node – To execute the workflow, click the "Run" button on
the toolbar.
Visual KNIME
Workflows
What are KNIME Extensions?
KNIME Extensions are a fast, flexible way to extend your data science
platform. Open source extensions provide additional functionalities such
as access to and processing of complex data types, as well as the addition
of advanced machine learning algorithms.
Install Extensions
From the top menu, select "File → Install KNIME Extensions"
Select:
- KNIME Math Expression extension (JEP)
- KNIME External Tool Node
- KNIME Report Designer
Click "Next"
Exercise - 2
01: Installation of additional extensions
02: Installation of Lab extensions
Exercise 3
Install the following Extensions
KNIME Database
KNIME JavaScript Views
KNIME Report Designer
Solution: Exercise 3
Select: From the top right corner options, select "Install Extensions"
Search: Search for the required extensions
Click: Click "Next"
Follow: Follow the instructions
Data Access
Files
CSV, txt, Excel, Word, PDF
XML, JSON, PMML
Images, Texts, Networks
Databases
MySQL, PostgreSQL, Oracle
Theobald
Any JDBC (DB2, MS SQL Server)
Other
Twitter
Google
Sharepoint
Transformation
Preprocessing
Row, column
Data Blending
Join, concatenate, append
Aggregation
Grouping, Pivoting, Binning
Feature Creation and Selection
Create a Node
Drag and drop the node from the "Node Repository" panel into the
workflow editor, or double-click the node in the "Node Repository" panel.
To connect a node with existing nodes:
Select a node in the workflow and double-click a node in the repository, or
Click the output port of the first node and release the mouse at the input
port of the second node.
View a Processed Node
If the execution is successful, a green light is shown and the processed
data can be viewed:
01: Select the last node
02: Right-click the node
03: The data table with the processed data will appear as an option in the
context menu
Create a Workflow Group
Click on the Home button:
Click on the "Local Workspace"
Select "Create Folder"
In the "Create Folder" dialog:
Enter the name of the workflow group
Create a Workflow
In the Local Space:
Click on the “+” symbol
In the “Create a new
workflow” dialog:
Enter the name of
Workflow
Click Create
In Space Explorer
Click on the black button
with three dots
Click “Create Workflow”
Save a Workflow
Saving the workflow saves the workflow architecture, the nodes'
configuration, and the data produced at the output of each node.
Click the disk icon in the top menu.
To save a copy of the currently selected workflow, click "Save As...".
To save ALL open workflows, click the "Save All" (stack of disks) icon.
Delete a Workflow
-Right-click the workflow in the "KNIME Explorer"
-Select Delete
-Confirm Delete
Import/Export Workflow
Steps:
To import a workflow, right-click anywhere in the local workspace in the
KNIME Explorer
To export a workflow or workflow group, first select the workflow (or
group) to export
Next, write the path to the destination folder and the file name. When
exporting a workflow group, select the elements you want to export from
inside the folder.
Exercise 4
Create an empty workflow:
Click "New" in the toolbar panel, or right-click a folder in the local
workspace in the KNIME Explorer
Enter the name of the workflow
Browse to the destination folder
Click "Finish"
Data Importing & Blending – Node Operations
Read data from the file
Understanding different data structures & data types
Data Exploration
Read Data
❑ Steps:
❑ Add a File Reader node, by double-clicking or by drag & drop
❑ In the Configuration Dialog:
❑ Click "Browse" to select the path of the file
❑ In most cases, the File Reader automatically detects the file structure
❑ If not, enable/disable the required checkboxes according to the data
structure
Excel Reader
Reads .xls and .xlsx files from Microsoft Excel
Supports reading from multiple sheets
Exercise - 5
STEPS:
Create Workflow "Exercise 1"
Read file "data1.txt"
Change the column name "ranking" to "marks"
Remove column "Class"
Write the final data to a file in CSV format
Database Connector Node
• Creates a connection to an arbitrary JDBC database
• Select an appropriate driver and provide the JDBC URL of the database
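For illustration, a typical JDBC URL for a MySQL database follows this
pattern (host, port, and database name are placeholders):

    jdbc:mysql://localhost:3306/mydatabase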
Data Preparation and Cleaning
Essential steps to ensure the data
is accurate and ready for analysis.
Some nodes for this task:
Data Exploration
Initial overview of the data
Generates Summary
statistics and visualizations
Data Explorer
Install “KNIME JavaScript Views (Lab)”
Supports CSS styling
Data Cleaning
Nodes like "Missing Value" and "Duplicate Row Filter" help to handle
missing data and remove duplicates.
Missing Value – Helps to handle missing values in the data.
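The node is configured through its dialog, but conceptually it applies a
per-column strategy, such as replacing numeric gaps with the column mean
and string gaps with a fixed value. A minimal pandas sketch of the same
idea (the file and column names are hypothetical):

    import pandas as pd

    df = pd.read_csv("data.csv")
    # replace missing numeric values with the column mean
    df["age"] = df["age"].fillna(df["age"].mean())
    # replace missing strings with a fixed value
    df["workclass"] = df["workclass"].fillna("unknown")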
Data Cleaning
Duplicate Row Filter – Identifies duplicate rows; it can either remove all
duplicate rows from the input table, keeping only unique and chosen rows,
or mark the rows with additional information about their duplication
status.
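Conceptually, the node's two modes (remove vs. mark duplicates) mirror
this pandas sketch (the data is hypothetical):

    import pandas as pd

    df = pd.DataFrame({"id": [1, 1, 2], "value": ["a", "a", "b"]})
    # remove duplicates, keeping the first (chosen) row of each group
    unique_df = df.drop_duplicates()
    # or mark rows with their duplication status instead of removing them
    df["duplicate"] = df.duplicated()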
Data Cleaning
Column Splitter
Splits the columns of the input table into two output tables.
Data Transformation
Data can be transformed using nodes like "Column Filter",
"Math Formula", "String Manipulation", etc.
Data Transformation
Column Filter Node:
Allows columns to be filtered from the input table; only the remaining
columns are passed to the output table.
Data Transformation
Math Formula
Evaluates a mathematical expression based on the values in a row.
Computed results can either be appended as a new column or be used
to replace an input column.
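Expressions in the node reference column values with $...$ syntax; for
example, a hypothetical expression averaging two columns:

    ($Sepal Length$ + $Sepal Width$) / 2

Whether the result is appended as a new column or replaces an existing
one is chosen in the node's dialog.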
Data Transformation
String Manipulation Node
Manipulates strings: search and replace, capitalize, or remove leading
and trailing white spaces.
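A few examples of the node's expression functions (the column names are
hypothetical):

    strip($name$)                        remove leading/trailing white space
    upperCase($name$)                    convert to upper case
    replace($class$, "Iris", "Flower")   search and replace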
Data Aggregation
Nodes like "GroupBy" perform aggregation operations on the data, such as
calculating sums, averages, or counts.
Groups the rows of a table by the unique values in the selected group
columns.
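As a rough Python analogy to what the GroupBy node computes (the data
is hypothetical):

    import pandas as pd

    df = pd.DataFrame({"Class": ["a", "a", "b"],
                       "Sepal Length": [5.1, 4.9, 6.3]})
    # group rows by the unique values of "Class" and aggregate per group
    summary = df.groupby("Class")["Sepal Length"].agg(["mean", "count"])
    print(summary)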
Data Imputation
Rule Engine:
Takes a list of user-defined rules and tries to match them to each row in
the input table. If a rule matches, its outcome value is added to a new
column.
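Rules are written one per line as condition => outcome, and the first
matching rule determines the value written to the new column. A small
hypothetical example (the column and outcomes are placeholders):

    $age$ > 60 => "senior"
    $age$ > 18 => "adult"
    TRUE => "minor"

TRUE acts as a catch-all default for rows that no earlier rule matches.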
Exercise 6
Extract details of persons born outside "United-States" in CSV form
Read the file adult data
Rename column "fnlwgt" to "Final Weight"
Remove column "Final Weight"
Remove rows containing "United-States"
Write the data to a CSV file named "Born outside US"
DATA INTEGRATION AND
TRANSFORMATION
Involves combining data from various sources, reshaping it
and preparing it for analysis. Some nodes used for the
purpose are:
Joining Data : Combine data from multiple tables based
on common keys or criteria
Pivoting and Unpivoting : Help to reshape data from wide
to long format or vice versa
Data Sampling : Used to select a subset of data for
analysis
Data Normalization and Scaling : Normalize and Scale the
data to prepare it for machine learning algorithms.
Text Mining and NLP: Supports text data processing and
Natural Language Processing
EXPORTING DATA FROM
KNIME
Data Export: To export data, nodes like "CSV Writer", "Excel Writer" or
"Database Writer" are used, depending on the nature of the output.
Data Visualisation: Nodes like "Bar Chart", "Pie Chart", "Heatmap" etc. are
used to create charts and plots.
Model Deployment: Machine learning models in KNIME can be exported
for deployment in production environments.
Data Reports: Reports with customised layouts can be exported in various
formats such as PDF or HTML.
Exercise 7
Objective: To do data visualization
Create a workflow "Exercise 7" under the workflow group "Exercises"
Read Data Iris
Name the columns: Sepal Length, Sepal Width, Petal Length, Petal Width,
Class
Classify the flower types "Iris-setosa", "Iris-versicolor" and "Iris-virginica"
as Class 1, Class 2 and Class 3 respectively
Split the contents of "Class" into three columns
Join two columns
Convert the contents of a column to upper case
Replace the word "Iris" with "Flower"
Create "Bins" based on Sepal Length
Group the "Class" based on Bins
Create a Bar Chart and a Scatter Plot
Connecting to Big Data Sources
KNIME provides various connectors and integrations to connect to Big
Data platforms:
Big Data Platform – Nodes Used
Hadoop Distributed File System (HDFS) – HDFS Connector, HDFS File
Picker
Apache Spark – Spark Reader (to load data from Spark data frames),
Spark SQL (querying and manipulating data)
Connecting to Big Data Sources
Big Data Platform – Nodes Used
Apache Hive – Hive Connector, Hive Table Selector
Big databases like HBase, Cassandra
Other sources (Amazon S3, Google Cloud Storage, Azure Blob Storage) –
Amazon S3 Connector, Google Cloud Storage Connector
Handling Big Data in KNIME
Data Sampling
KNIME's sampling capabilities allow working with a representative subset of the data
for initial exploration and modeling.
Distributed Computing
Processes data in parallel, which improves performance by utilizing multiple
processing nodes.
Data Chunking
To prevent memory constraints, KNIME can process data in smaller, manageable
chunks (see the sketch after this list).
Data Compression
Data compression techniques are employed to reduce storage requirements and
optimize data transfer between nodes and across the network.
In-Database Processing
Data remains within the database for analysis, thus minimizing data movement and
enhancing performance.
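As a rough illustration of the chunking idea outside KNIME, a large CSV
can be streamed in fixed-size pieces rather than loaded at once (the file
name and chunk size are placeholders):

    import pandas as pd

    row_count = 0
    # process the file in 100,000-row chunks to bound memory use
    for chunk in pd.read_csv("big_data.csv", chunksize=100_000):
        row_count += len(chunk)
    print(row_count)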
Big Data’s Data Sampling
workflow
Data Chunking Node with
workflow
Data Compression Node with
workflow
In-Database Processing
DISTRIBUTED DATA PROCESSING
Distributed data processing is a key capability in
KNIME for efficiently analysing and processing
Big Data:
Parallel Execution: Distributes data processing tasks across multiple nodes
or worker machines, thereby improving performance and reducing
processing time.
Data Partitioning: Automatically partitions data into smaller chunks, which
can be processed in parallel by different nodes or worker nodes, thus
ensuring efficient resource utilisation.
Load Balancing: Distributes tasks evenly across available resources to
maximise their utilisation.
Scalability: Scales horizontally by adding more compute resources to
handle larger datasets and more complex analyses.
BIG DATA ANALYTICS WITH KNIME
Data Exploration
Tools to gain insights into big data, generate summary statistics and create
visualisations
Data Preprocessing
Clean, transform and prepare Big Data for analysis
Machine Learning
Offers a wide range of machine learning algorithms which can be applied to big data,
including classification, regression, clustering and anomaly detection.
Models can be built for predictive analysis
Text and Image Analysis
Allows extracting valuable information from unstructured Big Data sources
Advanced Analytics
Conduct advanced analytics like time series analysis, network analysis and geospatial
analysis on big data
BIG DATA ANALYTICS WITH
KNIME
Data Visualization and Reporting
Create visualizations and reports to communicate the findings effectively
Deployment and Integration
Deploy big data analytics workflows for production use
Export models and predictions for integration with other applications and data pipelines
Monitoring and Maintenance
Continuously monitor and maintain big data analytics workflows
Optimize workflows for performance
Keep KNIME and its extensions up to date
Collaboration and Sharing
Collaborate with team members and stakeholders by sharing KNIME
workflows, reports and results
Use KNIME Server for collaborative work and scheduled execution
DATA TRANSFORMATION AND
MANIPULATION
Data Cleaning
Nodes like "Missing Value," "String Manipulation," and "Rule Engine" to handle missing
data, correct errors, and clean the dataset.
Data Transformation
For transforming data, use nodes such as "Column Filter", "Math Formula", and
"Pivoting" to reshape data and create new features.
Data Aggregation
Aggregate data using the "GroupBy" node to calculate summary statistics or create
aggregated datasets
Data Joining
Combine datasets using nodes like "Joiner" or "Concatenate" to merge data from
different sources or tables.
DATA TRANSFORMATION AND
MANIPULATION
Data Splitting
Split data into training and testing sets using nodes like "Partitioning" to facilitate model
evaluation.
Data Normalization and Scaling
Normalize or scale features to bring them to a common scale using nodes like
"Normalizer".
Text and String Processing
Nodes are available for Text Processing such as tokenization, stemming, and
sentiment analysis, making it suitable for text data manipulation
Data Transformation and
Manipulation related
workflow
STATISTICAL ANALYSIS IN
KNIME
Descriptive Analysis
Nodes like "Statistics" and "Data Explorer" are used to compute
descriptive statistics, including measures of central tendency,
dispersion, and frequency distributions
Hypothesis Testing
Nodes like "Group Comparison" for comparing means, "Chi-Square
Test" for categorical data, and others.
Correlation Analysis
Determine relationships between variables using correlation
analysis nodes.
ANOVA and Regression Analysis
Perform analysis of variance (ANOVA) and regression analysis to
explore relationships between dependent and independent
variables.
Time Series Analysis
Analyse time series data with specialized nodes for
forecasting, trend analysis, and seasonal decomposition
Statistical Analysis Related
Workflow
MACHINE LEARNING WORKFLOWS
Model Selection
Use nodes for model selection, such as "Variable Selection" and "Feature Selection," to identify
the most relevant features for modeling.
Model Training
Train machine learning models using nodes for various algorithms, including decision trees,
random forests, support vector machines, and neural networks
Model Evaluation
Evaluate models with nodes like "Scorer" to assess their performance using metrics like accuracy,
precision, recall, F1-score, and ROC curves
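For reference, the standard definitions behind these metrics, with
TP/TN/FP/FN denoting true/false positives and negatives:

    accuracy  = (TP + TN) / (TP + TN + FP + FN)
    precision = TP / (TP + FP)
    recall    = TP / (TP + FN)
    F1-score  = 2 * precision * recall / (precision + recall)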
Cross Validation
Implement cross-validation techniques using nodes like "Cross-Validation Loop" to assess model
generalization
MACHINE LEARNING WORKFLOWS
Ensemble Learning
Build ensemble models using nodes like "Ensemble Learner" to combine multiple
models for improved predictive performance
Hyperparameter Tuning
Optimize model hyperparameters with nodes like "Parameter Optimization" to
achieve the best model performance.
Machine Learning
Related Node and
Workflow
PREDICTIVE ANALYSIS AND
DATA MINING
Classification
Build classification models to predict categorical outcomes
KNIME supports various algorithms like logistic regression, decision trees, and k-nearest
neighbours
Regression
Perform regression analysis to predict numeric outcomes, using regression algorithms
like linear regression and support vector regression
Clustering
Use clustering algorithms to group similar data points together
KNIME offers k-means clustering, hierarchical clustering, and DBSCAN,
among others (see the sketch after this list)
Anomaly Detection
Detect anomalies or outliers in data using specialized nodes like "Local Outlier Factor"
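KNIME's clustering nodes are configured through their dialogs; as a
conceptual sketch only, the same k-means idea in Python with scikit-learn
(the data and parameters are hypothetical):

    import numpy as np
    from sklearn.cluster import KMeans

    X = np.array([[1.0, 2.0], [1.1, 1.9], [8.0, 8.2], [7.9, 8.1]])
    # partition the points into k = 2 clusters
    model = KMeans(n_clusters=2, n_init=10, random_state=0).fit(X)
    print(model.labels_)  # cluster assignment for each row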
PREDICTIVE ANALYSIS AND
DATA MINING
Association Rule Mining
Discover Patterns and associations in data using association rule mining
nodes
Time Series Forecasting
Forecast future values of time series data using dedicated nodes for time
series analysis and prediction.
Text Mining and NLP
Analyse and extract insights from unstructured text data
Geospatial Analysis
Perform geospatial analysis and visualization using nodes for geographic
information system (GIS) data
Classification Models to Predict
Categorical Outcomes Workflow
K – means Clustering Workflow
Geographic Information
System (GIS) data
workflow
DATA VISUALIZATION IN
KNIME
Data visualization is a powerful way to communicate insights and patterns in data
Visualization Nodes
Nodes for creating visualizations, including scatter plots, bar charts, line charts,
heatmaps, and more
Interactive Plots
Supports interactive plots that allow you to explore data dynamically,
including zooming, panning, tooltips and filtering
Customization
Customize the appearance of the visualizations, such as color schemes, labels,
and axis scaling, to ensure clarity and relevance
Automated Visualization
Automate the generation of visualizations using
data-driven approaches.
Interactive Plots Output
Automated Visualization
Workflow
CREATING INTERACTIVE
DASHBOARDS
Dashboard Components: KNIME offers a range of dashboard components
like tables, charts, filters, and input widgets that can be combined to
create interactive dashboards.
Drag and Drop Design: Build dashboards using a user-friendly,
drag-and-drop interface. Arrange components and link them to control
each other dynamically.
Real Time Updates: Dashboards can be designed to update in real time as
data changes, enabling users to see the latest information instantly.
Parameterization: Create dashboards with parameterized inputs, allowing
users to customize views based on their preferences or specific analysis
requirements.
Interactive Output Example
MODEL DEPLOYMENT AND
INTEGRATION
Deploying machine learning models and integrating
them into production systems is crucial for making
data-driven decisions
Export Models: KNIME allows exporting trained machine learning models
in standard formats (e.g., PMML) for deployment in various environments,
such as web applications or databases.
RESTful Web Services: Deploy models as RESTful web services using
KNIME Server.
Batch Processing: Automate model deployment by integrating KNIME
workflows into batch processing pipelines to generate predictions on new
data regularly.
Database Integration: Integrate models with databases to perform
in-database scoring, making predictions directly within the database
engine; enables real-time predictions and integration with other
applications.
Model Deployment and Integration
Workflow
AUTOMATION AND
REPORTING IN KNIME
Automation and reporting are essential for streamlining workflows and
sharing insights
Workflow Automation
Automate repetitive tasks and data processing steps using KNIME’s
workflow automation capabilities
Schedule workflows to run at specific times or events
Report Generation
Create customizable reports in KNIME with text, tables, charts, and
visualizations.
Can be generated automatically as part of a workflow or on-demand.
AUTOMATION AND
REPORTING IN KNIME
Data Export
Export data, results, and reports in various formats, including PDF,
Excel, CSV, and more, to share insights with stakeholders
Integration with External Systems
KNIME can integrate with external systems and databases to import
data, trigger workflows, and export results seamlessly
Notifications
Configure notifications and alerts to inform users or administrators
about workflow status, errors, or specific events.
Workflow Automation
Configure Notifications and
Alerts Template
THANK YOU