
UNIT-3

KNIME
 KNIME stands for Konstanz Information Miner, pronounced "naim"
 Developed by KNIME AG, located in Zurich, and the group of Michael Berthold at the University of Konstanz, Chair for Bioinformatics and Information Mining
WHAT IS KNIME?
 A tool for data analysis, manipulation, visualization and reporting
 Based on a graphical user interface
 Popular for its flexibility and ability to integrate with various data sources and tools, including:
 Databases
 R
 Python
Key Features and Benefits of KNIME
 Open source
 Modular
 Rich node integration
 Scalability
 Automation
 Workflow repository
KNIME Server vs KNIME Analytics Platform
KNIME Installation and Setup
 Visit the KNIME Website: https://www.knime.com
 Choose the Edition
 KNIME Analytics Platform – Free Desktop Version for individual users
and smaller teams
 KNIME Server – Commercial offering for larger organisations
 Download KNIME Analytics Platform – Latest Version
 Select your operating system – Windows, macOS, or Linux
 Complete Download
 Install KNIME
 Launch KNIME
Download KNIME
Install KNIME
 Windows
 Run the downloaded installer or self-extracting archive.
 If a zip archive was downloaded, unpack it to a desired location.
 Run knime.exe to start KNIME Analytics Platform

 Linux
 Extract the downloaded tarball to a location of your choice.
 Run the knime executable to start KNIME Analytics Platform

 Mac
 Double-click the downloaded .dmg file and wait for the verification to finish
 Then move the KNIME icon to Applications.
 Double-click the KNIME icon in the list of applications to launch KNIME Analytics Platform
KNIME Workbench Components

 Welcome page
 KNIME Explorer
 Workflow Editor & Nodes
 Workflow Coach
 Node Repository
 KNIME Hub Search
 Description
 Outline
 Console
KNIME Workbench
 Workflow Editor (Workspace) – Central space where the workflow is designed
 Node Repository – Panel where nodes are available
 Console – Debugging tool; gives feedback on the workflow status and any error messages
 Outline – Overview of the workflow structure
 Node Description – Gives a summary of the node selected in the "Workflow Editor" or "Node Repository"
 Explorer – Panel that shows the list of workflows available in the selected workspace (LOCAL), on the EXAMPLES server, or on other connected KNIME Servers
A Quick Walkthrough of the KNIME Workbench
KNIME Components and
Terminology
 Node
 Building blocks of a KNIME workflow.
 Represents a specific operation or analysis step
 Port
 Nodes have input and output ports
 Input ports receive data and output ports send data to other nodes
 Data flows between nodes through these ports
 Workflow
 Sequence of nodes connected to each other
 Represents the entire data analysis process from data input to output
KNIME Components and
Terminology
 Data Table
 Data in KNIME is represented as a
tabular data structure
 Each row is a data point
 Each column is a feature or attribute
KNIME Components and
Terminology
 Connectors
 Lines that link the output port of one node to the input port of another, defining the flow of data within a workflow.
 Meta Node
 A container that allows you to group nodes and create reusable sub-workflows
 Simplifies the visualization of complex workflows
KNIME Components and
Terminology
 Variable
 used to store and manage data or
values within a workflow
 can be created, modified, and used in
various nodes
 Workflow Variable:
 Variables specific to a workflow; they can be used to pass data or values between nodes within the same workflow
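
For illustration, a hedged sketch of how such a variable can be referenced inside KNIME expression nodes such as Math Formula (the variable name "weight" is a made-up example). Columns use the $column$ syntax, while flow variables use the typed $${...}$$ syntax:

    $marks$ * $${Dweight}$$

The D prefix denotes a double-typed flow variable (S for string, I for integer).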
Dataflow in KNIME
 Dataflow defines how data is processed and transformed as it moves through the workflow.
 Key points about data flow in KNIME include:
 Input and Output Ports
 Connectors
 Data Table
 Data Transformation
Workspace
 The folder where all current workflows and preferences are saved for the next KNIME session
 By default, the workspace folder is "…\knime-workspace"
 Can be changed by editing the path in the "Workspace Launcher" window before starting the KNIME working session
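
As a side note (a hedged sketch; the path shown is a made-up example), KNIME Analytics Platform is Eclipse-based, so the workspace can also be selected at launch with the Eclipse-style -data argument:

    knime.exe -data "D:\my-knime-workspace"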
Exercise 1: Create a Workspace

 Launch KNIME
 In the Workspace Launcher
Window, Click “Browse”
 Select the path for the new
workspace
 Create “Test Workspace”
KNIME Workflow
 The KNIME Analytics Platform does not work with scripts, but with graphical workflows
 Each step of the data analysis is implemented and executed through a little box called a "Node"
 A sequence of nodes makes a workflow
 An analysis flow in graphics, having the following process:
 Step 1: Read data
 Step 2: Clean data
 Step 3: Filter data
 Step 4: Train a model
 Workflows in KNIME are graphs connecting nodes, or formally, directed acyclic graphs (DAGs)
File Extensions: .knwf and .knar Files
 KNIME workflows can be packaged and exported as ".knwf" or ".knar" files
 A ".knwf" file contains only one workflow
 A ".knar" file contains a group of workflows
 A double click opens the workflow inside KNIME Analytics Platform
Workflow Configuration and Execution
1. Node Configuration
2. Variable Assignment
3. Execution Control
4. Monitoring Execution
5. Workflow Results
Building a Basic Workflow
 Launching KNIME
 Creating New Workflow
 Go to the “File” menu
 Select "New Workflow" – Creates a new canvas to design the workflow
Building a Basic Workflow
 Adding Nodes – Workflows in KNIME are built by adding nodes, dragged and dropped onto the canvas from the Node Repository
 Connecting Nodes – Nodes are connected using connectors; the output of one node is connected to the input of the next node
 Configuring a Node – Double-click or right-click a node to open its configuration dialog
 Running the Workflow – To execute the workflow, click the "Run" button on the toolbar
Visual KNIME
Workflows
What are KNIME Extensions?
 KNIME Extensions are a fast, flexible way to extend your data science platform. Open-source extensions provide additional functionality, such as access to and processing of complex data types, as well as additional advanced machine learning algorithms.
Install Extensions
 From the top menu, select "File → Install KNIME Extensions"
 Select:
 KNIME Math Expression extension (JEP)
 KNIME External Tool Node
 KNIME Report Designer
 Click "Next"
Exercise 2
 01 Installation of additional extensions
 02 Lab extensions
Exercise 3
 Install the following Extensions
 KNIME Database
 KNIME JavaScript Views
 KNIME Report Designer
Solution: Exercise 3
 Select – From the top right corner options, select "Install Extensions"
 Search – Search for the required extensions
 Click – Click "Next"
 Follow – Follow the instructions

Data Access
 Files
 CSV, txt, Excel, Word, PDF
 XML, JSON, PMML
 Images, Texts, Networks

 Databases
 MySQL, PostgreSQL, Oracle
 Theobald
 Any JDBC (DB2, MS SQL Server)

 Other
 Twitter
 Google
 Sharepoint
Transformation
 Preprocessing
 Row, column
 Data Blending
 Join, concatenate, append
 Aggregation
 Grouping, Pivoting, Binning
 Feature Creation and Selection
Create a Node
 Drag and drop the node from the "Node Repository" panel into the workflow editor, or
 Double-click the node in the "Node Repository" panel

To connect a node with existing nodes:
 Select a node in the workflow and double-click a node in the repository, or
 Click the output port of the first node and release the mouse at the input port of the second node
View a Processed Node
If the execution is successful, a green light is shown and the processed data can be viewed:
 01 Right-click the node
 02 Select the last option in the context menu
 03 The data table with the processed data will appear
Create a Workflow Group
 Click on the Home button:
 Click on the "Local Workspace"
 Select "Create Folder"
 In the "Create Folder" dialog:
 Enter the name of the workflow group
Create a Workflow
 In the Local Space:
 Click on the “+” symbol
 In the “Create a new
workflow” dialog:
 Enter the name of
Workflow
 Click Create

 In Space Explorer
 Click on the black button
with three dots
 Click “Create Workflow”
Save a Workflow

 Saving the workflow saves the


workflow architecture, the node’s
configuration, and the data
produced at the output of each
node.
 Click the disk icon on the Top
Menu
 To save the copy of the currently
selected workflow, Click “Save
as..”
 To save ALL open workflows,
Click “Save ALL” stack of disks
icon.
Delete a Workflow
-Right-click the workflow in the "KNIME Explorer"

-Select Delete

-Confirm Delete
Import/Export Workflow
Steps:
 To import a workflow, right-click anywhere in the local workspace in the KNIME Explorer
 To export a workflow or workflow group, first select the workflow (or group) to export
 Next, write the path to the destination folder and the file name. When exporting a workflow group, select the elements you want to export from inside the folder.
Exercise 4
 Create Empty Workflow
 Click "New" in the toolbar panel, or right-click a folder in the local workspace in the KNIME Explorer
 Enter the name of the workflow
 Browse to the destination folder
 Click "Finish"
Data Importing & Blending: Node Operations
 Read data from the file
 Understand the different data structures & data types
 Data exploration
Read Data
❑ Steps:
❑ Add a File Reader node, by double-clicking it or by drag & drop
❑ In the configuration dialog, click "Browse" to select the path of the file
❑ In most cases, the File Reader automatically detects the file structure
❑ If not, enable/disable the required checkboxes according to the data structure
Excel Reader
 Reads .xls and .xlsx files from Microsoft Excel
 Supports reading from multiple sheets
Exercise 5
STEPS:
 Create workflow "Exercise 1"
 Read file "data1.txt"
 Change the column name "ranking" to "marks"
 Remove column "Class"
 Write the final data to a file in CSV format
Database Connector Node
• Creates a connection to an arbitrary JDBC database
• Select an appropriate driver and provide the JDBC URL of the database
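
For illustration, a minimal sketch of a JDBC URL as entered in this node (the host, port, and database name are made-up placeholders); the exact prefix depends on the chosen driver, here MySQL:

    jdbc:mysql://dbserver.example.com:3306/sales_db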
Data Preparation and Cleaning
Essential steps to ensure the data is accurate and ready for analysis. Some nodes for this task:
 Data Exploration
 Initial overview of the data
 Generates summary statistics and visualizations
Data Explorer
Install “KNIME JavaScript Views (Lab)”
Supports CSS styling
Data Cleaning
Nodes like "Missing Value" and "Duplicate Row Filter" help to handle missing data and remove duplicates
Missing Value – Helps to handle missing values in the data
Data Cleaning
Duplicate Row Filter – Identifies duplicate rows. It can either remove all duplicate rows from the input table, keeping only unique and chosen rows, or mark rows with additional information about their duplication status
Data Cleaning
Column Splitter
 Splits the columns of the input table into two output tables
Data Transformation
Data can be transformed using nodes like "Column Filter", "Math Formula", "String Manipulation", etc.
Data Transformation
Column Filter Node:-
 Allows columns to be filtered from the
input table while only the remaining
columns are passed to the output table
Data Transformation

 Math Formula
 Evaluates a mathematical expression based on the values in a row
 Computed results can either be appended as a new column or be used to replace an input column
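
A minimal sketch of a Math Formula (JEP) expression; the column names follow the Iris exercise later in this unit and columns are referenced with the $...$ syntax:

    ($Sepal Length$ + $Petal Length$) / 2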
Data Transformation
 String Manipulation Node
 Manipulates strings, e.g., search and replace, capitalize, or remove leading and trailing white space.
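
A hedged sketch of a String Manipulation expression (the column name "Class" anticipates the Iris exercise later in this unit); replace, upperCase, and strip are among the node's built-in functions:

    strip(replace(upperCase($Class$), "IRIS", "FLOWER"))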
Data Aggregation
Nodes like "GroupBy" perform aggregation operations on the data, such as calculating sums, averages, or counts.
Groups the rows of a table by the unique values in the selected group columns.
Data Imputation

Rule Engine:
Takes a list of user-defined rules and tries
to match them to each row in the input
table. If a rule matches, its outcome value
is added into a new column.
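
A minimal sketch of Rule Engine rules (the column name and labels are made-up examples). Rules are evaluated top-down, the first match wins, and a TRUE rule serves as the fallback:

    $age$ > 60 => "senior"
    $age$ > 18 => "adult"
    TRUE => "minor"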
Exercise 6
 Extract details of persons born outside the "United-States" in CSV form
 Read the file "adult data"
 Rename column "fnlwgt" to "Final Weight"
 Remove column "Final Weight"
 Remove rows containing "United-States"
 Write the data to a CSV file named "Born outside US"
DATA INTEGRATION AND
TRANSFORMATION
Involves combining data from various sources, reshaping it
and preparing it for analysis. Some nodes used for the
purpose are:
 Joining Data : Combine data from multiple tables based
on common keys or criteria
 Pivoting and Unpivoting : Help to reshape data from wide
to long format or vice versa
 Data Sampling : Used to select a subset of data for
analysis
 Data Normalization and Scaling : Normalize and Scale the
data to prepare it for machine learning algorithms.
 Text Mining and NLP: Supports text data processing and
Natural Language Processing
EXPORTING DATA FROM KNIME
 Data Export Nodes – To export data, nodes like "CSV Writer", "Excel Writer" or "Database Writer" are used, depending on the nature of the output
 Data Visualisation – Nodes like "Bar Chart", "Pie Chart", "Heatmap" etc. are used to create charts and plots
 Model Deployment – Machine learning models in KNIME can be exported for deployment in production environments
 Data Reports – Reports with customised layouts can be exported in various formats such as PDF or HTML
Exercise 7
 Objective: To do data Visualization
 Create a workflow “Exercise 7” under the workflow group “Exercises”
 Read Data Iris
 Name the columns : Sepal length, Sepal Width, Petal Length, Petal Width,
Class
 Map the flower types "Iris-setosa", "Iris-versicolor" and "Iris-virginica" to Class 1, Class 2, Class 3 respectively
 Split the contents of “Class” into three columns
 Joining two columns
 Converting the contents of a column to upper case
 Replacing the word “Iris” to “Flower”
 Creating “Bins” based on Sepal Length
 Grouping the “Class” based on Bins
 Create Bar Chart and Scatter Plot
Connecting to Big Data Sources
KNIME provides various connectors and integrations to connect to Big Data platforms:

Big Data Platform – Nodes Used
 Hadoop Distributed File System (HDFS) – HDFS Connector, HDFS File Picker
 Apache Spark – Spark Reader (loads data from Spark data frames), Spark SQL (querying and manipulating data)
 Apache Hive – Hive Connector, Hive Table Selector
 Big databases like HBase, Cassandra
 Other sources (Amazon S3, Google Cloud Storage, Azure Blob Storage) – Amazon S3 Connector, Google Cloud Storage Connector
Handling Big Data in
KNIME
Data Sampling
Use KNIME's sampling capabilities to work with a representative subset of the data for initial exploration and modeling
Distributed Computing
Processes data in parallel, which improves performance by utilizing multiple processing nodes
Data Chunking
To prevent memory constraints, KNIME can process data in smaller, manageable chunks
Data Compression
Data compression techniques are employed to reduce storage requirements and optimize data transfer between nodes and across the network
In-Database Processing
Data remains within the database for analysis, thus minimizing data movement and enhancing performance
Big Data Sampling Workflow
Data Chunking Node with Workflow
Data Compression Node with Workflow
In-Database Processing
DISTRIBUTED DATA PROCESSING
Distributed data processing is a key capability in KNIME for efficiently analysing and processing Big Data:
 Parallel Execution – Distributes data processing tasks across multiple nodes or worker machines, thereby improving performance and reducing processing time
 Data Partitioning – Automatically partitions data into smaller chunks, which can be processed in parallel by different worker nodes, ensuring efficient resource utilisation
 Load Balancing – Distributes tasks evenly across available resources to maximise their utilisation
 Scalability – Scales horizontally by adding more compute resources to handle larger datasets and more complex analyses
BIG DATA ANALYTICS WITH KNIME
Data Exploration
Tools to gain insights into big data, generate summary statistics and create
visualisations
Data Preprocessing
Clean, transform and prepare Big Data for analysis
Machine Learning
Offers a wide range of machine learning algorithms that can be applied to big data, including classification, regression, clustering and anomaly detection.
Models can be built for predictive analysis
Text and Image Analysis
Allows to extract valuable information from Unstructured Big Data Sources
Advanced Analytics
Conduct advance analytics like time series analysis, network analysis and geospatial
analysis on big data
BIG DATA ANALYTICS WITH
KNIME
Data Visualization and Reporting
Create visualizations and reports to communicate the findings effectively
Deployment and Integration
Deploy big data analytics workflow for production use
Export models and predictions for integration with other applications and data pipelines
Monitoring and Maintenance
Continuously monitor and maintain big data analytics workflows
Optimize workflows for performance
Keep KNIME and its extensions up to date
Collaboration and Sharing
Collaborate with team members and stakeholders by sharing KNIME workflows, reports and results
Use KNIME Server for collaborative work and scheduled execution
DATA TRANSFORMATION AND
MANIPULATION
Data Cleaning
Nodes like "Missing Value," "String Manipulation," and "Rule Engine" to handle missing
data, correct errors, and clean the dataset.
Data Transformation
For transforming data, use nodes such as "Column Filter," "Math Formula," and "Pivoting" to reshape data and create new features.
Data Aggregation
Aggregate data using the "GroupBy" node to calculate summary statistics or create
aggregated datasets
Data Joining
Combine datasets using nodes like "Joiner" or "Concatenate" to merge data from
different sources or tables.
DATA TRANSFORMATION AND
MANIPULATION
Data Splitting
Split data into training and testing sets using nodes like "Partitioning" to facilitate model
evaluation.
Data Normalization and Scaling
Normalize or scale features to bring them to a common scale using nodes like "Normalizer"
Text and String Processing
Nodes are available for text processing tasks such as tokenization, stemming, and sentiment analysis, making KNIME suitable for text data manipulation
Data Transformation and
Manipulation related
workflow
STATISTICAL ANALYSIS IN
KNIME
Descriptive Analysis
Nodes like "Statistics" and "Data Explorer" are used to compute descriptive statistics, including measures of central tendency, dispersion, and frequency distributions
Hypothesis Testing
Nodes like "Group Comparison" for comparing means, "Chi-Square
Test" for categorical data, and others.
Correlation Analysis
Determine relationships between variables using correlation
analysis nodes.
ANOVA and Regression Analysis
Perform analysis of variance (ANOVA) and regression analysis to
explore relationships between dependent and independent
variables.
Time Series Analysis
Analyse time series data with specialized nodes for
forecasting, trend analysis, and seasonal decomposition
Statistical Analysis Related
Workflow
MACHINE LEARNING WORKFLOWS
Model Selection
Use nodes for model selection, such as "Variable Selection" and "Feature Selection," to identify
the most relevant features for modeling.
Model Training
Train machine learning models using nodes for various algorithms, including decision trees,
random forests, support vector machines, and neural networks
Model Evaluation
Evaluate models with nodes like "Scorer" to assess their performance using metrics like accuracy,
precision, recall, F1-score, and ROC curves
Cross Validation
Implement cross-validation techniques using nodes like "Cross-Validation Loop" to assess model
generalization
MACHINE LEARNING WORKFLOWS

Ensemble Learning
Build ensemble models using nodes like "Ensemble Learner" to combine multiple
models for improved predictive performance
Hyperparameter Tuning
Optimize model hyperparameters with nodes like "Parameter Optimization" to
achieve the best model performance.
Machine Learning
Related Node and
Workflow
PREDICTIVE ANALYSIS AND
DATA MINING
Classification
Build classification models to predict categorical outcomes
KNIME supports various algorithms like logistic regression, decision trees, and k-nearest
neighbours
Regression
Perform regression analysis to predict numeric outcomes, using regression algorithms
like linear regression and support vector regression
Clustering
Use clustering algorithms to group similar data points together
KNIME offers k-means clustering, hierarchical clustering, and DBSCAN, among others
Anomaly Detection
Detect anomalies or outliers in data using specialized nodes like "Local Outlier Factor"
PREDICTIVE ANALYSIS AND
DATA MINING
Association Rule Mining
Discover Patterns and associations in data using association rule mining
nodes
Time Series Forecasting
Forecast future values of time series data using dedicated nodes for time
series analysis and prediction.
Text Mining and NLP
Analyse and extract insights from unstructured text data
Geospatial Analysis
Perform geospatial analysis and visualization using nodes for geographic
information system (GIS) data
Classification Models to Predict
Categorical Outcomes Workflow
K – means Clustering Workflow
Geographic Information
System (GIS) data
workflow
DATA VISUALIZATION IN
KNIME
 Data visualization is a powerful way to communicate insights and patterns in data
 Visualization Nodes
 Nodes for creating visualizations, including scatter plots, bar charts, line charts,
heatmaps, and more
 Interactive Plots
 Supports interactive plots that allow you to explore data dynamically, including zooming, panning, tooltips and filtering
 Customization
 Customize the appearance of the visualizations, such as color schemes, labels,
and axis scaling, to ensure clarity and relevance
 Automated Visualization
 Automate the generation of visualizations using
data-driven approaches.
Interactive Plots Output
Automated Visualization
Workflow
CREATING INTERACTIVE DASHBOARDS
 Dashboard Components – KNIME offers a range of dashboard components like tables, charts, filters, and input widgets that can be combined to create interactive dashboards
 Drag and Drop Design – Build dashboards using a user-friendly, drag-and-drop interface; arrange components and link them to control each other dynamically
 Real-Time Updates – Dashboards can be designed to update in real time as data changes, enabling users to see the latest information instantly
 Parameterization – Create dashboards with parameterized inputs, allowing users to customize views based on their preferences or specific analysis requirements
Interactive Output Example
MODEL DEPLOYMENT AND
INTEGRATION
Deploying machine learning models and integrating
them into production systems is crucial for making
data-driven decisions

 Export Models – KNIME allows trained machine learning models to be exported in standard formats (e.g., PMML) for deployment in various environments, such as web applications or databases
 RESTful Web Services – Deploy models as RESTful web services using KNIME Server; this enables real-time predictions and integration with other applications
 Batch Processing – Automate model deployment by integrating KNIME workflows into batch processing pipelines to generate predictions on new data regularly
 Database Integration – Integrate models with databases to perform in-database scoring, making predictions directly within the database engine
Model Deployment and Integration
Workflow
AUTOMATION AND
REPORTING IN KNIME
 Automation and reporting are essential for streamlining workflows and
sharing insights
 Workflow Automation
 Automate repetitive tasks and data processing steps using KNIME's workflow automation capabilities
 Schedule workflows to run at specific times or events
 Report Generation
 Create customizable reports in KNIME with text, tables, charts, and visualizations
 Reports can be generated automatically as part of a workflow or on demand
AUTOMATION AND
REPORTING IN KNIME
 Data Export
 Export data, results, and reports in various formats, including PDF,
Excel, CSV, and more, to share insights with stakeholders
 Integration with External Systems
 KNIME can integrate with external systems and databases to import
data, trigger workflows, and export results seamlessly
 Notifications
 Configure notifications and alerts to inform users or administrators
about workflow status, errors, or specific events.
Workflow Automation
Configure Notifications and
Alerts Template
THANK YOU
