0% found this document useful (0 votes)
31 views1 page

Cheat Sheet Data Wrangling

This cheat sheet provides a comprehensive overview of data wrangling techniques using the KNIME Analytics Platform, including methods for accessing, combining, filtering, and writing data. It covers various nodes for reading different file types, connecting to databases, and applying transformations, as well as handling date and time formats. Additionally, it highlights functionalities for reshaping, aggregating, and cleaning data, along with formatting options for Excel sheets.

Uploaded by

HijodeDavid
Copyright
© © All Rights Reserved
We take content rights seriously. If you suspect this is your content, claim it here.
Available Formats
Download as PDF, TXT or read online on Scribd
0% found this document useful (0 votes)
31 views1 page

Cheat Sheet Data Wrangling

This cheat sheet provides a comprehensive overview of data wrangling techniques using the KNIME Analytics Platform, including methods for accessing, combining, filtering, and writing data. It covers various nodes for reading different file types, connecting to databases, and applying transformations, as well as handling date and time formats. Additionally, it highlights functionalities for reshaping, aggregating, and cleaning data, along with formatting options for Excel sheets.

Uploaded by

HijodeDavid
Copyright
© © All Rights Reserved
We take content rights seriously. If you suspect this is your content, claim it here.
Available Formats
Download as PDF, TXT or read online on Scribd
You are on page 1/ 1

Cheat Sheet: Data Wrangling with KNIME Analytics Platform

ACCESS DATA COMBINE DATA FILTER DATA WRITE DATA DATE&TIME


Reads a CSV file from either your local Amazon S3 Concatenates the Row Filter
CSV Reader
Connects to Amazon S3 and points to a Filters rows in or out of the input table Writes the input table(s) to sheet(s) Parses the strings in the selected
file system or another connected file
Connector
rows of all input
working directory (with a UNIX-like syntax, e.g., Concatenate according to a filtering rule. The filtering rule
Excel Writer
in an Excel file (XLS or XLSX). Click String to Date&Time columns according to a date/time
system. Click the three dots in the lower tables by writing
/mybucket/myfolder/myfile). Allows can match a value in a selected column or the three dots in the lower left format and converts them into
left corner to add a dynamic connection them below each
downstream reader nodes to access data from numbers in a numerical range. corner to add a dynamic sheet input Date&Time cells. Four Date&Time
input port to connect to an external file other. This is
Amazon S3 as a file system. port to write multiple data tables forms are supported: only date, only
system, like Amazon S3, Azure Blob especially useful for Rule-based
tables with shared Row Filter Filters rows in or out according to a set of into multiple sheets. time, date&time, and date&time plus
Storage, etc.
column headers. rules, defined in its configuration window. time zone.
Excel Reader
Reads sheet(s) from one or more Common settings of Reader and Writer nodes Writes the input data table to a CSV
Rules are evaluated from top to bottom.
Excel files. One sheet from each Joins the columns
CSV Writer
file. Click the three dots in the lower Extracts rows where the time value in
File path: All Reader and Writer nodes require a file path. Using TRUE as the antecedent applies the Date&Time-based
Excel file. A loop can be used to read of the two input Full outer join Inner join left corner to add a dynamic Row Filter the selected column lies within a
The file path can be expressed as an absolute path in rule to all unmatched rows.
multiple sheets from one Excel file. tables based on one Reference connection input port to write to an given time window. The time window
the local file system, a relative path to a key location in Joiner
or multiple joining
Left Right Left Right Row Filter
Filters rows in or out from the top input external file system, like Amazon is specified either by a start and /or
Table Table Table Table
Table Reader Reads data from a .table file. The .table the current KNIME installation, or a path defined in an columns. Allows table according to matching values in the S3, Azure Blob Storage, etc. an end date or by a start date and a
Send to Tableau
files are organized using a KNIME external file system if such a connection is used. you to select Right outer join Left outer join selected column of the lower input table.
Server duration.
proprietary format, including the full file Multiple files: Reader nodes can read and concatenate between different Uploads the input table to a Tableau
Calculates the difference between two
structure, and are optimized for space and multiple files, according to a selected file extension or joiner modes and to Left Right Left Right server for reporting.
Table Table Table Table Column Filter Filters columns in or out from the input date&time objects e.g., from two
speed - providing maximum performance file name pattern. use multiple joining Date&Time
table according to a filtering rule. Difference selected columns, from a selected
with minimum configuration. columns.
SAP Reader Columns to be retained can be manually Send to Power BI
column and a fixed value, from a
(Theobald Software) Transformation tab: Reader nodes include a Cell Replacer
picked or selected according to their type, selected column and the current
Loads data from various SAP systems Transformation tab for renaming, filtering, re-ordering, Replaces the values in one column of the table at Uploads the input table to Microsoft
or based on a regex expression matching execution time, or from one cell and
(e.g. SAP S/4HANA, SAP BW, SAP R/3). and type changing of the columns. the top input port with values from a look up table Power BI for reporting.
their name. the cell in the previous row for a
provided at the bottom input port.
selected column.
Extract Date&Time
Fields
DB Connector
DB Reader
DB Row Filter Expands the input SQL query to include the DB Writer Inserts the data rows from the top Extracts selected time and date fields
Connects to any JDBC-compliant Executes the input SQL query on from a selected column of type
row filter criteria defined in the input port into a table in the database
DATABASES

database. The JDBC driver must be added the database and exports the Expands the input SQL query to include the join date&time and appends their values in
DB Joiner
configuration window. Grouping of multiple specified by the input connection port.
in the KNIME Preferences and then results into a KNIME data table. of two tables. It has a similar configuration new columns.
conditions with an AND or OR conjunctions If the database table does not exist it
selected in the node configuration window. window as the joiner node. No SQL coding
is also supported. No SQL coding required. will be created.
H2 Connector required. There are more DB nodes, all expanding DB Connection
DB Table Selector Creates a SQL query to access the database the input SQL query with additional SQL DB Query
Connects to an H2 database. Similar
CLEAN DATA
Table Writer
table selected in the configuration window. The Modifies the input SQL query using Writes the resulting rows from the
dedicated connector nodes connect to other instructions. Besides the SQL Query node, no DB custom SQL. The input SQL query is
table can be selected either via browsing the nodes require SQL coding. input SQL query into a new table inside
databases, such as MySQL or PostgreSQL. represented by the place holder
database metadata or via a custom SQL query. the database. Missing Value
#table#. Defines and applies a strategy to
replace missing values in the input
table - either globally on all columns, or
individually for each single column.
RESHAPE AND AGGREGATE DATA DYNAMIC PORT Duplicate
Row Filter Detects duplicate rows and applies the
Concatenate
selected operation, e.g. removes
Dynamic ports: Additional input ports can be duplicate rows. Duplicates are rows
Groups the rows of a table by the added by clicking the three dots in the bottom left that have the same value in all selected
unique values in selected Combine Filter Aggregate Write corner of a node. columns.
GroupBy
columns and calculates
Numeric Outliers
aggregation and statistical Detects and treats numerical outliers for
measures for the defined groups. each of the selected columns
Despite its simple name, it offers individually using the interquartile range
powerful functionality and has
many unsuspected usages.
FORMAT EXCEL SHEETS (IQR).

The Continental Nodes for KNIME extension allows you to automatically


Extends the aggregation format an existing Excel sheet. The key is an additional data table of the
functionality of the GroupBy same size as the original Excel sheet, where each cell contains one or
Group Pivot
node by creating an output table more comma separated tag values e.g., header, border, etc. Based on
Pivoting
with columns and rows for the these tags, the XLS Formatter nodes add new formatting instructions to
unique values in the selected the existing instructions, as available at the lower (optional) input port. E-Books: KNIME Advanced Luck covers
input columns. The unique advanced features & more. Practicing Data
values of the grouping columns Science is a collection of data science case
become rows and the unique DATA TYPES & CONVERSIONS XLS Control
Table Generator Transforms the input table to an XLS Control Table,
meaning it exchanges the column names to A, B, C, ... and studies from past projects. Both available at
values of the pivoting columns Collection Cell: Collection of multiple values of either
become columns.
S String: Sequence of characters, e.g. "This is a string" the row IDs to 1, 2, 3, ... It is the kickoff node to collect knime.com/knimepress
I Integer: Whole real valued number, e.g. -100 or 345 the same or different types e.g., can be a list of values formatting instructions for an Excel sheet and feeds all XLS
Splits values in the selected column D Double: Real valued number, e.g. -0.432 or 45.39 or a set of values. In a set each value occurs only once. formatter nodes. KNIME Blog: Engaging topics, challenges,
Maps the categorical values in the Cell Splitter
selected columns to integer values into two or more substrings, as Date&Time: A data format for date, time, date&time, or Document/Image: KNIME Analytics Platform XLS Background
Colorizer industry news, & knowledge nuggets at
Category to Number
and exports the mapping rules to defined by a delimiter match. A date&time plus time zone. supports many more data types like text documents, Adds background color and/or pattern fill formatting knime.com/blog
the model output port. The delimiter is a defined character, such B Boolean: Two possible values only, e.g. TRUE and FALSE images, fingerprints, etc. instructions to all cells with a specified tag in the XLS
Category to Number (Apply) and as a comma, space, or any other Number to String String to Number Control Table at the top input port.
character or character sequence. E-Learning Courses: Take our free online
Number to Category (Apply) nodes Converts the data type of the selected Converts the data type of the selected
apply the mapping rule in both 2 S columns from a number format, e.g. integer S 2 columns from string to either double or XLS Border
self-paced courses to learn about the
directions.
Ungroup
Ungroups a collection-type cell by or double, to string. integer.
Formatter
Adds border formatting instructions for a given range
different steps in a data
creating one row for each value in the specified by a tag in the XLS control table at the top science project (with exercises & solutions to
Creates one new column for each collection cell. Other columns from test your knowledge) at
input port.
value in the selected input the input table are left unaltered.
One to Many www.knime.com/knime-self-paced-courses
column. These values become the
column headers. Cells in the
CREATE COLUMNS XLS Cell Merger

newly created columns are set to Unpivoting Stacks the cells of the selected value
Adds formatting instructions to merge all cells with a
KNIME Hub: Browse and share workflows,
Math Formula String Manipulation
0 if the value is not present, columns into one column. The cells Implements a number of math operations Performs operations on string values in nodes, and components. Add ratings, or
specified tag in the XLS control table at the top input port.
otherwise 1. This type of encoding of the selected retained columns are across multiple input columns. The math columns, such as combining two or more comments to other workflows at
S
is called one-hot vector. appended to the corresponding operations can be applied to multiple columns strings together, extracting one or more hub.knime.com
Transpose output rows. with the Math Formula (Multi Column) node. substrings, trimming blank spaces, and so on. XLS Conditional
Formatter
Converts the rows to columns and Sorter
Rule Engine String Replacer Adds formatting instructions to color cell backgrounds KNIME Forum: Join our global community &
the columns to rows. Sorts the table in ascending or according to their numeric value for all cells specified by a
Applies a set of rules to each row of the engage in conversations at
descending order based on the Replaces values in a selected string column if tag in the XLS control table at the top
input table. Rules are applied from top to forum.knime.com
values of one or more columns. they match a defined pattern.
bottom. The first rule that matches is used.
XLS Formatter
Performs several transformations (apply) KNIME Server: For team-based collaboration,
Table Manipulator Value Counter
at once, such as renaming, filtering, Counter Generation Column Expressions
Combines the functionality of the Math automation, management, & deployment
Counts the number of occurrences of
re-ordering and type changing, on Creates a new column with a counter. The Formula, Rule Engine, and String Applies all formatting instructions to an existing Excel sheet. check out KNIME Server at
all values in a selected column from
the input columns. By adding start value and step size are defined in the Manipulation nodes. More than one
dynamic ports it can replace a
the input table. www.knime.com/knime-server
configuration window. expression can be defined to modify or add
concatenate node. multiple columns at the same time.

You might also like