4) CKM (Checking Knowledge Module):
The CKM is used for constraint-related data quality checks.
It supports data quality checking for Primary Key, Not Null and Unique Key constraints.
There are two types of data quality checks in the CKM:
1)FLOW CONTROL
2)STATIC CONTROL
1)FLOW CONTROL:
Flow control has to be enabled at the IKM level, and a CKM must be selected; the CKM then
performs the processing described below.
Flow control validates data in the I$ table before it is inserted into the target table.
If the I$ table contains any duplicate or null records, those rows are deleted from the
I$ table and inserted into the E$ error table.
CKM Processing Steps (Flow Control):
step1: Drop the E$ table if it already exists.
step2: Create the SNP_CHECK_TAB table in the staging area.
SNP_CHECK_TAB: This is a common error summary table shared by all
interfaces; it contains the error message, error number, interface name
and schema name.
step3: Create the E$ table with the same structure as the I$ table plus additional columns
such as error number, error description, error date and error type (PK or NOT NULL).
step4: Validate the I$ table; if it contains any duplicate or null rows, select those rows,
insert them into the E$ table with the error message, error type and error number,
and delete them from the I$ table.
Note: Flow control validates data in the I$ table before it is inserted into the target table,
so invalid data never reaches the target table; it goes only into the E$ table. A sketch of the
kind of SQL step4 generates is shown below.
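As an illustration, a minimal sketch of the SQL step4 could generate for a NOT NULL check,
assuming hypothetical I$_EMP (flow) and E$_EMP (error) tables with EMPNO/ENAME columns;
the real ODI-generated code differs in detail:

    -- 1) Copy rows that violate the NOT NULL check into the error table.
    INSERT INTO E$_EMP (ERR_TYPE, ERR_MESS, CHECK_DATE, EMPNO, ENAME)
    SELECT 'NN', 'ENAME must not be null', SYSDATE, EMPNO, ENAME
    FROM   I$_EMP
    WHERE  ENAME IS NULL;

    -- 2) Remove the same rows from the flow table so they never reach the target.
    DELETE FROM I$_EMP
    WHERE  ENAME IS NULL;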
2)STATIC CONTROL:
Static control is also a data quality check enabled at the IKM level using a CKM, but it
validates data at the target table level, after the data has been inserted into the target table.
If the target table contains any duplicate or null rows, static control selects those rows and
inserts them into the E$ table, but it does not delete them from the target table.
Note: Static control does not validate at the I$ table level; it validates only the target table,
that is, after the data has been inserted from the I$ table into the target table. It never deletes
from the target table; it is used purely for information, to know how many rows in the target
table are duplicates or nulls. A sketch follows this note.
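A matching sketch for the static check, again with hypothetical EMP (target) and E$_EMP
(error) names; note that no DELETE is issued against the target:

    -- Record duplicate primary key values found in the target, for information only.
    INSERT INTO E$_EMP (ERR_TYPE, ERR_MESS, CHECK_DATE, EMPNO, ENAME)
    SELECT 'PK', 'Duplicate EMPNO in target', SYSDATE, EMPNO, ENAME
    FROM   EMP
    WHERE  EMPNO IN (SELECT EMPNO
                     FROM   EMP
                     GROUP  BY EMPNO
                     HAVING COUNT(*) > 1);
    -- No DELETE follows: the target table is left untouched.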
Based on the requirement we can use FLOW CONTROL, STATIC CONTROL, or both.
These options are enabled in the IKM options:
FLOW_CONTROL=TRUE/FALSE or STATIC_CONTROL=TRUE/FALSE.
How to reload error records from the E$ table to the target table?
We can reprocess errored records from the E$ table to the target table using one option at
the IKM level, i.e., RECYCLE_ERRORS=TRUE.
Note: Recycle errors re-validates the data coming from the E$ table; if it is now valid it is
inserted into the target table, otherwise it is not.
This option is used after the data in the E$ table has been modified or corrected, as in the sketch below.
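A rough sketch of the recycle idea with the same hypothetical E$_EMP / I$_EMP names: the bad
value is first corrected manually in E$_EMP, and on the next run with RECYCLE_ERRORS=TRUE the
corrected rows are pulled back into the flow table so the checks run on them again:

    -- Manual correction of the rejected row in the error table.
    UPDATE E$_EMP
    SET    ENAME = 'SMITH'
    WHERE  EMPNO = 7369;

    -- On the next execution the IKM brings corrected rows back into the flow table.
    INSERT INTO I$_EMP (EMPNO, ENAME)
    SELECT EMPNO, ENAME
    FROM   E$_EMP;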
IKM SQL Incremental Update
SYNC_JRN_DELETE: Check this option to synchronize journalized deletions. This option will take
effect only if one source table is journalized in your interface.
DELETE_TEMPORARY_OBJECTS: Set this option to NO if you wish to retain temporary objects
(tables, files and scripts) after integration. Useful for debugging.
DELETE_ALL: Check this option if you wish to delete all the rows of the target datastore.
IKM Oracle Incremental Update
OPTIMIZER_HINT: Use this option to specify the hint to be used while loading the integration table.
ANALYZE_TARGET: Check this option if you wish to analyze the target table before loading data into
the integration table.
DETECTION_STRATEGY: Valid values are
- MINUS: The MINUS clause uses the Minus set operator when populating the flow table. This clause is
used to exclude identical records already existing in the target.
- NOT_EXISTS: The NOT EXISTS clause uses the Not Exists set operator when populating the flow
table. This clause is used to exclude identical records already existing in the target.
- POST_FLOW: all records from the source are loaded into the flow table. After that, an update statement
is used to flag all rows in the flow table that exist identically in the target.
- NONE: all records from the source are loaded into the flow table. All target records are updated even
when the target record is identical to the flow table record. Sketches of the MINUS and NOT_EXISTS
strategies follow this list.
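As an illustration, hedged sketches of how the MINUS and NOT_EXISTS strategies could populate a
hypothetical I$_EMP flow table from a source set SRC_EMP while excluding rows that already exist
identically in the target EMP (the names are assumptions, not the exact ODI-generated statements):

    -- MINUS strategy: the set operator removes rows already identical in the target.
    INSERT INTO I$_EMP (EMPNO, ENAME, SAL)
    SELECT EMPNO, ENAME, SAL FROM SRC_EMP
    MINUS
    SELECT EMPNO, ENAME, SAL FROM EMP;

    -- NOT_EXISTS strategy: a correlated subquery performs the same exclusion row by row.
    INSERT INTO I$_EMP (EMPNO, ENAME, SAL)
    SELECT s.EMPNO, s.ENAME, s.SAL
    FROM   SRC_EMP s
    WHERE  NOT EXISTS (SELECT 1
                       FROM   EMP t
                       WHERE  t.EMPNO = s.EMPNO
                       AND    t.ENAME = s.ENAME
                       AND    t.SAL   = s.SAL);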
VALIDATE: This option generates an extra validation step during development. Validations performed
are:
- validation of KM option COMPATIBLE
- validation of RDBMS version of staging database
- validation of KM option DETECTION_STRATEGY
This option should be turned off for all production use, in particular for high-frequency executions.
There is no added value in production, only processing overhead.
COMPATIBLE: This option affects the use of the PURGE keyword and the way statistics are collected:
Value   PURGE   STATS
10      Yes     DBMS_STATS
9       No      DBMS_STATS
8       No      ANALYZE
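For illustration, the kind of statements this option switches between, sketched with a hypothetical
I$_EMP flow table and EMP target (not the exact KM code):

    -- COMPATIBLE >= 10: temporary tables can be dropped with PURGE and
    -- statistics gathered with DBMS_STATS.
    DROP TABLE I$_EMP PURGE;
    BEGIN
      DBMS_STATS.GATHER_TABLE_STATS(ownname => USER, tabname => 'EMP');
    END;
    /

    -- COMPATIBLE = 8: no PURGE keyword, and statistics use the older ANALYZE command.
    DROP TABLE I$_EMP;
    ANALYZE TABLE EMP COMPUTE STATISTICS;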
FLOW_TABLE_OPTIONS: Use this option to specify the attributes of the integration table at create time;
it is used to increase performance.
This option is set by default to NOLOGGING (valid only from Oracle v8).
This option may be left empty.
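A small sketch of where such an attribute ends up (hypothetical table and columns): the option value
is simply appended to the CREATE TABLE statement for the flow table.

    CREATE TABLE I$_EMP (EMPNO NUMBER, ENAME VARCHAR2(30), SAL NUMBER) NOLOGGING;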
IKM SQL Slowly Changing Dimension
Validate target columns: This option generates an extra validation step. This step checks that at least
one column is marked as SCD_ADD and at least another one is marked as SCD_UPD.
CDC (Change Data Capture):
CDC is used to capture only the changed data from source tables. To implement CDC
we need to use a JKM (Journalization Knowledge Module).
CDC works only for database tables, not for flat files, Excel or XML files.
Note: To implement CDC on a source table, the table should satisfy the following
requirements:
1)The source table should have a primary key.
2)The source table should allow DML triggers to be created on it.
What is a DML PL/SQL trigger: Whenever an event occurs (delete, insert or update) on the
source table, the trigger fires and inserts the modified rows into our staging tables, as in the
sketch below.
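A minimal sketch of the idea behind such a journalizing trigger on EMP (the code the JKM really
generates is more involved, and the J$EMP column list here is a simplified assumption):

    -- Simplified journalizing trigger: record the primary key and the kind of
    -- change in the J$ table whenever EMP is modified.
    CREATE OR REPLACE TRIGGER T$EMP
    AFTER INSERT OR UPDATE OR DELETE ON EMP
    FOR EACH ROW
    BEGIN
      INSERT INTO J$EMP (JRN_FLAG, JRN_DATE, EMPNO)
      VALUES (CASE
                WHEN INSERTING THEN 'I'
                WHEN UPDATING  THEN 'U'
                ELSE 'D'
              END,
              SYSDATE,
              COALESCE(:NEW.EMPNO, :OLD.EMPNO));
    END;
    /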
Enable CDC at Table level in ODI:
step1: There are two types of journalization:
1)Simple
2)Consistent Set
JKM Simple is used for individual tables with no data dependency.
JKM Consistent Set is used for a set of tables that have data dependencies, such as
parent and child tables.
Two JKMs are available accordingly: for simple journalization use JKM Oracle Simple,
and for consistent set journalization use JKM Oracle Consistent.
Import JKM Oracle Simple.
step2: Enable CDC at Table level in source Model
Goto=>Designer=>Source Model=>Right click on EMP Table=>Change Data Capture=>
select Add To CDC
step3: After adding the table to CDC we need to start the journal.
Before starting the journal we need to enable the JKM at the Model level.
Open Source Model=>goto Journalizing Tab=>Select simple=>Select JKM Oracle Simple.
Right click on EMP=>Change Data Capture=>Start Journal.
The journalization process creates the following objects in the source schema for the EMP table:
1)T$EMP: DML trigger on the source EMP table
2)J$EMP: table that holds the changed data
3)JV$DEMP: view created over both the EMP and J$EMP tables.
step4: Enable CDC in your interface.
Goto=>Interface=>Source mapping datastore area=>Click on source EMP table=>
goto Property Inspector=>Select Journalized Data Only.
After enabling this journalizing flag on the interface, it selects data from the JV$DEMP view.
The JV$DEMP view selects data from both the EMP and J$EMP tables and returns only the
modified rows.
When the interface finishes, the processed rows are deleted from the J$ table.
Note: Whenever a change happens on the source table, the DML trigger fires; for an insert,
update or delete on the source, the trigger inserts a row into the J$EMP table with
JRN_FLAG='I', 'U' or 'D', JRN_DATE=modified date and the PK column value EMPNO.
While running our interface it selects data from the JV$DEMP view, which holds only the
modified data; after loading the data into the target table it is deleted from the J$EMP table.
A sketch of the view follows.
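For illustration, a simplified sketch of what a journalizing view over EMP and J$EMP could look
like (the real ODI-generated JV$DEMP also handles subscribers and consumption flags, which are
omitted here):

    -- Simplified journalizing view: expose only the changed rows, joining the journal
    -- back to the base table to pick up the current column values. A LEFT JOIN keeps
    -- journal entries whose source row has since been deleted.
    CREATE OR REPLACE VIEW JV$DEMP AS
    SELECT j.JRN_FLAG,
           j.JRN_DATE,
           j.EMPNO,
           e.ENAME,
           e.SAL
    FROM   J$EMP j
    LEFT JOIN EMP e ON e.EMPNO = j.EMPNO;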
2) Agent:
The agent is a run-time component that handles the interaction between the client and the
servers, for example between ODI Studio and the source and target databases.
There are two types of agents.
1)Standalone agent
2)J2EE Agent
1)Standalone agent:
The standalone agent works with ODI standalone installations. Its job includes the following:
1)Establishing the connection between the client and the servers and sending the requested
source code to the source or target databases.
2)Creating a session when your program or interface is executed
3)Creating a complete log for your program session
4)Closing the session after your program finishes
2)J2EE Agent:
This agent is available if you are using middleware such as WebLogic or WebSphere.
With a J2EE installation we can use this agent and get the features listed below:
1)Load balancing
2)High availability
3)Multiple agents
4)and the same features as the standalone agent.
In the Physical Architecture we define physical agents (standalone or J2EE agents) and
data servers for the source and target technologies; for each data server we need to create
physical schemas. Each data server needs at least one physical schema.