PowerCenter 8.6 Beginner's Guide
Beginner’s Tutorial
   Informatica PowerCenter 8.6
CHAPTER 1
Product Overview
      This chapter includes the following topics:
      ♦   Introduction
      ♦   PowerCenter Domain
      ♦   PowerCenter Repository
      ♦   Administration Console
      ♦   Domain Configuration
      ♦   PowerCenter Client
      ♦   Repository Service
      ♦   Integration Service
      ♦   Web Services Hub
      ♦   Data Analyzer
      ♦   Metadata Manager
      ♦   Reference Table Manager
Introduction
      PowerCenter provides an environment that allows you to load data into a centralized location, such as a data
      warehouse or operational data store (ODS). You can extract data from multiple sources, transform the data
      according to business logic you build in the client application, and load the transformed data into file and
      relational targets.
      PowerCenter also provides the ability to view and analyze business information and browse and analyze
      metadata from disparate metadata repositories.
      PowerCenter includes the following components:
      ♦   PowerCenter domain. The PowerCenter domain is the primary unit for management and administration
          within PowerCenter. The Service Manager runs on a PowerCenter domain. The Service Manager supports
          the domain and the application services. Application services represent server-based functionality and
          include the Repository Service, Integration Service, Web Services Hub, and SAP BW Service. For more
          information, see “PowerCenter Domain” on page 4.
      ♦   PowerCenter repository. The PowerCenter repository resides in a relational database. The repository
          database tables contain the instructions required to extract, transform, and load data. For more information,
          see “PowerCenter Repository” on page 5.
           ♦   Administration Console. The Administration Console is a web application that you use to administer the
               PowerCenter domain and PowerCenter security. For more information, see “Administration Console” on
               page 6.
           ♦   Domain configuration. The domain configuration is a set of relational database tables that stores the
               configuration information for the domain. The Service Manager on the master gateway node manages the
               domain configuration. The domain configuration is accessible to all gateway nodes in the domain. For more
               information, see “Domain Configuration” on page 7.
           ♦   PowerCenter Client. The PowerCenter Client is an application used to define sources and targets, build
               mappings and mapplets with the transformation logic, and create workflows to run the mapping logic. The
               PowerCenter Client connects to the repository through the Repository Service to modify repository
               metadata. It connects to the Integration Service to start workflows. For more information, see “PowerCenter
               Client” on page 8.
           ♦   Repository Service. The Repository Service accepts requests from the PowerCenter Client to create and
               modify repository metadata and accepts requests from the Integration Service for metadata when a workflow
               runs. For more information, see “Repository Service” on page 13.
           ♦   Integration Service. The Integration Service extracts data from sources and loads data to targets. For more
               information, see “Integration Service” on page 14.
           ♦   Web Services Hub. Web Services Hub is a gateway that exposes PowerCenter functionality to external
               clients through web services. For more information, see “Web Services Hub” on page 14.
           ♦   SAP BW Service. The SAP BW Service extracts data from and loads data to SAP NetWeaver BI. If you use
               PowerExchange for SAP NetWeaver BI, you must create and enable an SAP BW Service in the PowerCenter
               domain. For more information, see the PowerCenter Administrator Guide and the PowerExchange for SAP
               NetWeaver User Guide.
           ♦   Reporting Service. The Reporting Service runs the Data Analyzer web application. Data Analyzer provides a
               framework for creating and running custom reports and dashboards. You can use Data Analyzer to run the
               metadata reports provided with PowerCenter, including the PowerCenter Repository Reports and Data
               Profiling Reports. Data Analyzer stores the data source schemas and report metadata in the Data Analyzer
               repository. For more information, see “Data Analyzer” on page 14.
           ♦   Metadata Manager Service. The Metadata Manager Service runs the Metadata Manager web application.
               You can use Metadata Manager to browse and analyze metadata from disparate metadata repositories.
               Metadata Manager helps you understand and manage how information and processes are derived, how they
               are related, and how they are used. Metadata Manager stores information about the metadata to be analyzed
               in the Metadata Manager repository. For more information, see “Metadata Manager” on page 15.
           ♦   Reference Table Manager Service. The Reference Table Manager Service runs the Reference Table Manager
               web application. Use Reference Table Manager to manage reference data such as valid, default, and cross-
               reference values. Reference Table Manager stores reference tables metadata and the users and connection
               information in the Reference Table Manager repository. The reference tables are stored in a staging area. For
               more information, see “Reference Table Manager” on page 16.
Sources
   PowerCenter accesses the following sources:
   ♦    Relational. Oracle, Sybase ASE, Informix, IBM DB2, Microsoft SQL Server, and Teradata.
   ♦    File. Fixed and delimited flat file, COBOL file, XML file, and web log.
   ♦    Application. You can purchase additional PowerExchange products to access business sources such as
        Hyperion Essbase, WebSphere MQ, IBM DB2 OLAP Server, JMS, Microsoft Message Queue, PeopleSoft,
        SAP NetWeaver, SAS, Siebel, TIBCO, and webMethods.
   ♦    Mainframe. You can purchase PowerExchange to access source data from mainframe databases such as
        Adabas, Datacom, IBM DB2 OS/390, IBM DB2 OS/400, IDMS, IDMS-X, IMS, and VSAM.
   ♦    Other. Microsoft Excel, Microsoft Access, and external web services.
Targets
   PowerCenter can load data into the following targets:
   ♦    Relational. Oracle, Sybase ASE, Sybase IQ, Informix, IBM DB2, Microsoft SQL Server, and Teradata.
   ♦    File. Fixed and delimited flat file and XML.
   ♦    Application. You can purchase additional PowerExchange products to load data into business applications such
        as Hyperion Essbase, WebSphere MQ, IBM DB2 OLAP Server, JMS, Microsoft Message Queue, PeopleSoft
        EPM, SAP NetWeaver, SAP NetWeaver BI, SAS, Siebel, TIBCO, and webMethods.
   ♦    Mainframe. You can purchase PowerExchange to load data into mainframe databases such as IBM DB2 for
        z/OS, IMS, and VSAM.
   ♦    Other. Microsoft Access and external web services.
   You can load data into targets using ODBC or native drivers, FTP, or external loaders.
PowerCenter Domain
           PowerCenter has a service-oriented architecture that provides the ability to scale services and share resources
           across multiple machines. PowerCenter provides the PowerCenter domain to support the administration of the
           PowerCenter services. A domain is the primary unit for management and administration of services in
           PowerCenter.
           A domain contains the following components:
           ♦    One or more nodes. A node is the logical representation of a machine in a domain. A domain may contain
                more than one node. The node that hosts the domain is the master gateway for the domain. You can add
                other machines as nodes in the domain and configure the nodes to run application services such as the
                Integration Service or Repository Service. All service requests from other nodes in the domain go through
                the master gateway.
                A node runs service processes. A service process is the runtime representation of an application service
                running on a node.
           ♦    Service Manager. The Service Manager is built in to the domain to support the domain and the application
                services. The Service Manager runs on each node in the domain. The Service Manager starts and runs the
                application services on a machine.
           ♦    Application services. A group of services that represent PowerCenter server-based functionality. The
                application services that run on each node in the domain depend on the way you configure the node and the
                application service.
           You use the PowerCenter Administration Console to manage the domain.
           If you have the high availability option, you can scale services and eliminate single points of failure for services.
           The Service Manager and application services can continue running despite temporary network or hardware
           failures. High availability includes resilience, failover, and recovery for services and tasks in a domain.
           Figure 1-2 shows a sample domain with three nodes:
           This domain has a master gateway on Node 1. Node 2 runs an Integration Service, and Node 3 runs the
           Repository Service.
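
The sample domain can be sketched as a small data model. This is only an illustration of the concepts above, not a PowerCenter API: the class names, the domain name, and the route() helper are assumptions made for the example. The sketch assumes a single master gateway and shows a service request passing through it before reaching the node that runs the service.

```python
from __future__ import annotations

from dataclasses import dataclass, field


@dataclass
class Node:
    """Logical representation of a machine in the domain."""
    name: str
    is_master_gateway: bool = False
    services: list[str] = field(default_factory=list)


@dataclass
class Domain:
    name: str
    nodes: list[Node] = field(default_factory=list)

    def master_gateway(self) -> Node:
        # Assumption for this sketch: exactly one node is the master gateway.
        gateways = [n for n in self.nodes if n.is_master_gateway]
        if len(gateways) != 1:
            raise ValueError("expected exactly one master gateway node")
        return gateways[0]

    def route(self, service: str) -> list[str]:
        """Return the hops for a request: master gateway first, then the service node."""
        for node in self.nodes:
            if service in node.services:
                return [self.master_gateway().name, node.name]
        raise LookupError(f"no node runs {service!r}")


# The three-node sample domain described above ("SampleDomain" is a made-up name).
sample = Domain(
    name="SampleDomain",
    nodes=[
        Node("Node 1", is_master_gateway=True),
        Node("Node 2", services=["Integration Service"]),
        Node("Node 3", services=["Repository Service"]),
    ],
)

print(sample.route("Repository Service"))
```

In this model, adding a fourth node that runs another Integration Service is a matter of appending one more Node to the list.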
           RELATED TOPICS:
           ♦ “Administration Console” on page 6
    Service Manager
           The Service Manager is built in to the domain and supports the domain and the application services. The
           Service Manager performs the following functions:
           ♦    Alerts. Provides notifications about domain and service events.
           ♦    Authentication. Authenticates user requests from the Administration Console, PowerCenter Client,
                Metadata Manager, and Data Analyzer.
           ♦    Authorization. Authorizes user requests for domain objects. Requests can come from the Administration
                Console or from infacmd.
           ♦    Domain configuration. Manages domain configuration metadata.
           ♦    Node configuration. Manages node configuration metadata.
  Application Services
     When you install PowerCenter Services, the installation program installs the following application services:
     ♦   Repository Service. Manages connections to the PowerCenter repository. For more information, see
         “Repository Service” on page 13.
     ♦   Integration Service. Runs sessions and workflows. For more information, see “Integration Service” on
         page 14.
     ♦   Web Services Hub. Exposes PowerCenter functionality to external clients through web services. For more
         information, see “Web Services Hub” on page 14.
     ♦   SAP BW Service. Listens for RFC requests from SAP NetWeaver BI and initiates workflows to extract from
         or load to SAP NetWeaver BI.
     ♦   Reporting Service. Runs the Data Analyzer application. For more information, see “Data Analyzer” on
         page 14.
     ♦   Metadata Manager Service. Runs the Metadata Manager application. For more information, see “Metadata
         Manager” on page 15.
     ♦   Reference Table Manager Service. Runs the Reference Table Manager application. For more information,
         see “Reference Table Manager” on page 16.
PowerCenter Repository
     The PowerCenter repository resides in a relational database. The repository stores information required to
     extract, transform, and load data. It also stores administrative information such as permissions and privileges for
     users and groups that have access to the repository. PowerCenter applications access the PowerCenter
     repository through the Repository Service.
     You administer the repository through the PowerCenter Administration Console and command line programs.
     You can develop global and local repositories to share metadata:
     ♦   Global repository. The global repository is the hub of the repository domain. Use the global repository to
         store common objects that multiple developers can use through shortcuts. These objects may include
         operational or application source definitions, reusable transformations, mapplets, and mappings.
     ♦   Local repositories. A local repository is any repository within the domain that is not the global repository.
         Use local repositories for development. From a local repository, you can create shortcuts to objects in shared
         folders in the global repository. These objects include source definitions, common dimensions and lookups,
         and enterprise standard transformations. You can also create copies of objects in non-shared folders.
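
The relationship between a global repository, shared folders, and shortcuts can be pictured with a short sketch. The classes and the create_shortcut() function below are hypothetical and exist only for illustration. The sketch also assumes that a shortcut behaves as a reference rather than a copy, so a change to the shared object is visible through the shortcut.

```python
from dataclasses import dataclass, field


@dataclass
class Folder:
    name: str
    shared: bool = False
    objects: dict = field(default_factory=dict)  # object name -> definition


@dataclass
class Repository:
    name: str
    is_global: bool = False
    folders: dict = field(default_factory=dict)  # folder name -> Folder


@dataclass
class Shortcut:
    """A reference to an object in another repository, not a copy of it."""
    repository: Repository
    folder_name: str
    object_name: str

    def resolve(self):
        return self.repository.folders[self.folder_name].objects[self.object_name]


def create_shortcut(global_repo, folder_name, object_name):
    folder = global_repo.folders[folder_name]
    # Shortcuts in this sketch may only point into shared folders of the global repository.
    if not (global_repo.is_global and folder.shared):
        raise PermissionError("shortcut target must be in a shared folder of the global repository")
    if object_name not in folder.objects:
        raise KeyError(object_name)
    return Shortcut(global_repo, folder_name, object_name)


# Hypothetical names: one shared folder holding a common source definition.
shared = Folder("SharedDefs", shared=True,
                objects={"CUSTOMERS": {"columns": ["ID", "NAME"]}})
global_repo = Repository("GlobalRepo", is_global=True, folders={"SharedDefs": shared})

shortcut = create_shortcut(global_repo, "SharedDefs", "CUSTOMERS")

# Editing the shared definition is reflected when the shortcut is resolved.
shared.objects["CUSTOMERS"]["columns"].append("REGION")
print(shortcut.resolve())
```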
     PowerCenter supports versioned repositories. A versioned repository can store multiple versions of an object.
     PowerCenter version control allows you to efficiently develop, test, and deploy metadata into production.
     You can view repository metadata in the Repository Manager. Informatica Metadata Exchange (MX) provides a
     set of relational views that allow easy SQL access to the PowerCenter metadata repository.
     You can also create a Reporting Service in the Administration Console and run the PowerCenter Repository
     Reports to view repository metadata.
Administration Console
           The Administration Console is a web application that you use to administer the PowerCenter domain and
           PowerCenter security.
    Domain Page
           You administer the PowerCenter domain on the Domain page of the Administration Console. Domain objects
           include services, nodes, and licenses.
           You can complete the following tasks in the Domain page:
           ♦   Manage application services. Manage all application services in the domain, such as the Integration Service
               and Repository Service.
           ♦   Configure nodes. Configure node properties, such as the backup directory and resources. You can also shut
               down and restart nodes.
           ♦   Manage domain objects. Create and manage objects such as services, nodes, licenses, and folders. Folders
               allow you to organize domain objects and manage security by setting permissions for domain objects.
           ♦   View and edit domain object properties. View and edit properties for all objects in the domain, including
               the domain object.
           ♦   View log events. Use the Log Viewer to view domain, Integration Service, SAP BW Service, Web Services
               Hub, and Repository Service log events.
           Other domain management tasks include applying licenses and managing grids and resources.
           Figure 1-3 shows the Domain page:
    Security Page
           You administer PowerCenter security on the Security page of the Administration Console. You manage users
           and groups that can log in to the following PowerCenter applications:
           ♦   Administration Console
           ♦   PowerCenter Client
           ♦   Metadata Manager
           ♦   Data Analyzer
           You can complete the following tasks in the Security page:
           ♦   Manage native users and groups. Create, edit, and delete native users and groups.
Domain Configuration
     Configuration information for a PowerCenter domain is stored in a set of relational database tables managed by
     the Service Manager and accessible to all gateway nodes in the domain. The domain configuration database
     stores the following types of information about the domain:
     ♦   Domain configuration. Domain metadata such as host names and port numbers of nodes in the domain.
         The domain configuration database also stores information on the master gateway node and all other nodes
         in the domain.
     ♦   Usage. Includes CPU usage for each application service and the number of Repository Services running in
         the domain.
     ♦   Users and groups. Information on the native and LDAP users and the relationships between users and
         groups.
     ♦   Privileges and roles. Information on the privileges and roles assigned to users and groups in the domain.
     Each time you make a change to the domain, the Service Manager updates the domain configuration database.
     For example, when you add a node to the domain, the Service Manager adds the node information to the
     domain configuration. All gateway nodes connect to the domain configuration database to retrieve domain
     information and update the domain configuration.
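
As a rough illustration of this update cycle, the following sketch uses an in-memory SQLite database to stand in for the domain configuration database. The table name, column names, host names, and port numbers are invented for the example and do not reflect the actual PowerCenter schema.

```python
import sqlite3

# In-memory stand-in for the domain configuration database (hypothetical schema).
conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE domain_nodes ("
    "name TEXT PRIMARY KEY, host TEXT, port INTEGER, is_master_gateway INTEGER)"
)


def add_node(conn, name, host, port, is_master_gateway=False):
    """Record a new node, as the Service Manager would when a node joins the domain."""
    with conn:  # commits on success, rolls back on error
        conn.execute(
            "INSERT INTO domain_nodes VALUES (?, ?, ?, ?)",
            (name, host, port, int(is_master_gateway)),
        )


def list_nodes(conn):
    """Read back node metadata, as a gateway node would when retrieving domain information."""
    return conn.execute(
        "SELECT name, host, port FROM domain_nodes ORDER BY name"
    ).fetchall()


# Placeholder node names, hosts, and ports.
add_node(conn, "node01", "host-a.example.com", 6005, is_master_gateway=True)
add_node(conn, "node02", "host-b.example.com", 6005)

print(list_nodes(conn))
```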
PowerCenter Client
           The PowerCenter Client application consists of the following tools that you use to manage the repository,
           design mappings and mapplets, and create sessions to load the data:
           ♦   Designer. Use the Designer to create mappings that contain transformation instructions for the Integration
               Service. For more information about the Designer, see “PowerCenter Designer” on page 8.
           ♦   Mapping Architect for Visio. Use the Mapping Architect for Visio to create mapping templates that can be
               used to generate multiple mappings. For more information, see “Mapping Architect for Visio” on page 9.
           ♦   Repository Manager. Use the Repository Manager to assign permissions to users and groups and manage
               folders. For more information about the Repository Manager, see “Repository Manager” on page 10.
           ♦   Workflow Manager. Use the Workflow Manager to create, schedule, and run workflows. A workflow is a set
               of instructions that describes how and when to run tasks related to extracting, transforming, and loading
               data. For more information about the Workflow Manager, see “Workflow Manager” on page 11.
           ♦   Workflow Monitor. Use the Workflow Monitor to monitor scheduled and running workflows for each
               Integration Service. For more information about the Workflow Monitor, see “Workflow Monitor” on
               page 12.
           Install the client application on a Microsoft Windows machine.
    PowerCenter Designer
           The Designer has the following tools that you use to analyze sources, design target schemas, and build source-
           to-target mappings:
           ♦   Source Analyzer. Import or create source definitions.
           ♦   Target Designer. Import or create target definitions.
           ♦   Transformation Developer. Develop transformations to use in mappings. You can also develop user-defined
               functions to use in expressions.
           ♦   Mapplet Designer. Create sets of transformations to use in mappings.
           ♦   Mapping Designer. Create mappings that the Integration Service uses to extract, transform, and load data.
           You can display the following windows in the Designer:
           ♦   Navigator. Connect to repositories and open folders within the Navigator. You can also copy objects and
               create shortcuts within the Navigator.
           ♦   Workspace. Open different tools in this window to create and edit repository objects, such as sources,
               targets, mapplets, transformations, and mappings.
           ♦   Output. View details about tasks you perform, such as saving your work or validating a mapping.
            Figure 1-6 shows the Mapping Architect for Visio window:
     Repository Manager
            Use the Repository Manager to administer repositories. You can navigate through multiple folders and
            repositories, and complete the following tasks:
            ♦   Manage user and group permissions. Assign and revoke folder and global object permissions.
            ♦   Perform folder functions. Create, edit, copy, and delete folders. Work you perform in the Designer and
                Workflow Manager is stored in folders. If you want to share metadata, you can configure a folder to be
                shared.
            ♦   View metadata. Analyze sources, targets, mappings, and shortcut dependencies, search by keyword, and view
                the properties of repository objects.
            The Repository Manager can display the following windows:
            ♦   Navigator. Displays all objects that you create in the Repository Manager, the Designer, and the Workflow
                Manager. It is organized first by repository and then by folder.
            ♦   Main. Provides properties of the object selected in the Navigator. The columns in this window change
                depending on the object selected in the Navigator.
            ♦   Output. Provides the output of tasks executed within the Repository Manager.
   Repository Objects
   You create repository objects using the Designer and Workflow Manager client tools. You can view the
   following objects in the Navigator window of the Repository Manager:
   ♦   Source definitions. Definitions of database objects such as tables, views, synonyms, or files that provide
       source data.
   ♦   Target definitions. Definitions of database objects or files that contain the target data.
   ♦   Mappings. A set of source and target definitions along with transformations containing business logic that
       you build into the transformation. These are the instructions that the Integration Service uses to transform
       and move data.
   ♦   Reusable transformations. Transformations that you use in multiple mappings.
   ♦   Mapplets. A set of transformations that you use in multiple mappings.
   ♦   Sessions and workflows. Sessions and workflows store information about how and when the Integration
       Service moves data. A workflow is a set of instructions that describes how and when to run tasks related to
       extracting, transforming, and loading data. A session is a type of task that you can put in a workflow. Each
       session corresponds to a single mapping.
Workflow Manager
   In the Workflow Manager, you define a set of instructions to execute tasks such as sessions, emails, and shell
   commands. This set of instructions is called a workflow.
   The Workflow Manager has the following tools to help you develop a workflow:
   ♦   Task Developer. Create tasks you want to accomplish in the workflow.
   ♦   Worklet Designer. Create a worklet in the Worklet Designer. A worklet is an object that groups a set of
       tasks. A worklet is similar to a workflow, but without scheduling information. You can nest worklets inside a
       workflow.
   ♦   Workflow Designer. Create a workflow by connecting tasks with links in the Workflow Designer. You can
       also create tasks in the Workflow Designer as you develop the workflow.
            When you create a workflow in the Workflow Designer, you add tasks to the workflow. The Workflow
            Manager includes tasks, such as the Session task, the Command task, and the Email task so you can design a
            workflow. The Session task is based on a mapping you build in the Designer.
            You then connect tasks with links to specify the order of execution for the tasks you created. Use conditional
            links and workflow variables to create branches in the workflow.
            When the workflow start time arrives, the Integration Service retrieves the metadata from the repository to
            execute the tasks in the workflow. You can monitor the workflow status in the Workflow Monitor.
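
The idea of tasks connected by links, with conditional links creating branches, can be sketched in a few lines. The Workflow class, the task names, and the status strings below are assumptions made for illustration and are not the Integration Service implementation. In this simplified model a task runs as soon as any incoming link is followed.

```python
from collections import deque


class Workflow:
    def __init__(self, name):
        self.name = name
        self.tasks = {}  # task name -> callable returning a status string
        self.links = {}  # task name -> list of (next task name, condition)

    def add_task(self, name, action):
        self.tasks[name] = action
        self.links.setdefault(name, [])

    def link(self, source, target, condition=None):
        # condition receives the source task's status; None means "always follow".
        self.links[source].append((target, condition))

    def run(self, start):
        """Run tasks in link order from the start task and return the names executed."""
        executed, queue = [], deque([start])
        while queue:
            name = queue.popleft()
            if name in executed:
                continue
            status = self.tasks[name]()
            executed.append(name)
            for target, condition in self.links[name]:
                if condition is None or condition(status):
                    queue.append(target)
        return executed


# Hypothetical workflow: a session followed by one of two branches.
wf = Workflow("wf_load_orders")
wf.add_task("s_load_orders", lambda: "SUCCEEDED")  # stands in for a Session task
wf.add_task("email_ok", lambda: "SUCCEEDED")       # stands in for an Email task
wf.add_task("cmd_cleanup", lambda: "SUCCEEDED")    # stands in for a Command task
wf.link("s_load_orders", "email_ok", lambda status: status == "SUCCEEDED")
wf.link("s_load_orders", "cmd_cleanup", lambda status: status == "FAILED")

result = wf.run("s_load_orders")
print(result)
```

Because the session reports success here, only the email branch is followed and the cleanup command is skipped.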
            Figure 1-8 shows the Workflow Manager windows:
     Workflow Monitor
            You can monitor workflows and tasks in the Workflow Monitor. You can view details about a workflow or task
            in Gantt Chart view or Task view. You can run, stop, abort, and resume workflows from the Workflow
            Monitor. You can view session and workflow log events in the Workflow Monitor Log Viewer.
            The Workflow Monitor displays workflows that have run at least once. The Workflow Monitor continuously
            receives information from the Integration Service and Repository Service. It also fetches information from the
            repository to display historic information.
            The Workflow Monitor consists of the following windows:
            ♦    Navigator window. Displays monitored repositories, servers, and repository objects.
            ♦    Output window. Displays messages from the Integration Service and Repository Service.
            ♦    Time window. Displays progress of workflow runs.
            ♦    Gantt Chart view. Displays details about workflow runs in chronological format.
            ♦    Task view. Displays details about workflow runs in a report format.
Repository Service
     The Repository Service manages connections to the PowerCenter repository from repository clients. A
     repository client is any PowerCenter component that connects to the repository. The Repository Service is a
     separate, multi-threaded process that retrieves, inserts, and updates metadata in the repository database tables.
     The Repository Service ensures the consistency of metadata in the repository.
     The Repository Service accepts connection requests from the following PowerCenter components:
     ♦   PowerCenter Client. Use the Designer and Workflow Manager to create and store mapping metadata and
         connection object information in the repository. Use the Workflow Monitor to retrieve workflow run status
         information and session logs written by the Integration Service. Use the Repository Manager to organize and
         secure metadata by creating folders and assigning permissions to users and groups.
     ♦   Command line programs. Use command line programs to perform repository metadata administration tasks
         and service-related functions.
     ♦   Integration Service. When you start the Integration Service, it connects to the repository to schedule
         workflows. When you run a workflow, the Integration Service retrieves workflow task and mapping
         metadata from the repository. The Integration Service writes workflow status to the repository.
     ♦   Web Services Hub. When you start the Web Services Hub, it connects to the repository to access web-
         enabled workflows. The Web Services Hub retrieves workflow task and mapping metadata from the
         repository and writes workflow status to the repository.
     ♦   SAP BW Service. Listens for RFC requests from SAP NetWeaver BI and initiates workflows to extract from
         or load to SAP NetWeaver BI.
     You install the Repository Service when you install PowerCenter Services. After you install the PowerCenter
     Services, you can use the Administration Console to manage the Repository Service.
Integration Service
            The Integration Service reads workflow information from the repository. The Integration Service connects to
            the repository through the Repository Service to fetch metadata from the repository.
            A workflow is a set of instructions that describes how and when to run tasks related to extracting, transforming,
            and loading data. The Integration Service runs workflow tasks. A session is a type of workflow task. A session is
            a set of instructions that describes how to move data from sources to targets using a mapping.
            A session extracts data from the mapping sources and stores the data in memory while it applies the
            transformation rules that you configure in the mapping. The Integration Service loads the transformed data
            into the mapping targets.
            Other workflow tasks include commands, decisions, timers, pre-session SQL commands, post-session SQL
            commands, and email notification.
            The Integration Service can combine data from different platforms and source types. For example, you can join
            data from a flat file and an Oracle source. The Integration Service can also load data to different platforms and
            target types.
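
To make the extract, transform, and load steps of a session concrete, the following sketch joins a delimited flat file with a relational source and loads the result into a relational target. It is a conceptual stand-in only: SQLite takes the place of Oracle, the file is an in-memory string, and every table, column, and function name is invented for the example.

```python
import csv
import io
import sqlite3

# Flat-file source (an in-memory stand-in for a delimited file).
flat_file = io.StringIO("customer_id,amount\n1,10.50\n2,4.25\n1,3.00\n")

# Relational source; SQLite stands in for an Oracle database here.
source_db = sqlite3.connect(":memory:")
source_db.execute("CREATE TABLE customers (customer_id INTEGER, name TEXT)")
source_db.executemany(
    "INSERT INTO customers VALUES (?, ?)", [(1, "Acme"), (2, "Globex")]
)

# Relational target.
target_db = sqlite3.connect(":memory:")
target_db.execute("CREATE TABLE customer_totals (name TEXT, total REAL)")


def run_session():
    # Extract: read both sources into memory.
    names = dict(source_db.execute("SELECT customer_id, name FROM customers"))
    orders = list(csv.DictReader(flat_file))

    # Transform: join on customer_id and sum the amounts per customer.
    totals = {}
    for row in orders:
        name = names[int(row["customer_id"])]
        totals[name] = totals.get(name, 0.0) + float(row["amount"])

    # Load: write the transformed rows to the target.
    with target_db:
        target_db.executemany(
            "INSERT INTO customer_totals VALUES (?, ?)", sorted(totals.items())
        )


run_session()
rows = target_db.execute(
    "SELECT name, total FROM customer_totals ORDER BY name"
).fetchall()
print(rows)
```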
            You install the Integration Service when you install PowerCenter Services. After you install the PowerCenter
            Services, you can use the Administration Console to manage the Integration Service.
Data Analyzer
            Data Analyzer is a PowerCenter web application that provides a framework to extract, filter, format, and
            analyze data stored in a data warehouse, operational data store, or other data storage models. The Reporting
            Service in the PowerCenter domain runs the Data Analyzer application. You can create a Reporting Service in
            the PowerCenter Administration Console.
            Use Data Analyzer to design, develop, and deploy reports and set up dashboards and alerts. You also use Data
            Analyzer to run PowerCenter Repository Reports, Metadata Manager Reports, and Data Profiling Reports. Data
            Analyzer can access information from databases, web services, or XML documents. You can also set up reports
            to analyze real-time data from message streams.
Metadata Manager
     Informatica Metadata Manager is a PowerCenter web application to browse, analyze, and manage metadata
     from disparate metadata repositories. Metadata Manager helps you understand how information and processes
     are derived, how they are related, and how they are used.
     Metadata Manager extracts metadata from application, business intelligence, data integration, data modeling,
     and relational metadata sources. Metadata Manager uses PowerCenter workflows to extract metadata from
     metadata sources and load it into a centralized metadata warehouse called the Metadata Manager warehouse.
     You can use Metadata Manager to browse and search metadata objects, trace data lineage, analyze metadata
     usage, and perform data profiling on the metadata in the Metadata Manager warehouse. You can use Data
     Analyzer to generate reports on the metadata in the Metadata Manager warehouse.
     The Metadata Manager Service in the PowerCenter domain runs the Metadata Manager application. Create a
     Metadata Manager Service in the PowerCenter Administration Console to configure and run the Metadata
     Manager application.
     Metadata Manager Components
            The Metadata Manager web application includes the following components:
            ♦    Metadata Manager Service. An application service in a PowerCenter domain that runs the Metadata
                 Manager application and manages connections between the Metadata Manager components. You create and
                 configure the Metadata Manager Service in the PowerCenter Administration Console.
            ♦    Metadata Manager application. Manages the metadata in the Metadata Manager warehouse. You use the
                 Metadata Manager application to create and load resources in Metadata Manager. After you use Metadata
                 Manager to load metadata for a resource, you can use the Metadata Manager application to browse and
                 analyze metadata for the resource. You can also use the Metadata Manager application to create custom
                 models and manage security on the metadata in the Metadata Manager warehouse.
            ♦    Metadata Manager Agent. Runs within the Metadata Manager application or on a separate machine. It is
                 used by Metadata Exchanges to extract metadata from metadata sources and convert it to IME interface-
                 based format.
            ♦    Metadata Manager repository. A centralized location in a relational database that stores metadata from
                 disparate metadata sources. It also stores Metadata Manager metadata and the packaged and custom models
                 for each metadata source type.
            ♦    PowerCenter repository. Stores the PowerCenter workflows that extract source metadata from IME-based
                 files and load it into the Metadata Manager warehouse.
            ♦    Integration Service. Runs the workflows that extract the metadata from IME-based files and load it into the
                 Metadata Manager warehouse.
            ♦    Repository Service. Manages connections to the PowerCenter repository that stores the workflows that
                 extract metadata from IME interface-based files.
            ♦    Custom Metadata Configurator. Creates custom Metadata Exchanges to extract metadata from metadata
                 sources for which Metadata Manager does not package a Metadata Exchange.
            Figure 1-11 shows the Metadata Manager components: